Abstract
Purpose
To create and validate code-free automated deep learning models (AutoML) for diabetic retinopathy (DR) classification from handheld retinal images.
Design
Prospective development and validation of AutoML models for DR image classification.
Participants
A total of 17 829 deidentified retinal images from 3566 eyes with diabetes, acquired using handheld retinal cameras in a community-based DR screening program.
Methods
AutoML models were generated from previously acquired 5-field (macula-centered, disc-centered, superior, inferior, and temporal macula) handheld retinal images. Each image was labeled using the International DR and diabetic macular edema (DME) Classification Scale by 4 certified graders at a centralized reading center under oversight by a senior retina specialist. Images for model development were split 8-1-1 for training, optimization, and testing to detect referable DR (refDR), defined as moderate nonproliferative DR or worse or any level of DME. Internal validation was performed using a published image set from the same patient population (N = 450 images from 225 eyes). External validation was performed using a publicly available retinal imaging data set from the Asia Pacific Tele-Ophthalmology Society (N = 3662 images).
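For readers who want to see the labeling rule and the 80%/10%/10% partition in concrete terms, the following is a minimal Python sketch and not the study's pipeline, which was code-free. The integer encoding of the DR severity levels, the function names `is_referable` and `split_8_1_1`, the fixed random seed, and the choice to split at the image level are all assumptions made for illustration; the abstract does not state how images were assigned to partitions.

```python
import random

# Hypothetical integer encoding of the International DR severity scale.
NO_DR, MILD_NPDR, MODERATE_NPDR, SEVERE_NPDR, PDR = range(5)


def is_referable(dr_level: int, has_dme: bool) -> bool:
    """Return True for refDR: moderate NPDR or worse, or any level of DME."""
    return dr_level >= MODERATE_NPDR or has_dme


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into 80% / 10% / 10% partitions.

    Assumption: a simple random split over individual items. Any remainder
    left by integer division goes to the final (test) partition.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_opt = n // 10
    train = shuffled[:n_train]
    optimize = shuffled[n_train:n_train + n_opt]
    test = shuffled[n_train + n_opt:]
    return train, optimize, test
```

In practice, a split that keeps all images from one eye or one patient in the same partition is often preferred to avoid leakage between training and testing; the sketch above does not do this.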
Main Outcome Measures
Area under the precision-recall curve (AUPRC), sensitivity (SN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 scores.
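As an illustration of how the threshold-based measures listed above relate to one another, the sketch below derives them from the four cells of a binary confusion matrix. It is a hedged example rather than the study's evaluation code: the function names `confusion_counts` and `binary_metrics` are hypothetical, treating a score at or above the threshold as a positive call is an assumption, and AUPRC is not computed here because it requires the full precision-recall curve rather than a single threshold.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels (1 = refDR, 0 = not refDR).

    Assumption: a score >= threshold is treated as a positive prediction.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Compute SN, SP, PPV, NPV, accuracy, and F1 from confusion counts.

    Assumes every denominator is nonzero (both classes are present and
    the model makes at least one positive and one negative call).
    """
    sn = tp / (tp + fn)    # sensitivity (recall)
    sp = tn / (tn + fp)    # specificity
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sn / (ppv + sn)  # harmonic mean of PPV and SN
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv,
            "accuracy": accuracy, "F1": f1}
```

Note that PPV and NPV depend on the prevalence of refDR in the evaluated set, so they are not directly comparable across data sets with different proportions of referable cases.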
Results
Referable DR was present in 17.3%, 39.1%, and 48.0% of the training, internal validation, and external validation sets, respectively. The model’s AUPRC was 0.995 with a precision and recall of 97% using a score threshold of 0.5. Internal validation showed that SN, SP, PPV, NPV, accuracy, and F1 scores were 0.96 (95% confidence interval [CI], 0.884–0.99), 0.98 (95% CI, 0.937–0.995), 0.96 (95% CI, 0.884–0.99), 0.98 (95% CI, 0.937–0.995), 0.97, and 0.96, respectively. External validation showed that SN, SP, PPV, NPV, accuracy, and F1 scores were 0.94 (95% CI, 0.929–0.951), 0.97 (95% CI, 0.957–0.974), 0.96 (95% CI, 0.952–0.971), 0.95 (95% CI, 0.935–0.956), 0.97, and 0.96, respectively.
Conclusions
This study demonstrates the accuracy and feasibility of code-free AutoML models for identifying refDR developed using handheld retinal imaging in a community-based screening program. Potentially, the use of AutoML may increase access to machine learning models that may be adapted for specific programs that are guided by the clinical need to rapidly address disparities in health care delivery.
Original language | English |
---|---|
Journal | Ophthalmology Retina |
Early online date | 15 Mar 2023 |
DOIs | |
Publication status | Early online date - 15 Mar 2023 |
Keywords
- Automated Machine Learning
- Diabetic Retinopathy
- Screening
- Retinal Imaging
- Artificial Intelligence
- Handheld devices