Abstract (IGR: 23-3) | International Glaucoma Review

Abstract #106248 Published in IGR 23-3

Visual Field Prediction: Evaluating the Clinical Relevance of Deep Learning Models

Eslami M; Kim JA; Zhang M; Boland MV; Wang M; Chang DS; Elze T
Ophthalmology science 2023; 3: 100222

PURPOSE: Two novel deep learning methods using a convolutional neural network (CNN) and a recurrent neural network (RNN) have recently been developed to forecast future visual fields (VFs). Although the original evaluations of these models focused on overall accuracy, they did not assess whether the models can accurately identify patients with progressive glaucomatous vision loss to aid clinicians in preventing further decline. We evaluated these 2 prediction models for potential biases in overestimating or underestimating VF changes over time.

DESIGN: Retrospective observational cohort study.

PARTICIPANTS: All available and reliable Swedish Interactive Thresholding Algorithm Standard 24-2 VFs from Massachusetts Eye and Ear Glaucoma Service collected between 1999 and 2020 were extracted. Because of the methods' respective needs, the CNN data set included 54 373 samples from 7472 patients, and the RNN data set included 24 430 samples from 1809 patients.

METHODS: The CNN and RNN methods were reimplemented. A fivefold cross-validation procedure was performed on each model, and pointwise mean absolute error (PMAE) was used to measure prediction accuracy. Test data were stratified into categories based on the severity of VF progression to investigate the models' performance in predicting worsening cases. The models were additionally compared with a no-change model that uses the baseline VF (for the CNN) and the last-observed VF (for the RNN) as its prediction.

MAIN OUTCOME MEASURES: PMAE in predictions.

RESULTS: The overall PMAE 95% confidence intervals were 2.21 to 2.24 decibels (dB) for the CNN and 2.56 to 2.61 dB for the RNN, which were close to the original studies' reported values. However, both models exhibited large errors in identifying patients with worsening VFs and often failed to outperform the no-change model. PMAE values were higher in patients with greater changes in mean sensitivity (for the CNN) and mean total deviation (for the RNN) between baseline and follow-up VFs.

CONCLUSIONS: Although our evaluation confirms the low overall PMAEs reported in the original studies, our findings also reveal that both models severely underpredict worsening of VF loss. Because the accurate detection and projection of glaucomatous VF decline is crucial in ophthalmic clinical practice, we recommend that this consideration be explicitly taken into account when developing and evaluating future deep learning models.
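
The evaluation idea described in the abstract (PMAE, a no-change reference model, and stratification by severity of VF change) can be illustrated with a minimal sketch. This is not the authors' code: the function names `pmae` and `stratified_pmae`, the array layout (one row per VF, one column per test location), and the bin edges are assumptions made for illustration, and the stratification here uses change in mean sensitivity as described for the CNN data set.

```python
import numpy as np

def pmae(pred, actual):
    """Pointwise mean absolute error (dB), one value per visual field.

    Both arrays have shape (n_fields, n_points); the layout is an assumption.
    """
    return np.mean(np.abs(pred - actual), axis=1)

def stratified_pmae(pred, baseline, actual, bin_edges):
    """Compare a model with a no-change model within bins of VF change.

    The no-change model simply reuses the baseline VF as its prediction.
    Fields are binned by change in mean sensitivity (follow-up minus baseline);
    bin_edges are hypothetical thresholds in dB, in increasing order.
    Returns a list of (bin_index, n_fields, model_pmae, no_change_pmae).
    """
    model_err = pmae(pred, actual)
    no_change_err = pmae(baseline, actual)
    ms_change = actual.mean(axis=1) - baseline.mean(axis=1)
    bin_idx = np.digitize(ms_change, bin_edges)  # 0 = most negative change
    rows = []
    for b in range(len(bin_edges) + 1):
        mask = bin_idx == b
        if not mask.any():
            continue  # skip empty strata
        rows.append((b, int(mask.sum()),
                     float(model_err[mask].mean()),
                     float(no_change_err[mask].mean())))
    return rows
```

Reporting the two errors side by side per stratum, rather than a single overall PMAE, is what exposes the pattern the abstract describes: a model can have a low overall error while performing no better than the no-change model in the strata with the largest worsening.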

Schepens Eye Research Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts.

Classification:

15 Miscellaneous
