In this retrospective, observational, single-center study, Tao and colleagues used electronic health records from 4512 patients to compare four survival-based artificial intelligence (AI) models for predicting glaucoma progression to surgery: a regression-based model (Cox proportional hazards regression), two tree-based models (random survival forest and gradient-boosting survival), and a deep learning-based model (DeepSurv). There were no statistically significant differences among these models, although DeepSurv and the tree-based models performed slightly better.
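This commentary does not specify the metric behind the comparison. Survival models are commonly compared by their discrimination, which is often summarized with Harrell's concordance index (C-index): the proportion of comparable patient pairs in which the patient who reached the event earlier received the higher predicted risk. The following minimal pure-Python sketch is for illustration only. The function name and toy data are hypothetical, and tied follow-up times are simply skipped rather than handled as in established implementations.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time for each patient
    events: 1 if the event (e.g. surgery) was observed, 0 if censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ends in an observed
        # event; pairs with tied times are skipped in this sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event was predicted as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy cohort: four patients, the third one censored.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    print(harrell_c_index(times, events, [0.9, 0.4, 0.7, 0.2]))  # 0.8
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Small differences between models on this scale may be neither statistically nor clinically meaningful.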
The authors are the first to apply AI-based survival models to patients with glaucoma. This distinguishes their approach from previous studies, which used classifiers for binary outcomes and thus overlooked the longitudinal nature of the outcomes.1 They emphasize that Cox regression, widely employed for survival analysis, makes assumptions (independence of survival times, absence of correlation between features, and a constant hazard ratio over time) that are often not satisfied in large retrospective datasets. The authors suggest that DeepSurv and tree-based models may not require these assumptions. Although this is true in theory, the absence of statistically and, more importantly, clinically significant differences indicates that more complicated models may not provide any meaningful benefit over simpler models, such as Cox regression, even when the latter's assumptions are not strictly met. Moreover, there are many variations of standard statistical survival models that relax several of the assumptions of the naïve Cox model. These may be simpler and more interpretable and, therefore, preferable to complex AI methods. This is not to say that AI might not find more impactful applications in this space. For example, AI could handle the complexity of unstructured data (images and clinical notes), which would be difficult to integrate into a standard regression model.
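To illustrate how compact the standard alternative is, the following minimal pure-Python sketch fits a Cox proportional hazards model with a single covariate by Newton-Raphson on the partial log-likelihood. The sketch assumes right-censored data and handles tied event times with the Breslow approximation. The function names and toy data are hypothetical. It is not the authors' implementation, and a real analysis would use an established statistical package.

```python
import math


def partial_log_likelihood(beta, times, events, x):
    """Cox partial log-likelihood for one covariate (Breslow ties)."""
    total = 0.0
    for i in range(len(times)):
        if events[i]:
            # Risk set: everyone still under follow-up at time t_i.
            denom = sum(math.exp(beta * x[j])
                        for j in range(len(times)) if times[j] >= times[i])
            total += beta * x[i] - math.log(denom)
    return total


def fit_cox_one_covariate(times, events, x, n_iter=50, tol=1e-10):
    """Maximize the partial log-likelihood by Newton-Raphson."""
    beta = 0.0
    for _ in range(n_iter):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (minus the second derivative)
        for i in range(len(times)):
            if not events[i]:
                continue  # censored patients contribute only to risk sets
            risk = [j for j in range(len(times)) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Hypothetical toy data: follow-up times, event flags, binary covariate.
    times = [1, 2, 3, 4, 5, 6]
    events = [1, 1, 0, 1, 1, 0]
    x = [1, 0, 1, 1, 0, 0]
    beta = fit_cox_one_covariate(times, events, x)
    print("hazard ratio:", math.exp(beta))
```

The single estimated hazard ratio, exp(beta), is assumed to stay constant over the whole follow-up. This is the proportional hazards assumption questioned by the authors. Extensions such as stratification or time-varying coefficients relax it while keeping the model interpretable.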
A few things should be borne in mind when evaluating the results of AI models. Models trained on data from a single center, as in this work, often lack generalizability and might perform far worse when applied to different clinical cohorts.2,3 Moreover, DeepSurv, like other AI models, is computationally demanding and requires careful training, which can be very resource- and time-intensive. Finally, researchers can often achieve only a limited understanding of the underlying decision process of these algorithms, which makes the results difficult to interpret.4,5
In conclusion, the authors proposed a set of promising AI-based survival models for predicting glaucoma progression. However, given the results of their investigation, using these complex models in place of simpler standard survival models does not seem entirely justified.