OBJECTIVE: Training data fuel and shape the development of artificial intelligence (AI) models. Intensive data requirements are a major bottleneck limiting the success of AI tools in sectors with inherently scarce data. In health care, training data are difficult to curate, triggering growing concerns that the current lack of access to health care by underprivileged social groups will translate into future bias in health care AI models. In this report, we developed an autoencoder to grow and enhance inherently scarce datasets, alleviating our dependence on big data.
DESIGN: Computational study with open-source data.
SUBJECTS: The data were obtained from 6 open-source datasets comprising patients aged 40-80 years in Singapore, China, India, and Spain.
METHODS: The reported framework generates synthetic images based on real-world patient imaging data. As a test case, we used the autoencoder to expand publicly available training sets of optic disc photos and evaluated the ability of the resultant datasets to train AI models in the detection of glaucomatous optic neuropathy.
MAIN OUTCOME MEASURES: The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of the glaucoma detector; a higher AUC indicates better detection performance.
RESULTS: Enhancing datasets with synthetic images generated by the autoencoder yielded superior training sets that improved the performance of AI models.
CONCLUSIONS: Our findings help address the increasingly untenable data volume and quality requirements for AI model development and have implications beyond health care, toward empowering AI adoption in all similarly data-challenged fields.
FINANCIAL DISCLOSURES: The authors have no proprietary or commercial interest in any materials discussed in this article.
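The abstract does not specify the autoencoder's architecture or training details. As a minimal sketch of the general technique it describes (augmenting a scarce dataset by perturbing latent codes of an autoencoder and decoding them into synthetic samples), the following uses a toy linear autoencoder trained with plain NumPy on random stand-in data; the dimensions, learning rate, and noise scale are illustrative assumptions, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for flattened 8x8 image patches (the study used optic disc photos).
X = rng.normal(size=(200, 64))

# Minimal linear autoencoder: 64-dim input -> 16-dim latent -> 64-dim reconstruction.
d, k = 64, 16
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))

lr = 1e-3
losses = []
for _ in range(200):
    Z = X @ W_enc                      # encode
    X_hat = Z @ W_dec                  # decode
    err = X_hat - X                    # reconstruction error
    losses.append(float(np.mean(err ** 2)))
    # Gradient descent on the mean squared reconstruction loss.
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

# Synthetic samples: perturb the latent codes of real samples, then decode.
Z = X @ W_enc
X_synth = (Z + rng.normal(scale=0.05, size=Z.shape)) @ W_dec

# The enhanced training set combines real and synthetic samples.
X_augmented = np.vstack([X, X_synth])
```

In practice a convolutional autoencoder and real fundus images would replace the linear maps and random data here, but the augmentation step is the same: encode, perturb in latent space, decode, and append the decoded samples to the training set.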
Department of Ophthalmology, University of California, San Francisco, San Francisco, California.