
Abstract #118522 Published in IGR 24-4

Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model

Tailor PD; Xu TT; Fortes BH; Iezzi R; Olsen TW; Starr MR; Bakri SJ; Scruggs BA; Barkmeier AJ; Patel SV; Baratz KH; Bernhisel AA; Wagner LH; Tooley AA; Roddy GW; Sit AJ; Wu KY; Bothun ED; Mansukhani SA; Mohney BG; Chen JJ; Brodsky MC; Tajfirouz DA; Chodnicki KD; Smith WM; Dalvin LA
Mayo Clinic proceedings. Digital health 2024; 2: 119-128


OBJECTIVE: To determine the appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model in response to ophthalmology questions.

PATIENTS AND METHODS: Cross-sectional qualitative study conducted from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. Responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts: first, as if the information were presented on a patient information site; second, as an LLM-generated draft response to patient queries sent through the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty.

RESULTS: For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Average appropriateness varied across ophthalmic subspecialties, ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via the EMR, the LLM provided an overall average of 74% appropriate responses, again varying by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but statistically nonsignificant variations, with disease and condition often rated highest for appropriateness (72% and 69%) and surgery-related rated lowest (55% and 51%) in both contexts.

CONCLUSION: This LLM produced mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR draft responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.

Department of Ophthalmology (P.D.T., T.T.X., B.H.F., R.I., T.W.O., M.R.S., S.J.B., B.A.S., A.J.B., S.V.P., K.H.B., A.A.B., L.H.W., A.A.T., G.W.R., A.J.S., K.Y.W., E.D.B., S.A.M., B.G.M., J.J.C., M.C.B., D.A.T., K.D.C., W.M.S., L.A.D.) and Department of Neurology (J.J.C.), Mayo Clinic, Rochester, MN; and Department of Ophthalmology, Duke University, Durham, NC (K.Y.W.).


Classification:

15 Miscellaneous



