You can’t spell “ophthalmologist” without artificial intelligence (AI) — a fact that might have many eye specialists looking warily over their shoulders. But should they be concerned, or is it time to embrace the new technology?
Two recent studies have demonstrated that AI can match ophthalmologists’ answers to patients’ questions about eye disease, and the technology is poised to help ophthalmologists manage patient workflow and offset shortages in the ophthalmic workforce, attendees were told at the American Glaucoma Society meeting held March 2, 2024, in Huntington Beach, California.
A study at the Icahn School of Medicine at Mount Sinai in New York City found that chatbots matched fellowship-trained ophthalmologists in diagnostic accuracy and completeness when handling questions about eye disease and real patient cases. Another study found a similar result with 200 eye care questions from an online chat forum, reported Robert Chang, MD, a glaucoma specialist and associate professor of ophthalmology at Stanford University in Stanford, California, and co-author of the second study.
“Using prompt engineering of ChatGPT, replies to patient online forum questions are becoming so realistic that specialist physicians are having difficulty telling the difference between human- and machine-generated responses,” Dr. Chang said.
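For readers unfamiliar with the term, prompt engineering simply means wrapping each patient question in carefully worded instructions before sending it to the model. The Python sketch below is a hypothetical illustration of that pattern using the OpenAI SDK; the system prompt, model choice, and parameters are invented for illustration and are not the study’s actual setup.

```python
# Hypothetical sketch of prompt engineering for patient-forum replies.
# The system prompt and model name are illustrative, not the study's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an ophthalmologist answering a patient's question on a "
    "public forum. Reply in two or three short paragraphs of plain "
    "language, and recommend an in-person exam when symptoms warrant it."
)

def draft_reply(patient_question: str) -> str:
    """Generate a forum-style reply to a single patient question."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; any capable chat model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": patient_question},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

print(draft_reply("My eye pressure was 24 at my last visit. Do I have glaucoma?"))
```

In studies like the one Dr. Chang described, researchers typically iterate on instructions of this kind until the replies match the tone and caution of physician-written responses.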
The study used questions patients submitted to an online medical forum that received responses from ophthalmologists, then presented those responses plus answers generated by ChatGPT to a panel of eight ophthalmologists and asked them to distinguish between the two. “The accuracy of judging whether you could tell if it was written by AI or a human was about 61%,” Dr. Chang reported. “So most of the time you could not tell the difference.”
The study reported that of 800 evaluations of chatbot-written answers, ophthalmologists rated 21% as human-written, while they marked 64.6% of human-written answers as AI-generated. The study also found that chatbot answers were about as likely as human answers to contain incorrect or inappropriate material: less than 1% for both.
Dr. Chang also referenced a similar, more recent study from the Icahn School of Medicine in which 15 clinicians reviewed answers to patient questions written by fellowship-trained glaucoma and retina specialists alongside answers generated by GPT-4, the model behind ChatGPT that OpenAI released in the spring of 2023. The study used a rank-based statistical comparison: the combined question-case mean rank for accuracy was 506.2 for the chatbot and 403.4 for the glaucoma specialists across 831 question cases, and the mean rank for completeness was 528.3 and 398.7, respectively, across 828 question cases (P < .001).
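The article does not name the statistical tool, but mean ranks of this kind typically come from a nonparametric rank-sum comparison such as the Mann-Whitney U test. As a hedged sketch only, the snippet below shows how such a comparison of quality ratings might be run; the scores are invented placeholders, not study data.

```python
# Illustrative rank-sum comparison of two sets of quality ratings.
# The rating values are invented placeholders, NOT study data.
from scipy.stats import mannwhitneyu

chatbot_ratings = [8, 9, 7, 9, 8, 10, 7, 9]    # hypothetical scores
specialist_ratings = [7, 8, 6, 8, 7, 9, 6, 8]  # hypothetical scores

stat, p_value = mannwhitneyu(
    chatbot_ratings, specialist_ratings, alternative="two-sided"
)
print(f"U = {stat:.1f}, P = {p_value:.3f}")
```

A rank-based test is a natural choice here because subjective quality ratings are ordinal, so comparing ranks avoids assuming the scores are normally distributed.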
“The specialists themselves didn’t rate their answers as good as they rated the chatbot answers,” Dr. Chang said. “So it’s really showing that what we can come up with using generative AI is so human-like that it’s difficult for us to tell the difference on accuracy and completeness.”
‘Getting Close’
However, he noted that chatbots still have kinks to work out, including factual errors, difficulty citing reliable sources, and hallucinations, in which the model fabricates plausible-sounding but inaccurate material. “We’re not quite there yet, but it’s getting close,” Dr. Chang added.
Dr. Chang noted that his clinic at Stanford is testing an AI platform to perform virtual scribing of patient encounters in real time. “That can save a lot of time on documentation,” he said. “I think this is a direction moving forward to increase our productivity because as we know, we all have workforce problems whether it’s the doctors or having enough technicians.”
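Ambient scribing tools of the kind Dr. Chang described typically chain speech-to-text transcription with a language-model summarization step. The sketch below illustrates that generic two-step pipeline, assuming the OpenAI Python SDK; it is not the platform being tested at Stanford, and the prompt wording and model names are placeholders.

```python
# Generic ambient-scribe pipeline: transcribe a recorded encounter,
# then summarize it into a draft note. Illustrative only; this is
# not the platform being tested at Stanford.
from openai import OpenAI

client = OpenAI()

def draft_note(audio_path: str) -> str:
    # Step 1: speech-to-text on the recorded encounter
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    # Step 2: structure the transcript into a SOAP-style draft note
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "Summarize this clinic conversation into a "
                           "draft SOAP note for physician review.",
            },
            {"role": "user", "content": transcript.text},
        ],
    )
    return response.choices[0].message.content
```

Keeping the physician as the final reviewer of the draft note is what makes this a documentation aid rather than an autonomous clinical tool, consistent with the early use cases Dr. Chang described.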
Chatbots also are being tested for scheduling and generating letters. “Because of the unique needs of each ophthalmologist, AI agents that augment the existing workforce on specific administrative tasks will be the most likely early use case rather than autonomous disease screening or clinical decision support tools, which will take longer to validate prospectively in specific cohorts,” he said.
“Any AI today still has a long way to go before it can ingest and verify all the data for independent decision-making and be validated for fairness and generalizability,” Dr. Chang added. “It is much easier to have ‘low-level thinking,’ repetitive tasks be taken care of by algorithms first.”
Dr. Chang “made a convincing case for embracing currently feasible applications of AI, highlighting the potential benefits of leveraging LLMs such as ChatGPT to enhance clinical productivity,” said Thasarat Vajaranant, MD, MHA, director of the glaucoma service and of data sourcing and strategy for the AI Ophthalmology Center at Illinois Eye and Ear Infirmary and the University of Illinois Chicago.
“With the growing demands of an aging population and workforce shortages in ophthalmology, AI-driven solutions offer promise in various areas, including telemedicine triage, organizing clinical notes, assisting in assessments and treatment planning, and virtual scribing,” Dr. Vajaranant said. “Rather than fearing this technology, we should approach its integration with caution.”
Dr. Chang disclosed relationships with Alcon, Genentech/Roche, Intalight, Verana Health, Sight Sciences, Ocular Therapeutix, Glaukos, Carl Zeiss Meditec, and Apple. Dr. Vajaranant had no relevant disclosures.
A version of this article first appeared on Medscape.com.