managing gastroesophageal reflux disease (GERD), investigators have found.
The researchers say the tool’s conversational format could improve clinical efficiency and reduce the volume of patient messages and calls, potentially diminishing clinician burnout.
However, the inconsistencies and content errors observed mean that some clinical oversight is still required, caution the researchers, led by Jacqueline Henson, MD, with the division of gastroenterology, Duke University, Durham, N.C.
The study was published online in the American Journal of Gastroenterology.
Putting ChatGPT to the GERD test
Affecting nearly 30% of U.S. adults, GERD is a common and increasingly complex condition to manage. AI technologies like ChatGPT (OpenAI/Microsoft) are playing a growing role in medicine, although the ability of ChatGPT to provide guidance for GERD management is uncertain.
Dr. Henson and colleagues assessed ChatGPT’s ability to provide accurate and specific responses to questions regarding GERD care.
They generated 23 GERD management prompts based on published clinical guidelines and expert consensus recommendations. Five questions were about diagnosis, 11 about treatment, and seven about both diagnosis and treatment.
Each prompt was submitted to ChatGPT 3.5 (version 3/14/2023) three times on separate occasions without feedback to assess the consistency of the answer. Responses were rated by three board-certified gastroenterologists for appropriateness and specificity.
ChatGPT returned appropriate responses to 63 of 69 (91.3%) queries (23 prompts, each submitted three times), with 29% of responses considered completely appropriate and 62.3% mostly appropriate.
However, responses to the same prompt were often inconsistent, with 16 of 23 (70%) prompts yielding varying appropriateness, including three (13%) with both inappropriate and appropriate responses.
Prompts regarding treatment received the highest proportion of completely appropriate responses (39.4%), while prompts covering both diagnosis and management had the highest proportion of mostly inappropriate responses (14.3%).
For example, the chatbot failed to recommend consideration of Roux-en-Y gastric bypass for ongoing GERD symptoms with pathologic acid exposure in the setting of obesity, and some potential risks associated with proton pump inhibitor therapy were stated as fact.
However, the majority (78.3%) of responses contained at least some specific guidance, especially for prompts assessing diagnosis (93.3%). In all responses, ChatGPT suggested contacting a health care professional for further advice.
Eight patients from a range of educational backgrounds provided feedback on the responses and generally found them both understandable and useful.
Overall, ChatGPT “provided largely appropriate and at least some specific guidance for GERD management, highlighting the potential for this technology to serve as a source of information for patients, as well as an aid for clinicians,” Dr. Henson and colleagues write.
However, “the presence of inappropriate responses with inconsistencies to the same prompt largely preclude its application within health care in its present state, at least for GERD,” they add.
The study had no commercial funding. Dr. Henson has served as a consultant for Medtronic.
A version of this article first appeared on Medscape.com.