Among the potential uses envisioned for artificial intelligence (AI) in healthcare is reducing provider burden by helping respond to patients’ questions submitted through portals.
Easing the burden on providers of responding to each message is a ripe target for solutions: during the COVID-19 pandemic, such messages increased 157% from prepandemic levels, according to the authors of a paper published online in JAMA Network Open, and each additional message added 2.3 minutes to time spent on the electronic health record (EHR) per day.
Researchers at Stanford Health Care, led by Patricia Garcia, MD, of the department of medicine, conducted a 5-week, prospective, single-group quality improvement study from July 10 through August 13, 2023, to test an AI response system.
Large Language Model Used
All attending physicians, advanced practice providers (APPs), clinic nurses, and clinical pharmacists from the divisions of primary care and gastroenterology and hepatology were enrolled in a pilot program that offered the option to answer patients’ questions with drafts generated by a Health Insurance Portability and Accountability Act–compliant large language model integrated into the EHR. Drafts were then reviewed by the provider.
The study primarily tested whether the 162 enrolled providers would use the AI-generated drafts. Secondary outcomes included whether the system saved time or improved the clinician experience.
Participants received survey emails before and after the pilot period and answered questions on areas including task load, EHR burden, usability, work exhaustion, burnout, and satisfaction.
Researchers found that the overall average utilization rate per clinician was 20%, but there were significant between-group differences. In gastroenterology and hepatology, for example, nurses used the AI tool the most (29%), followed by physicians/APPs (24%); in primary care, clinical pharmacists had the highest use rate (44%), compared with 15% for physicians.
Burden Improved, but No Time Saved
The AI did not appear to save time but did improve task load and work exhaustion scores. The report states that there was no change in reply action time, write time, or read time between the prepilot and pilot periods. However, there were significant reductions in the physician task load score derivative (mean [SD], 61.31 [17.23] presurvey vs 47.26 [17.11] postsurvey; paired difference, −13.87; 95% CI, −17.38 to −9.50; P < .001), and work exhaustion scores also decreased significantly (mean [SD], 1.95 [0.79] presurvey vs 1.62 [0.68] postsurvey; paired difference, −0.33; 95% CI, −0.50 to −0.17; P < .001).
The authors wrote that the improvements in task load and work exhaustion scores suggest that AI-generated replies have the potential to lessen cognitive burden and burnout. Although the tool did not save time, they suggest that editing a draft response may be less cognitively taxing for providers than writing one from scratch.
Quality of AI Responses
Comments about the voice and/or tone of the AI response messages were the most common and had the highest absolute number of negative comments (10 positive, 2 neutral, and 14 negative). Comments about the length of the draft messages (too long or too short) were the most negatively skewed (1 positive, 2 neutral, and 8 negative).
Comments on the accuracy of the draft responses were fairly even, with 4 positive and 5 negative, but there were no adverse safety signals, the authors report.
The providers had high expectations about the use and quality of the tool that “were either met or exceeded at the end of the pilot,” Dr. Garcia and coauthors write. “Given the evidence that burnout is associated with turnover, reductions in clinical activity, and quality, even a modest improvement may have a substantial impact.”
One coauthor reported grants from Google, Omada Health, and PredictaMed outside the submitted work. Another coauthor reported having a patent for Well-being Index Instruments and Mayo Leadership Impact Index, with royalties paid from Mayo Clinic, and receiving honoraria for presenting grand rounds, keynote lectures, and advising health care organizations on clinician well-being. No other disclosures were reported.