
AI Surpasses Harvard Docs on Clinical Reasoning Test

TOPLINE:

A study comparing the clinical reasoning of an artificial intelligence (AI) model with that of physicians found that the AI outperformed both residents and attending physicians on simulated cases. The AI produced more instances of incorrect reasoning than the physicians did but scored better overall.

METHODOLOGY:

  • The study involved 39 physicians from two academic medical centers in Boston and the generative AI model GPT-4.
  • Participants were presented with 20 simulated clinical cases involving common problems such as pharyngitis, headache, abdominal pain, cough, and chest pain. Each case included sections describing the triage presentation, review of systems, physical examination, and diagnostic testing.
  • The primary outcome was the Revised-IDEA (R-IDEA) score, a 10-point scale evaluating clinical reasoning documentation across four domains: interpretive summary, differential diagnosis, explanation of the lead diagnosis, and alternative diagnoses.

TAKEAWAY:

  • The AI achieved a median R-IDEA score of 10, higher than attending physicians (median, 9) and residents (median, 8).
  • The chatbot had a significantly higher estimated probability of achieving a high R-IDEA score of 8-10 (0.99) than attending physicians (0.76) and residents (0.56).
  • The AI produced more responses containing instances of incorrect clinical reasoning (13.8%) than residents (2.8%) or attending physicians (12.5%) did. It performed similarly to physicians in diagnostic accuracy and inclusion of cannot-miss diagnoses.

IN PRACTICE:

“Future research should assess clinical reasoning of the LLM-physician interaction, as LLMs will more likely augment, not replace, the human reasoning process,” the authors of the study wrote.

SOURCE:

Adam Rodman, MD, MPH, with Beth Israel Deaconess Medical Center, Boston, was the corresponding author on the paper. The research was published online in JAMA Internal Medicine.

LIMITATIONS:

Simulated clinical cases may not replicate performance in real-world scenarios. Further training could enhance the performance of the AI, so the study may underestimate its capabilities, the researchers noted.

DISCLOSURES:

The study was supported by the Harvard Clinical and Translational Science Center and Harvard University. Authors disclosed financial ties to publishing companies and Solera Health. Dr. Rodman received funding from the Gordon and Betty Moore Foundation.

This article was created using several editorial tools, including AI, as part of the process. Human editors reviewed this content before publication. A version of this article appeared on Medscape.com.
