AI Surpasses Harvard Docs on Clinical Reasoning Test

TOPLINE:

A study comparing the clinical reasoning of an artificial intelligence (AI) model with that of physicians found that the AI outperformed both residents and attending physicians on simulated clinical cases. The AI produced more instances of incorrect reasoning than the physicians did but scored higher overall.

METHODOLOGY:

  • The study involved 39 physicians from two academic medical centers in Boston and the generative AI model GPT-4.
  • Participants were presented with 20 simulated clinical cases involving common problems such as pharyngitis, headache, abdominal pain, cough, and chest pain. Each case included sections describing the triage presentation, review of systems, physical examination, and diagnostic testing.
  • The primary outcome was the Revised-IDEA (R-IDEA) score, a 10-point scale evaluating clinical reasoning documentation across four domains: interpretive summary, differential diagnosis, explanation of the lead diagnosis, and alternative diagnoses.

TAKEAWAY:

  • The AI achieved a median R-IDEA score of 10, higher than attending physicians (median score, 9) and residents (median score, 8).
  • The chatbot had a significantly higher estimated probability of achieving a high R-IDEA score of 8-10 (0.99) than attending physicians (0.76) and residents (0.56).
  • The AI provided more responses containing instances of incorrect clinical reasoning (13.8%) than residents (2.8%) and attending physicians (12.5%) did. It performed similarly to physicians in diagnostic accuracy and inclusion of cannot-miss diagnoses.

IN PRACTICE:

“Future research should assess clinical reasoning of the LLM-physician interaction, as LLMs will more likely augment, not replace, the human reasoning process,” the authors of the study wrote.

SOURCE:

Adam Rodman, MD, MPH, of Beth Israel Deaconess Medical Center, Boston, was the corresponding author of the study, which was published online in JAMA Internal Medicine.

LIMITATIONS:

Simulated clinical cases may not replicate performance in real-world scenarios. Further training could enhance the performance of the AI, so the study may underestimate its capabilities, the researchers noted.

DISCLOSURES:

The study was supported by the Harvard Clinical and Translational Science Center and Harvard University. Authors disclosed financial ties to publishing companies and Solera Health. Dr. Rodman received funding from the Gordon and Betty Moore Foundation.

This article was created using several editorial tools, including AI, as part of the process. Human editors reviewed this content before publication. A version of this article appeared on Medscape.com.
