Systemic Bias in AI Models May Undermine Diagnostic Accuracy

FROM JAMA

Systematically biased artificial intelligence (AI) models reduced clinicians’ accuracy in diagnosing hospitalized patients, based on data from more than 450 clinicians.

“Artificial Intelligence (AI) could support clinicians in their diagnostic decisions of hospitalized patients but could also be biased and cause potential harm,” said Sarah Jabbour, MSE, a PhD candidate in computer science and engineering at the University of Michigan, Ann Arbor, in an interview.

“Regulatory guidance has suggested that the use of AI explanations could mitigate these harms, but the effectiveness of using AI explanations has not been established,” she said.

To examine whether AI explanations can mitigate the potential harms of systematic bias in AI models, Ms. Jabbour and colleagues conducted a randomized clinical vignette survey study. The survey was administered between April 2022 and January 2023 across 13 states, and the study population included hospitalist physicians, nurse practitioners, and physician assistants. The results were published in JAMA.

Participants were randomized to AI predictions with AI explanations (226 clinicians) or without AI explanations (231 clinicians).

The primary outcome was diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease, defined as the number of correct diagnoses over the total number of assessments, the researchers wrote.

The clinicians viewed nine clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination findings, laboratory results, and chest radiographs, and made three assessments in each vignette, one for each diagnosis. The first two vignettes included no AI model input, to establish baseline diagnostic accuracy; the next six included AI predictions; and the final one included a clinical consultation by a hypothetical colleague. The AI vignettes included both standard and systematically biased models.

The baseline diagnostic accuracy was 73% for the diagnoses of pneumonia, heart failure, and chronic obstructive pulmonary disease. Clinicians’ accuracy increased by 2.9 percentage points when they viewed a standard diagnostic AI model without explanations and by 4.4 percentage points when they viewed models with AI explanations.

However, clinicians’ accuracy decreased by 11.3 percentage points from baseline after they viewed systematically biased AI model predictions without explanations; biased predictions with explanations decreased accuracy by 9.1 percentage points.

The decrease in accuracy with systematically biased AI predictions was mainly attributable to a decline in the participants’ diagnostic specificity, the researchers noted, and the addition of explanations did little to restore it.

Potentially Useful but Still Imperfect

The findings were limited by several factors, including the use of a web-based survey, which differs from assessment in a clinical setting, the researchers wrote. Other limitations included a younger-than-average study population and the focus on clinicians who make treatment decisions, rather than on other clinicians who might have a better understanding of the AI explanations.

“In our study, explanations were presented in a way that was considered to be obvious, where the AI model was completely focused on areas of the chest X-rays unrelated to the clinical condition,” Ms. Jabbour told this news organization. “We hypothesized that if presented with such explanations, the participants in our study would notice that the model was behaving incorrectly and not rely on its predictions. This was surprisingly not the case, and the explanations when presented alongside biased AI predictions had seemingly no effect in mitigating clinicians’ overreliance on biased AI,” she said.

“AI is being developed at an extraordinary rate, and our study shows that it has the potential to improve clinical decision-making. At the same time, it could harm clinical decision-making when biased,” Ms. Jabbour said. “We must be thoughtful about how to carefully integrate AI into clinical workflows, with the goal of improving clinical care while not introducing systematic errors or harming patients,” she added.

Looking ahead, “There are several potential research areas that could be explored,” said Ms. Jabbour. “Researchers should focus on careful validation of AI models to identify biased model behavior prior to deployment. AI researchers should also continue including and communicating with clinicians during the development of AI tools to better understand clinicians’ needs and how they interact with AI,” she said. “This is not an exhaustive list of research directions, and it will take much discussion between experts across disciplines such as AI, human computer interaction, and medicine to ultimately deploy AI safely into clinical care.”