
Systematic Bias in AI Models May Undermine Diagnostic Accuracy

FROM JAMA

Systematically biased artificial intelligence (AI) models reduced clinicians’ accuracy in diagnosing hospitalized patients, according to data from more than 450 clinicians.

“Artificial intelligence (AI) could support clinicians in their diagnostic decisions for hospitalized patients but could also be biased and cause potential harm,” said Sarah Jabbour, MSE, a PhD candidate in computer science and engineering at the University of Michigan, Ann Arbor, in an interview.

“Regulatory guidance has suggested that the use of AI explanations could mitigate these harms, but the effectiveness of using AI explanations has not been established,” she said.

To examine whether AI explanations can mitigate the potential harms of systematic bias in AI models, Ms. Jabbour and colleagues conducted a randomized clinical vignette survey study. The survey was administered between April 2022 and January 2023 across 13 states, and the study population included hospitalist physicians, nurse practitioners, and physician assistants. The results were published in JAMA.

Participants were randomized to AI predictions with AI explanations (226 clinicians) or without AI explanations (231 clinicians).

The primary outcome was diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease, defined as the number of correct diagnoses over the total number of assessments, the researchers wrote.
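The primary outcome described above is a simple proportion, and it can be made concrete with a short sketch. The function and sample data below are hypothetical illustrations of the metric, not the study's actual analysis code or data:

```python
# Sketch of the study's primary outcome metric: diagnostic accuracy,
# defined as the number of correct diagnosis assessments divided by
# the total number of assessments. All data below are hypothetical.

def diagnostic_accuracy(assessments):
    """assessments: list of (clinician_says_present, truly_present) pairs."""
    correct = sum(1 for given, truth in assessments if given == truth)
    return correct / len(assessments)

# Each vignette yields three yes/no assessments, one per candidate
# diagnosis (pneumonia, heart failure, COPD).
sample = [
    (True, True), (False, False), (True, False),   # vignette 1: 2 of 3 correct
    (True, True), (True, True), (False, False),    # vignette 2: 3 of 3 correct
]
print(round(diagnostic_accuracy(sample), 2))  # 5 of 6 correct -> 0.83
```

Accuracy changes reported in the study (e.g., +2.9% with standard AI predictions) are differences between this proportion under each condition and the baseline proportion.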

The clinicians viewed nine clinical vignettes of patients hospitalized with acute respiratory failure, each including presenting symptoms, physical examination findings, laboratory results, and a chest radiograph, and made three assessments per vignette, one for each diagnosis. The first two vignettes included no AI model input, to establish baseline diagnostic accuracy; the next six included AI predictions; and the final vignette included a clinical consultation by a hypothetical colleague. The AI-supported vignettes included both standard and systematically biased AI models.

The baseline diagnostic accuracy was 73% for the diagnoses of pneumonia, heart failure, and chronic obstructive pulmonary disease. Clinicians’ accuracy increased by 2.9% when they viewed a standard diagnostic AI model without explanations and by 4.4% when they viewed models with AI explanations.

However, clinicians’ accuracy decreased by 11.3% from baseline after they viewed systematically biased AI model predictions without explanations, and by 9.1% when the biased predictions were accompanied by explanations.

The decrease in accuracy with systematically biased AI predictions was mainly attributable to a decrease in the participants’ diagnostic specificity, and the addition of explanations did little to improve it, the researchers noted.

Potentially Useful but Still Imperfect

The findings were limited by several factors, including the use of a web-based survey, which differs from assessment in a clinical setting, the researchers wrote. Other limitations included a younger-than-average study population and a focus on clinicians who make treatment decisions, rather than on other clinicians who might have a better understanding of the AI explanations.

“In our study, explanations were presented in a way that was considered to be obvious, where the AI model was completely focused on areas of the chest X-rays unrelated to the clinical condition,” Ms. Jabbour told this news organization. “We hypothesized that if presented with such explanations, the participants in our study would notice that the model was behaving incorrectly and not rely on its predictions. Surprisingly, this was not the case, and the explanations, when presented alongside biased AI predictions, had seemingly no effect in mitigating clinicians’ overreliance on biased AI,” she said.

“AI is being developed at an extraordinary rate, and our study shows that it has the potential to improve clinical decision-making. At the same time, it could harm clinical decision-making when biased,” Ms. Jabbour said. “We must be thoughtful about how to carefully integrate AI into clinical workflows, with the goal of improving clinical care while not introducing systematic errors or harming patients,” she added.

Looking ahead, “There are several potential research areas that could be explored,” said Ms. Jabbour. “Researchers should focus on careful validation of AI models to identify biased model behavior prior to deployment. AI researchers should also continue including and communicating with clinicians during the development of AI tools to better understand clinicians’ needs and how they interact with AI,” she said. “This is not an exhaustive list of research directions, and it will take much discussion between experts across disciplines such as AI, human-computer interaction, and medicine to ultimately deploy AI safely into clinical care.”
