Groups of physicians and trainees diagnose clinical cases more accurately than individual clinicians do, according to a study of solo and aggregate diagnoses collected through an online medical teaching platform.
“These findings suggest that using the concept of collective intelligence to pool many physicians’ diagnoses could be a scalable approach to improve diagnostic accuracy,” wrote lead author Michael L. Barnett, MD, of Harvard University in Boston and his coauthors, adding that “groups of all sizes outperformed individual subspecialists on cases in their own subspecialty.” The study was published online in JAMA Network Open.
This cross-sectional study examined 1,572 cases solved within the Human Diagnosis Project (Human Dx) system, an online platform for authoring and diagnosing teaching cases. The system presents real-life cases from clinical practices and asks respondents to generate ranked differential diagnoses. Cases are tagged for specialties based on both intended diagnoses and the top diagnoses chosen by respondents. All cases used in this study were authored between May 7, 2014, and October 5, 2016, and had 10 or more respondents.
Of the 2,069 attending physicians and fellows, residents, and medical students (users) who solved cases within the Human Dx system, 1,452 (70.2%) were trained in internal medicine, 1,228 (59.4%) were residents or fellows, 431 (20.8%) were attending physicians, and 410 (19.8%) were medical students. To create a collective differential, Dr. Barnett and his colleagues aggregated the responses of up to nine participants via a weighted combination of each clinician’s top three diagnoses, which they dubbed “collective intelligence.”
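The article describes the pooling step only at a high level: up to nine respondents' top three ranked diagnoses are combined by weight into a single collective differential. As a rough illustration of how such rank-weighted pooling can work, here is a minimal Python sketch; the 3/2/1 weighting, the function name, and the example diagnoses are all assumptions for demonstration, not the study's actual scheme.

```python
from collections import defaultdict

def collective_differential(ranked_lists, weights=(3, 2, 1)):
    """Pool clinicians' top-three diagnoses into one ranked differential.

    ranked_lists: one list of diagnosis strings per clinician, best first.
    weights: hypothetical rank weights (3 for a clinician's top pick,
    2 for second, 1 for third); the study's exact weighting is not
    specified in this article.
    """
    scores = defaultdict(float)
    for diagnoses in ranked_lists:
        # Only each clinician's top three diagnoses contribute.
        for rank, dx in enumerate(diagnoses[:3]):
            scores[dx] += weights[rank]
    # Highest total weight first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda dx: (-scores[dx], dx))

# Hypothetical example: three clinicians' ranked differentials for one case.
responses = [
    ["pulmonary embolism", "pneumonia", "heart failure"],
    ["pneumonia", "pulmonary embolism", "bronchitis"],
    ["pulmonary embolism", "heart failure", "pneumonia"],
]
print(collective_differential(responses))
# "pulmonary embolism" ranks first (weight 3+2+3), ahead of "pneumonia" (2+3+1).
```

The intuition this sketch captures is the one the study relies on: diagnoses that several clinicians independently rank highly accumulate weight and rise to the top of the collective list, while idiosyncratic guesses are diluted.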
The diagnostic accuracy for groups of nine was 85.6% (95% confidence interval, 83.9%-87.4%), compared with 62.5% for individual users (95% CI, 60.1%-64.9%), a difference of 23% (95% CI, 14.9%-31.2%; P less than .001). Groups of five were 17.8% more accurate than individuals (95% CI, 14.0%-21.6%; P less than .001), and groups of two were 12.5% more accurate (95% CI, 9.3%-15.8%; P less than .001). Taken together, these results indicate that larger groups were associated with greater diagnostic accuracy.
Individual specialists solved cases in their particular areas with a diagnostic accuracy of 66.3% (95% CI, 59.1%-73.5%), compared with nonmatched specialty accuracy of 63.9% (95% CI, 56.6%-71.2%). Groups, however, outperformed specialists across the board: 77.7% accuracy for a group of two (95% CI, 70.1%-84.6%; P less than .001) and 85.5% accuracy for a group of nine (95% CI, 75.1%-95.9%; P less than .001).
The coauthors acknowledged the limitations of their study, including the possibility that the users who contributed these cases to Human Dx may not be representative of the medical community as a whole. They also noted that, while their 431 attending physicians constituted the "largest number ... to date in a study of collective intelligence," trainees still made up almost 80% of users. In addition, they acknowledged that Human Dx was not designed to generate collective diagnoses or to assess collective intelligence; a platform created with that capability in mind might have returned different results. Finally, they were unable to assess how exactly greater accuracy would have translated into changes in treatment, calling it "an important question for future work."
The authors disclosed several conflicts of interest. One doctor reported receiving personal fees from Greylock McKinnon Associates; another reported receiving personal fees from the Human Diagnosis Project and serving as director of the nonprofit during the study. A third doctor reported consulting for a company that makes patient-safety monitoring systems and receiving compensation from a not-for-profit incubator, along with holding equity in three medical data and software companies.
SOURCE: Barnett ML et al. JAMA Netw Open. 2019 Mar 1. doi: 10.1001/jamanetworkopen.2019.0096.