Artificial intelligence (AI) has shown promise in the diagnosis and grading of prostate cancer. However, studies so far have been siloed, “with limited proof for generalization across diverse multinational cohorts, representing one of the central barriers to implementation of AI algorithms in clinical practice,” the investigators wrote in Nature Medicine.
Wouter Bulten, of the Radboud Institute for Health Sciences at Radboud University Medical Center in Nijmegen, the Netherlands, and coauthors reported the outcomes of the international PANDA histopathology competition, in which 1,290 deep learning algorithm developers were challenged to build reproducible algorithms that could match the findings of human experts. Deep learning is a form of machine learning in which multilayered artificial neural networks, loosely inspired by the structure of the brain, “learn” patterns from large datasets.
At least one AI product for detecting prostate cancer, the Paige Prostate system, has already been authorized for clinical use in the United States. The Food and Drug Administration authorized its marketing in September 2021 as an adjunct to, but not a replacement for, pathologist review.
Developers participating in the competition were given a set of 10,616 digitized prostate biopsies to train their algorithms on. The algorithms' results were then compared with those of a panel of one to six experienced uropathologists, depending on the country, on a set of 393 digitized slides. Fifteen teams were subsequently invited to take part in a validation phase with an additional 1,616 slides.
Within the first 10 days of the competition, one algorithm had already achieved agreement greater than 0.90 with the uropathologists; by day 33, the median agreement across all competing teams exceeded 0.85.
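Assuming the agreement values reported here are quadratically weighted Cohen's kappa scores, the metric the PANDA challenge used to rank submissions, the statistic equals 1 for perfect agreement, is near 0 for chance-level agreement, and penalizes each disagreement by the squared distance between the two grades. The Python sketch below computes it under the assumption that grades are encoded as integers from 0 (benign) to 5 (ISUP grade group 5); the function name and example grades are hypothetical and not taken from the study's code.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_classes=6):
    """Cohen's kappa with quadratic weights for ordinal grades 0..num_classes-1.

    Hypothetical helper: 0 = benign, 1-5 = ISUP grade groups 1-5 (assumed encoding).
    """
    if num_classes < 2:
        raise ValueError("need at least two grade categories")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if any(not 0 <= g < num_classes for g in (*rater_a, *rater_b)):
        raise ValueError("grades must lie in 0..num_classes-1")

    n = len(rater_a)
    # observed[i][j]: number of cases graded i by rater A and j by rater B.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    # How often each rater used each grade (their marginal distributions).
    hist_a = Counter(rater_a)
    hist_b = Counter(rater_b)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic penalty: 0 on the diagonal, 1 for the most distant grades.
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            # Count expected if the two raters graded independently of each other.
            expected = hist_a[i] * hist_b[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters gave every case the same single grade: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected
```

For example, `quadratic_weighted_kappa([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5])` returns 1.0, while swapping a benign call and a grade group 5 call, as in `quadratic_weighted_kappa([0, 5], [5, 0])`, returns -1.0. Because of the quadratic weights, confusing grade group 3 with grade group 4 costs far less than confusing benign tissue with grade group 5.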
Algorithms correctly detected tumors in 99.7% of cases
The algorithms selected for validation showed even higher agreement: 0.931 on average (95% confidence interval, 0.918-0.944). They correctly detected tumors in 99.7% of cancer-containing biopsies (95% CI, 98.1%-99.7%) and correctly identified 92.9% of benign biopsies as negative (95% CI, 91.9%-96.7%).
When it came to classifying the prostate cancers by Gleason grade, the algorithms agreed significantly more closely with the uropathologists than did international panels of 13 and 20 general pathologists.
The study found that the AI algorithms missed 1%-1.9% of cancers, whereas the general pathologists missed 1.8%-7.3%. For tumor detection, the algorithms had a sensitivity of 96.4%-98.2% and a specificity of 75%-100%, whereas the pathologists had a sensitivity of 91.9%-96.5% and a specificity of 92.3%-95%.
“This higher sensitivity shows promise for reducing pathologist workload by automated identification and exclusion of most benign biopsies from review,” the authors wrote.
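Sensitivity and specificity come from a simple tally of each reader's calls against the reference standard. Sensitivity is the share of cancer-containing biopsies flagged as cancer; specificity is the share of benign biopsies correctly called benign. A triage tool meant to exclude benign biopsies from review depends above all on high sensitivity, because every false negative is a cancer that might skip pathologist review. The Python sketch below only illustrates the arithmetic; the class, the function names, and the counts are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiopsyCounts:
    """Hypothetical tally of one reader's calls against the reference standard."""
    true_positive: int   # cancer present and flagged as cancer
    false_negative: int  # cancer present but called benign (a missed cancer)
    true_negative: int   # benign and called benign
    false_positive: int  # benign but flagged as cancer


def sensitivity(c: BiopsyCounts) -> float:
    """Share of cancer-containing biopsies that the reader flagged as cancer."""
    positives = c.true_positive + c.false_negative
    if positives == 0:
        raise ValueError("sensitivity is undefined without cancer-containing biopsies")
    return c.true_positive / positives


def specificity(c: BiopsyCounts) -> float:
    """Share of benign biopsies that the reader correctly called benign."""
    negatives = c.true_negative + c.false_positive
    if negatives == 0:
        raise ValueError("specificity is undefined without benign biopsies")
    return c.true_negative / negatives


# Invented counts for illustration only; these are not data from the study.
example = BiopsyCounts(true_positive=294, false_negative=6,
                       true_negative=180, false_positive=20)
print(f"sensitivity = {sensitivity(example):.1%}")  # prints "sensitivity = 98.0%"
print(f"specificity = {specificity(example):.1%}")  # prints "specificity = 90.0%"
```

In this made-up example, lowering the threshold for flagging cancer would typically convert some false negatives into true positives at the price of more false positives, which is the trade-off behind the gap between the algorithms' and pathologists' specificity ranges.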
Some benign cases were misclassified
The algorithms' main error was misclassifying benign cases as International Society of Urological Pathology (ISUP) grade group (GG) 1 cancer. The authors commented that this was likely caused by a shift in the distribution of cases between the data used to train the algorithms and the data set used to validate them.
They also noted that, in one validation set, the algorithms overgraded a “substantial proportion” of ISUP GG 3 cases as GG 4, whereas general pathologists tended to undergrade cases, particularly in the higher-grade cancers.
“These differences suggest that general pathologists supported by AI could reach higher agreements with uropathologists, potentially alleviating some of the rater variability associated with Gleason grading,” they wrote.
The authors also pointed out that the algorithms were validated on individual biopsies from each patient, whereas in the clinical context, a pathologist would likely have multiple biopsies from a single patient.
“Future studies can focus on patient-level evaluation of tissue samples, taking multiple cores and sections into account for the final diagnosis,” they wrote.
The study was supported by the Dutch Cancer Society, Netherlands Organization for Scientific Research, Google, Verily Life Sciences, Swedish Research Council, Swedish Cancer Society, Swedish eScience Research Center, EIT Health, Karolinska Institutet, Åke Wiberg Foundation, Prostatacancerförbundet, Academy of Finland, Cancer Foundation Finland, and ERAPerMed. The authors declared a range of grants and funding outside the study, including from Philips Digital Pathology Solutions. Several authors declared patents related to prostate cancer diagnosis, and 10 were Google employees.