The incidence of skin cancer continues to increase, and it is by far the most common malignancy in the United States. Given this incidence and prevalence, early detection and treatment are critical. For melanoma alone, the 5-year survival rate is greater than 99% when the disease is detected early but falls to 71% when it reaches the lymph nodes and 32% with metastasis to distant organs.1 Furthermore, a 2018 study found that patients with stage I melanoma who were treated 4 months after biopsy had a 41% increased risk of death compared with those treated within the first month.2 However, many patients are not seen by a dermatologist first for examination of suspicious skin lesions and instead are referred by a general practitioner or primary care mid-level provider. As a result, many patients experience a longer time to diagnosis or treatment, and longer delays are associated with lower survival.
Dermoscopy is a noninvasive diagnostic tool for skin lesions, including melanoma. A handheld dermoscope (or dermatoscope) combines magnification with a transilluminating light source, allowing for the visualization of subsurface skin structures within the epidermis, dermoepidermal junction, and papillary dermis.3 Dermoscopy has been shown to improve a dermatologist’s accuracy in diagnosing malignant melanoma vs clinical evaluation with the unaided eye.4,5 More recently, dermoscopy has been digitized, allowing for the collection and documentation of case photographs. Dermoscopy also has expanded past the scope of dermatologists and has become increasingly useful in primary care.6 Among family physicians, dermoscopy also has been shown to have a higher sensitivity for melanoma detection compared with gross examination.7 Therefore, both the improved diagnostic performance for malignant melanoma with a dermoscope and the expanded use of dermoscopy in medical care justify evaluating an artificial intelligence (AI) algorithm for diagnosing malignant melanoma using dermoscopic images.
Triage (Triage Technologies Inc) is an AI application that uses a web interface and combines a pretrained convolutional neural network (CNN) with a reinforcement learning agent as a question-answering model. The CNN algorithm can classify 133 different skin diseases, 7 of which it is able to classify using dermoscopic images. This study sought to evaluate the performance of Triage’s dermoscopic classifier in identifying lesions as benign or malignant to determine whether AI could assist in the triage of skin cancer cases to shorten time to diagnosis.
Materials and Methods
The MClass-D test set from the International Skin Imaging Collaboration was assessed by both AI and practicing medical providers. The set was composed of 80 benign nevi and 20 biopsy-verified malignant melanomas. Board-certified US dermatologists (n=23), family physicians (n=7), and primary care mid-level providers (n=12; ie, nurse practitioners and physician assistants) were asked to label the images as benign or malignant. The results from the medical providers then were compared with the performance of the AI application in terms of sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV). Statistical significance was determined with a 1-sample t test run through RStudio (Posit Software, PBC), and P<.05 was considered significant.
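The 5 performance measures and the 1-sample t statistic named above can be illustrated with a minimal Python sketch. It is not the study's RStudio analysis; the function names and the toy data are hypothetical, and it assumes that each provider group's scores were tested against the AI's single score as the reference value. Only the t statistic and degrees of freedom are computed here; a P value would additionally require the t distribution (eg, as provided by a statistics package).

```python
import math
import statistics


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is 0."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(truth, predicted):
    """Compute the 5 measures for binary labels (True = malignant)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # melanoma called malignant
    fn = sum(1 for t, p in pairs if t and not p)      # melanoma called benign
    tn = sum(1 for t, p in pairs if not t and not p)  # nevus called benign
    fp = sum(1 for t, p in pairs if not t and p)      # nevus called malignant
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def one_sample_t(sample, reference):
    """t statistic and degrees of freedom for H0: mean(sample) == reference."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # SD uses n - 1
    return (statistics.mean(sample) - reference) / standard_error, n - 1


# Hypothetical toy data for illustration only (not study data).
truth = [True, True, False, False, False]
reader = [True, False, False, False, True]
metrics = diagnostic_metrics(truth, reader)

# Hypothetical provider accuracies tested against an assumed AI accuracy of 0.92.
provider_accuracy = [0.60, 0.70, 0.65, 0.75]
t_stat, df = one_sample_t(provider_accuracy, 0.92)
```

A negative t statistic in this setup indicates that the mean provider score falls below the reference score.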
Results
The AI application differentiated benign nevi from malignant melanomas with a sensitivity of 80%, specificity of 95%, accuracy of 92%, PPV of 80%, and NPV of 95% (Table 1). When compared with practicing medical providers, the AI performed significantly better in almost all categories (P<.05)(Figure 1). With all medical providers combined, the AI had significantly higher accuracy, sensitivity, and specificity (P<.05). The accuracy of the individual medical providers ranged from 32% to 78%.
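As an arithmetic check, the reported AI values can be related to the composition of the test set (20 melanomas and 80 nevi). The sketch below is illustrative only: the function name is hypothetical, and the lesion counts are back-calculated from the reported sensitivity and specificity rather than taken from Table 1.

```python
def counts_from_rates(n_malignant, n_benign, sensitivity, specificity):
    """Back-calculate confusion-matrix counts from class sizes and rates."""
    tp = round(sensitivity * n_malignant)  # melanomas labeled malignant
    tn = round(specificity * n_benign)     # nevi labeled benign
    return {"tp": tp, "fn": n_malignant - tp, "tn": tn, "fp": n_benign - tn}


# Reported AI sensitivity (80%) and specificity (95%) on 20 melanomas and 80 nevi.
counts = counts_from_rates(20, 80, 0.80, 0.95)
# counts -> {"tp": 16, "fn": 4, "tn": 76, "fp": 4}

total = sum(counts.values())
accuracy = (counts["tp"] + counts["tn"]) / total        # 92 / 100
ppv = counts["tp"] / (counts["tp"] + counts["fp"])      # 16 / 20
npv = counts["tn"] / (counts["tn"] + counts["fn"])      # 76 / 80
```

Under these assumptions, the derived accuracy (92%), PPV (80%), and NPV (95%) agree with the reported values.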
Compared with dermatologists, the AI was significantly more specific and accurate and demonstrated a higher PPV and NPV (P<.05). There was no significant difference between the AI and dermatologists in sensitivity (ie, labeling the true malignant lesions as malignant). The dermatologists who participated had been practicing from 1.5 years to 44 years, with an average of 16 years of dermatologic experience. There was no correlation between years practicing and performance in determining the malignancy of lesions. Of the 23 dermatologists, dermoscopy was used daily by 10 and occasionally by 3, but only 6 dermatologists had any formal training. Dermatologists who used dermoscopy averaged 11 years of use.
The AI also performed significantly better than the primary care providers, including both family physicians and mid-level providers (P<.05). With family physician and mid-level provider scores combined, the AI performed significantly better in all categories examined, including sensitivity, specificity, accuracy, PPV, and NPV (P<.05). However, when compared with family physicians alone, the AI did not demonstrate a statistically significant difference in sensitivity.