X-ray vision: Using AI to maximize the value of radiographic images

Changed
Tue, 02/16/2021 - 15:18

Artificial intelligence (AI) is expected to one day affect the entire continuum of cancer care – from screening and risk prediction to diagnosis, risk stratification, treatment selection, and follow-up, according to an expert in the field.

Dr. Alan P. Lyss

Hugo J.W.L. Aerts, PhD, director of the AI in Medicine Program at Brigham and Women’s Hospital in Boston, described studies using AI for some of these purposes during a presentation at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (Abstract IA-06).

In one study, Dr. Aerts and colleagues set out to determine whether a convolutional neural network (CNN) could extract prognostic information from chest radiographs. The researchers tested this theory using patients from two trials – the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and the National Lung Screening Trial (NLST).

The team developed a CNN, called CXR-risk, and tested whether it could predict the longevity and prognosis of patients in the PLCO (n = 52,320) and NLST (n = 5,493) trials over a 12-year time period, based only on chest radiographs. No clinical information, demographics, radiographic interpretations, duration of follow-up, or censoring were provided to the deep-learning system.

CXR-risk output was stratified into five categories of radiographic risk scores for probability of death, from 0 (very low likelihood of mortality) to 1 (very high likelihood of mortality).

The investigators found a graded association between radiographic risk score and mortality. The very-high-risk group had mortality rates of 53.0% (PLCO) and 33.9% (NLST). In both trials, this was significantly higher than for the very-low-risk group. The unadjusted hazard ratio (HR) was 18.3 in the PLCO data set and 15.2 in the NLST data set (P < .001 for both).

This association was maintained after adjustment for radiologists’ findings (e.g., a lung nodule) and risk factors such as age, gender, and comorbid illnesses like diabetes. The adjusted HR (aHR) was 4.8 in the PLCO data set and 7.0 in the NLST data set (P < .001 for both).

In both data sets, individuals in the very-high-risk group were significantly more likely to die of lung cancer. The aHR was 11.1 in the PLCO data set and 8.4 in the NLST data set (P < .001 for both).

This might be expected for people who were interested in being screened for lung cancer. However, patients in the very-high-risk group were also more likely to die of cardiovascular illness (aHR, 3.6 for PLCO and 47.8 for NLST; P < .001 for both) and respiratory illness (aHR, 27.5 for PLCO and 31.9 for NLST; P ≤ .001 for both).

With this information, a clinician could order additional testing or use more aggressive surveillance measures. An oncologist considering therapy for a patient with newly diagnosed cancer could plan treatment choices and stratification for adverse events more intelligently.
 

Using AI to predict the risk of lung cancer

In another study, Dr. Aerts and colleagues developed and validated a CNN called CXR-LC, which was based on CXR-risk. The goal of this study was to see if CXR-LC could predict long-term incident lung cancer using data available in the electronic health record (EHR), including chest radiographs, age, sex, and smoking status.

The CXR-LC model was developed using data from the PLCO trial (n = 41,856) and was validated in smokers from the PLCO trial (n = 5,615; 12-year follow-up) as well as heavy smokers from the NLST trial (n = 5,493; 6-year follow-up).

Results showed that CXR-LC was able to predict which patients were at highest risk for developing lung cancer.

CXR-LC had better discrimination for incident lung cancer than did Medicare eligibility in the PLCO data set (area under the curve [AUC], 0.755 vs. 0.634; P < .001). The performance of CXR-LC was similar to that of the PLCOM2012 risk score in both the PLCO data set (AUC, 0.755 vs. 0.751) and the NLST data set (AUC, 0.659 vs. 0.650).
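An area under the curve (AUC) such as 0.755 can be read as the probability that a randomly chosen person who developed lung cancer received a higher model score than a randomly chosen person who did not. The sketch below illustrates that pairwise definition in Python. The function name and the scores are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case has the higher score.

    Ties count as half a win. This pairwise definition equals the area under
    the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model outputs, NOT data from the study.
developed_cancer = [0.9, 0.7, 0.4]
no_cancer = [0.8, 0.5, 0.3, 0.2]

toy_auc = auc(developed_cancer, no_cancer)
print(f"AUC on toy data: {toy_auc:.3f}")  # 9 of 12 pairs ranked correctly -> 0.750
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.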

When they were compared in screening populations of equal size, CXR-LC was more sensitive than Medicare eligibility criteria in the PLCO data set (74.9% vs. 63.8%; P = .012) and missed 30.7% fewer incident lung cancer diagnoses.
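The 30.7% figure follows from the two sensitivities reported above: a screening rule misses the cancers it fails to flag, so its miss rate is 1 minus its sensitivity. A minimal arithmetic check in Python, using only the percentages quoted in this article (the function name is illustrative):

```python
def relative_reduction_in_missed(sens_new, sens_ref):
    """Relative drop in missed cases when sensitivity rises from sens_ref to sens_new."""
    missed_new = 1.0 - sens_new  # share of cancers the new rule misses
    missed_ref = 1.0 - sens_ref  # share of cancers the reference rule misses
    return (missed_ref - missed_new) / missed_ref


# Sensitivities reported for the PLCO data set: CXR-LC 74.9%, Medicare criteria 63.8%.
reduction = relative_reduction_in_missed(0.749, 0.638)
print(f"{reduction:.1%}")  # -> 30.7%
```

In other words, the miss rate falls from 36.2% to 25.1%, a relative reduction of about 30.7%.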
 

AI as a substitute for specialized testing and consultation

In a third study, Dr. Aerts and colleagues used a CNN to predict cardiovascular risk by assessing coronary artery calcium (CAC) from clinically obtained, readily available CT scans.

Ordinarily, identifying CAC – an accurate predictor of cardiovascular events – requires specialized expertise (manual measurement and cardiologist interpretation), time (estimated at 20 minutes/scan), and equipment (ECG-gated cardiac CT scan and special software).

In this study, the researchers used a fully automated, end-to-end system with an analysis time of less than 2 seconds.

The team trained and tuned their CNN using the Framingham Heart Study Offspring and Third Generation cohorts (n = 1,636), which included asymptomatic patients with high-quality, cardiac-gated CT scans for CAC quantification.

The researchers then tested the CNN on two asymptomatic and two symptomatic cohorts:

  • Asymptomatic Framingham Heart Study participants (n = 663) in whom the outcome measures were cardiovascular disease and death.
  • Asymptomatic NLST participants (n = 14,959) in whom the outcome measure was atherosclerotic cardiovascular death.
  • Symptomatic PROMISE study participants with stable chest pain (n = 4,021) in whom the outcome measures were all-cause mortality, MI, and hospitalization for unstable angina.
  • Symptomatic ROMICAT-II study patients with acute chest pain (n = 441) in whom the outcome measure was acute coronary syndrome at 28 days.

Among 5,521 subjects across all testing cohorts with cardiac-gated and nongated chest CT scans, the CNN and expert reader interpretations agreed on the CAC risk scores with a high level of concordance (kappa, 0.71; concordance rate, 0.79).

There was a very high Spearman’s correlation of 0.92 (P < .0001) and substantial agreement between automatically and manually calculated CAC risk groups, substantiating robust risk prediction for cardiovascular disease across multiple clinical scenarios.
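For readers less familiar with agreement statistics, the concordance rate is the share of scans on which two readers assign the same risk group, and Cohen's kappa corrects that share for the agreement expected by chance. The Python sketch below computes both on made-up labels. It assumes unweighted kappa and four risk groups coded 0 to 3; the article does not state which kappa variant or grouping the investigators used, and the labels are not study data.

```python
from collections import Counter


def concordance_rate(reader_a, reader_b):
    """Share of items on which both readers assign the same category."""
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return matches / len(reader_a)


def cohens_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reader_a)
    observed = concordance_rate(reader_a, reader_b)
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement: probability both readers pick category k independently.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical risk-group labels (0 = lowest, 3 = highest) for ten scans.
manual = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
automated = [0, 0, 1, 2, 2, 2, 3, 3, 0, 0]

toy_concordance = concordance_rate(manual, automated)
toy_kappa = cohens_kappa(manual, automated)
print(f"concordance = {toy_concordance:.2f}, kappa = {toy_kappa:.2f}")
```

Kappa is always lower than the raw concordance rate when some agreement is expected by chance, which is why the study reports both values.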

Dr. Aerts commented that, among the NLST participants who had the highest risk of developing lung cancer, the risk of cardiovascular death was as high as the risk of death from lung cancer.
 

 

 

Using AI to assess patient outcomes

In an unpublished study, Dr. Aerts and colleagues used AI in an attempt to determine whether changes in measurements of subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and skeletal muscle mass would provide clues about treatment outcomes in lung cancer patients.

The researchers developed a deep learning model using data from 1,129 patients at Massachusetts General and Brigham and Women’s Hospitals, measuring SAT, VAT, and muscle mass. The team applied the measurement system to a population of 12,128 outpatients and calculated z scores for SAT, VAT, and muscle mass to determine “normal” values.
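A z score expresses how far one patient's measurement lies from the reference population's mean, in units of that population's standard deviation. The Python sketch below shows the basic calculation. The reference values, their units, and the single unstratified reference group are assumptions made for illustration; the article does not describe how the investigators built their norms (for example, whether they stratified by age or sex).

```python
from statistics import mean, stdev


def z_score(value, reference):
    """Standardize a measurement against a reference sample (sample standard deviation)."""
    return (value - mean(reference)) / stdev(reference)


# Hypothetical subcutaneous adipose tissue areas for a reference group (arbitrary units).
reference_sat = [100.0, 120.0, 140.0, 160.0, 180.0]

patient_z = z_score(110.0, reference_sat)
print(f"z = {patient_z:.2f}")  # negative: below the reference mean
```

A z score near 0 indicates a typical value, while a strongly negative score indicates a measurement well below the reference mean.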

When they applied the norms to surgical lung cancer data sets from the Boston Lung Cancer Study (n = 437) and TRACERx study (n = 394), the researchers found that smokers had lower adiposity and lower muscle mass than never-smokers.

More importantly, among lung cancer patients who lost more than 5% of VAT, SAT, and muscle mass over time, those with the greatest SAT loss (P < .0001) or VAT loss (P = .0015) had the lowest lung cancer–specific survival in the TRACERx study. There was no significant impairment of lung cancer–specific survival for patients who experienced skeletal muscle loss (P = .23).

The same observation was made for overall survival among patients enrolled in the Boston Lung Cancer Study, using the 5% threshold. Overall survival was significantly worse with increasing VAT loss (P = .0023) and SAT loss (P = .0082) but not with increasing skeletal muscle loss (P = .3).

The investigators speculated about whether the correlation between body composition and clinical outcome could yield clues about tumor biology. To test this, the researchers used the RNA sequencing–based ORACLE risk score in lung cancer patients from TRACERx. There was a high correlation between higher ORACLE risk scores and lower VAT and SAT, suggesting that measures of adiposity on CT were reflected in tumor biology patterns on an RNA level in lung cancer patients. There was no such correlation between ORACLE risk scores and skeletal muscle mass.
 

Wonderment ... tempered by concern and challenges

AI has awe-inspiring potential to yield actionable and prognostically important information from data mining the EHR and extracting the vast quantities of information from images. In some cases (like CAC), it is information that is “hiding in plain sight.” However, Dr. Aerts expressed several cautions, some of which have already plagued AI.

He referenced the Gartner Hype Cycle, which provides a graphic representation of five phases in the life cycle of emerging technologies. The “innovation trigger” is followed by a “peak of inflated expectations,” a “trough of disillusionment,” a “slope of enlightenment,” and a “plateau of productivity.”

Dr. Aerts noted that, in recent years, AI has seemed to fall into the trough of disillusionment, but it may be entering the slope of enlightenment on the way to the plateau of productivity.

His research highlighted several examples of productivity in radiomics in cancer patients and those who are at high risk of developing cancer.

In Dr. Aerts’s opinion, a second concern is replication of AI research results. He noted that, among 400 published studies, only 6% of authors shared the code that would enable their findings to be corroborated. About 30% shared test data, and 54% shared “pseudocodes,” but transparency and reproducibility remain obstacles to the acceptance and broad implementation of AI.

Dr. Aerts endorsed the Modelhub initiative (www.modelhub.ai), a multi-institutional effort to advance reproducibility in the AI field and help it reach its full potential.

However, there are additional concerns about the implementation of radiomics and, more generally, data mining from clinicians’ EHRs to personalize care.

First, it may be laborious and difficult to explain complex, computer-based risk stratification models to patients. Hereditary cancer testing is an example of a risk assessment test that requires complicated explanations that many clinicians relegate to genetic counselors – when patients elect to see them. When a model is not explainable, it undermines the confidence of patients and their care providers, according to an editorial related to the CXR-LC study.

Another issue is that lung cancer screening has, in practice, been underutilized by individuals who meet current, relatively straightforward Medicare criteria. Despite the apparently better accuracy of the CXR-LC deep-learning model, its complexity and limited access could constitute an additional barrier for the at-risk individuals who should avail themselves of screening.

Furthermore, although age and gender are accurate in most circumstances, there is legitimate concern about the accuracy of, for example, smoking history data and comorbid conditions in current EHRs. Who performs the laborious curation of the input in an AI model to assure its accuracy for individual patients?

Finally, it is unclear how scalable and applicable AI will be to institutions that serve medically underserved populations (e.g., smaller, community-based, free-standing, socioeconomically disadvantaged, or rural health care institutions). Substantial initial and maintenance costs may limit AI’s availability to some academic institutions and large health maintenance organizations.

As the concerns and challenges are addressed, it will be interesting to see where and when the plateau of productivity for AI in cancer care occurs. When it does, many cancer patients will benefit from enhanced care along the continuum of the complex disease they and their caregivers seek to master.

Dr. Aerts disclosed relationships with Onc.AI outside the presented work.

Dr. Lyss was a community-based medical oncologist and clinical researcher for more than 35 years before his recent retirement. His clinical and research interests were focused on breast and lung cancers, as well as expanding clinical trial access to medically underserved populations. He is based in St. Louis. He has no conflicts of interest.



Artificial intelligence (AI) is expected to one day affect the entire continuum of cancer care – from screening and risk prediction to diagnosis, risk stratification, treatment selection, and follow-up, according to an expert in the field.

Lyss_Alan_MO_new_web.jpg
Dr. Alan P. Lyss

Hugo J.W.L. Aerts, PhD, director of the AI in Medicine Program at Brigham and Women’s Hospital in Boston, described studies using AI for some of these purposes during a presentation at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (Abstract IA-06).

In one study, Dr. Aerts and colleagues set out to determine whether a convolutional neural network (CNN) could extract prognostic information from chest radiographs. The researchers tested this theory using patients from two trials – the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and the National Lung Screening Trial (NLST).

The team developed a CNN, called CXR-risk, and tested whether it could predict the longevity and prognosis of patients in the PLCO (n = 52,320) and NLST (n = 5,493) trials over a 12-year time period, based only on chest radiographs. No clinical information, demographics, radiographic interpretations, duration of follow-up, or censoring were provided to the deep-learning system.

CXR-risk output was stratified into five categories of radiographic risk scores for probability of death, from 0 (very low likelihood of mortality) to 1 (very high likelihood of mortality).

The investigators found a graded association between radiographic risk score and mortality. The very-high-risk group had mortality rates of 53.0% (PLCO) and 33.9% (NLST). In both trials, this was significantly higher than for the very-low-risk group. The unadjusted hazard ratio was 18.3 in the PCLO data set and 15.2 in the NLST data set (P < .001 for both).

This association was maintained after adjustment for radiologists’ findings (e.g., a lung nodule) and risk factors such as age, gender, and comorbid illnesses like diabetes. The adjusted HR was 4.8 in the PCLO data set and 7.0 in the NLST data set (P < .001 for both).

In both data sets, individuals in the very-high-risk group were significantly more likely to die of lung cancer. The aHR was 11.1 in the PCLO data set and 8.4 in the NSLT data set (P < .001 for both).

This might be expected for people who were interested in being screened for lung cancer. However, patients in the very-high-risk group were also more likely to die of cardiovascular illness (aHR, 3.6 for PLCO and 47.8 for NSLT; P < .001 for both) and respiratory illness (aHR, 27.5 for PLCO and 31.9 for NLST; P ≤ .001 for both).

With this information, a clinician could initiate additional testing and/or utilize more aggressive surveillance measures. If an oncologist considered therapy for a patient with newly diagnosed cancer, treatment choices and stratification for adverse events would be more intelligently planned.
 

Using AI to predict the risk of lung cancer

In another study, Dr. Aerts and colleagues developed and validated a CNN called CXR-LC, which was based on CXR-risk. The goal of this study was to see if CXR-LC could predict long-term incident lung cancer using data available in the EHR, including chest radiographs, age, sex, and smoking status.

The CXR-LC model was developed using data from the PLCO trial (n = 41,856) and was validated in smokers from the PLCO trial (n = 5,615; 12-year follow-up) as well as heavy smokers from the NLST trial (n = 5,493; 6-year follow-up).

Results showed that CXR-LC was able to predict which patients were at highest risk for developing lung cancer.

CXR-LC had better discrimination for incident lung cancer than did Medicare eligibility in the PLCO data set (area under the curve, 0.755 vs. 0.634; P < .001). And the performance of CXR-LC was similar to that of the PLCOM2012 risk score in both the PLCO data set (AUC, 0.755 vs. 0.751) and the NLST data set (AUC, 0.659 vs. 0.650).

When they were compared in screening populations of equal size, CXR-LC was more sensitive than Medicare eligibility criteria in the PLCO data set (74.9% vs. 63.8%; P = .012) and missed 30.7% fewer incident lung cancer diagnoses.
 

AI as a substitute for specialized testing and consultation

In a third study, Dr. Aerts and colleagues used a CNN to predict cardiovascular risk by assessing coronary artery calcium (CAC) from clinically obtained, readily available CT scans.

Ordinarily, identifying CAC – an accurate predictor of cardiovascular events – requires specialized expertise (manual measurement and cardiologist interpretation), time (estimated at 20 minutes/scan), and equipment (ECG-gated cardiac CT scan and special software).

In this study, the researchers used a fully end-to-end automated system with analytic time measured in less than 2 seconds.

The team trained and tuned their CNN using the Framingham Heart Study Offspring and Third Generation cohorts (n = 1,636), which included asymptomatic patients with high-quality, cardiac-gated CT scans for CAC quantification.

The researchers then tested the CNN on two asymptomatic and two symptomatic cohorts:

  • Asymptomatic Framingham Heart Study participants (n = 663) in whom the outcome measures were cardiovascular disease and death.
  • Asymptomatic NLST participants (n = 14,959) in whom the outcome measure was atherosclerotic cardiovascular death.
  • Symptomatic PROMISE study participants with stable chest pain (n = 4,021) in whom the outcome measures were all-cause mortality, MI, and hospitalization for unstable angina.
  • Symptomatic ROMICAT-II study patients with acute chest pain (n = 441) in whom the outcome measure was acute coronary syndrome at 28 days.

Among 5,521 subjects across all testing cohorts with cardiac-gated and nongated chest CT scans, the CNN and expert reader interpretations agreed on the CAC risk scores with a high level of concordance (kappa, 0.71; concordance rate, 0.79).

There was a very high Spearman’s correlation of 0.92 (P < .0001) and substantial agreement between automatically and manually calculated CAC risk groups, substantiating robust risk prediction for cardiovascular disease across multiple clinical scenarios.

Dr. Aerts commented that, among the NLST participants who had the highest risk of developing lung cancer, the risk of cardiovascular death was as high as the risk of death from lung cancer.
 

 

 

Using AI to assess patient outcomes

In an unpublished study, Dr. Aerts and colleagues used AI in an attempt to determine whether changes in measurements of subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), and skeletal muscle mass would provide clues about treatment outcomes in lung cancer patients.

The researchers developed a deep learning model using data from 1,129 patients at Massachusetts General and Brigham and Women’s Hospitals, measuring SAT, VAT, and muscle mass. The team applied the measurement system to a population of 12,128 outpatients and calculated z scores for SAT, VAT, and muscle mass to determine “normal” values.
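A z score expresses how far one patient's measurement lies from the reference-population mean, in units of that population's standard deviation. The sketch below assumes a single, unstratified reference group; the summary does not say whether the investigators' norms were stratified (for example, by age or sex), and the function name and numbers are hypothetical.

```python
from statistics import mean, stdev


def z_score(value, reference_values):
    """Standardize one measurement against a reference population."""
    if len(reference_values) < 2:
        raise ValueError("need at least two reference values")
    mu = mean(reference_values)
    sigma = stdev(reference_values)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference values have zero spread")
    return (value - mu) / sigma


# Toy numbers only (not study data), in arbitrary area units.
reference = [100.0, 120.0, 140.0]
z = z_score(160.0, reference)
```

A z score near 0 indicates a measurement typical of the reference group, and a large negative value indicates unusually low tissue mass relative to that group.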

When they applied the norms to surgical lung cancer data sets from the Boston Lung Cancer Study (n = 437) and TRACERx study (n = 394), the researchers found that smokers had lower adiposity and lower muscle mass than never-smokers.

More importantly, among lung cancer patients who lost more than 5% of VAT, SAT, and muscle mass over time, those with the greatest SAT loss (P < .0001) or VAT loss (P = .0015) had the lowest lung cancer–specific survival in the TRACERx study. Lung cancer–specific survival was not significantly impaired among patients who experienced skeletal muscle loss (P = .23).

The same observation was made for overall survival among patients enrolled in the Boston Lung Cancer Study, using the 5% threshold. Overall survival was significantly worse with increasing VAT loss (P = .0023) and SAT loss (P = .0082) but not with increasing skeletal muscle loss (P = .3).

The investigators speculated about whether the correlation between body composition and clinical outcome could yield clues about tumor biology. To test this, the researchers used the RNA sequencing–based ORACLE risk score in lung cancer patients from TRACERx. There was a high correlation between higher ORACLE risk scores and lower VAT and SAT, suggesting that measures of adiposity on CT were reflected in tumor biology patterns on an RNA level in lung cancer patients. There was no such correlation between ORACLE risk scores and skeletal muscle mass.
 

Wonderment ... tempered by concern and challenges

AI has awe-inspiring potential to yield actionable and prognostically important information from data mining the EHR and extracting the vast quantities of information from images. In some cases (like CAC), it is information that is “hiding in plain sight.” However, Dr. Aerts expressed several cautions, some of which have already plagued AI.

He referenced the Gartner Hype Cycle, which provides a graphic representation of five phases in the life cycle of emerging technologies. The “innovation trigger” is followed by a “peak of inflated expectations,” a “trough of disillusionment,” a “slope of enlightenment,” and a “plateau of productivity.”

Dr. Aerts noted that, in recent years, AI has seemed to fall into the trough of disillusionment, but it may be entering the slope of enlightenment on the way to the plateau of productivity.

His research highlighted several examples of productivity in radiomics in cancer patients and those who are at high risk of developing cancer.

In Dr. Aerts’s opinion, a second concern is the replication of AI research results. He noted that, among 400 published studies, only 6% of authors shared the code that would enable their findings to be corroborated. About 30% shared test data, and 54% shared “pseudocodes.” Transparency and reproducibility therefore remain obstacles to the acceptance and broad implementation of AI.

Dr. Aerts endorsed Modelhub (www.modelhub.ai), a multi-institutional initiative to improve reproducibility in the AI field and help it reach its full potential.

However, there are additional concerns about the implementation of radiomics and, more generally, data mining from clinicians’ EHRs to personalize care.

First, it may be laborious and difficult to explain complex, computer-based risk stratification models to patients. Hereditary cancer testing is an example of a risk assessment test that requires complicated explanations, which many clinicians relegate to genetic counselors – when patients elect to see them. When a model is not explainable, it undermines the confidence of patients and their care providers, according to an editorial related to the CXR-LC study.

Another issue is that, in practice, lung cancer screening has been underutilized by individuals who meet current, relatively straightforward Medicare criteria. Despite the apparently better accuracy of the CXR-LC deep-learning model, its complexity and limited access could constitute an additional barrier for the at-risk individuals who should avail themselves of screening.

Furthermore, although age and gender data are accurate in most circumstances, there is legitimate concern about the accuracy of other data in current EHRs, such as smoking history and comorbid conditions. Who performs the laborious curation of the input to an AI model to ensure its accuracy for individual patients?

Finally, it is unclear how scalable and applicable AI will be to medically underserved populations (e.g., those served by smaller, community-based, free-standing, socioeconomically disadvantaged, or rural health care institutions). There are substantial initial and maintenance costs that may restrict AI’s availability to some academic institutions and large health maintenance organizations.

As the concerns and challenges are addressed, it will be interesting to see where and when the plateau of productivity for AI in cancer care occurs. When it does, many cancer patients will benefit from enhanced care along the continuum of the complex disease they and their caregivers seek to master.

Dr. Aerts disclosed relationships with Onc.AI outside the presented work.

Dr. Lyss was a community-based medical oncologist and clinical researcher for more than 35 years before his recent retirement. His clinical and research interests were focused on breast and lung cancers, as well as expanding clinical trial access to medically underserved populations. He is based in St. Louis. He has no conflicts of interest.


FROM AACR: AI, DIAGNOSIS, AND IMAGING 2021


EHR data harnessed to spot new risk factors for early-onset CRC

Article Type
Changed
Fri, 02/05/2021 - 08:59

Machine learning models that use routine data present in the electronic health record have identified new risk factors for early-onset colorectal cancer (CRC), according to a new study.

The models found that hypertension, cough, and asthma, among other factors, were important in explaining the risk of early-onset CRC. For some factors, associations emerged up to 5 years before diagnosis.

These findings were reported at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (Abstract PR-10).

“The incidence of early-onset CRC has been rising 2% annually since 1994,” noted Michael B. Quillen, one of the study authors and a medical student at the University of Florida, Gainesville.

Inherited genetic syndromes and predisposing conditions such as inflammatory bowel disease account for about half of cases in this age group, but factors explaining the other half remain a mystery.

To shed light on this area, the investigators undertook a study of patients aged 50 years or younger from the OneFlorida Clinical Research Consortium who had at least 2 years of EHR data. This included 783 cases with CRC and 8,981 incidence density-matched controls, with both groups having a mean age of 36 years.

The patients were split into colon cancer and rectal cancer cohorts, and then further divided into four prediction windows, Mr. Quillen explained. Each prediction window started with the patient’s first recorded encounter date in the EHR and ended at 0, 1, 3, or 5 years before the date of diagnosis.
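The windowing described here amounts to truncating each patient's record at a cutoff date that lies 0, 1, 3, or 5 years before diagnosis, so that a model sees only what was recorded before the cutoff. A minimal sketch follows, assuming that each encounter carries a date and that encounters falling exactly on the cutoff are included; the function names and dates are hypothetical, and the handling of matched controls (who have no diagnosis date) is not covered.

```python
from datetime import date

WINDOW_YEARS = (0, 1, 3, 5)  # years before diagnosis at which each window ends


def years_before(d, years):
    """Return the date `years` calendar years before `d`."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return d.replace(year=d.year - years, day=28)


def encounters_in_window(encounter_dates, diagnosis_date, years):
    """Keep encounters from the first record up to the window's end date."""
    cutoff = years_before(diagnosis_date, years)
    return [d for d in sorted(encounter_dates) if d <= cutoff]


# Toy record only (not study data).
encounters = [date(2015, 1, 10), date(2018, 6, 1), date(2019, 12, 1)]
diagnosis = date(2020, 1, 1)
windows = {y: encounters_in_window(encounters, diagnosis, y) for y in WINDOW_YEARS}
```

In this toy record the 0-year window retains all three encounters, whereas the 5-year window retains none, which illustrates why longer look-back windows leave less data to learn from.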

The investigators used machine-learning models to determine what features (e.g., diagnoses, procedures, demographics) were important in determining risk.

Results were expressed in charts that ranked the features by their SHAP (Shapley Additive Explanations) values, which reflect the average impact of a feature on the magnitude of model output.
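Ranking features by "average impact on the magnitude of model output" is commonly done by averaging the absolute SHAP value of each feature across patients. The sketch below assumes the per-patient SHAP values have already been computed by some explainer and only shows the ranking step; the function name, feature names, and numbers are hypothetical.

```python
def rank_features_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP value| across patients, largest first.

    shap_rows: one list of SHAP values per patient, in feature_names order.
    """
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    if any(len(row) != len(feature_names) for row in shap_rows):
        raise ValueError("every row must have one value per feature")

    n = len(shap_rows)
    ranked = [
        (name, sum(abs(row[j]) for row in shap_rows) / n)
        for j, name in enumerate(feature_names)
    ]
    # Sign is ignored: a feature that pushes risk down strongly still ranks high.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy values only (not study data): two patients, three features.
rows = [[0.2, -0.5, 0.0], [-0.4, 0.3, 0.1]]
ranking = rank_features_by_mean_abs_shap(rows, ["feature_a", "feature_b", "feature_c"])
```

Because absolute values are averaged, such a ranking shows how much a feature matters to the model, not whether it raises or lowers predicted risk.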
 

Results: Top models and features

The top-performing models had areas under the curve of 0.61-0.75 for colon cancer risk, and 0.62-0.73 for rectal cancer risk, reported T. Maxwell Parker, another study author and medical student at the University of Florida, Gainesville.
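For context, the area under the receiver operating characteristic curve equals the probability that a randomly chosen case receives a higher model score than a randomly chosen control; 0.5 corresponds to chance and 1.0 to perfect ranking. A minimal sketch of that pairwise definition follows, with ties counted as half; the function name and toy scores are hypothetical, and this is not the investigators' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels: 1 = case, 0 = control)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # Count case-control pairs in which the case is ranked higher; ties count half.
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy scores only (not study data).
value = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration but slower than rank-based implementations on large cohorts.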

For colon cancer, the top features for the 0-year cohort included some highly specific symptoms that would be expected in patients close to the diagnostic date: abdominal pain, anemia, blood in the stool, and various procedures such as CT scans. “These do not need a machine learning algorithm to identify,” Mr. Parker acknowledged.

However, there were also two noteworthy features present – cough and primary hypertension – that became the top features in the 1-year and 3-year cohorts, then dropped out in the 5-year cohort.

Other features that became important moving farther out from the diagnostic date of colon cancer, across the windows studied, were chronic sinusitis, atopic dermatitis, asthma, and upper-respiratory infection.

For rectal cancer, some previously identified factors – immune conditions related to infectious disease (HIV and anogenital warts associated with human papillomavirus) as well as amoxicillin therapy – were prominent in the 0-year cohort and became increasingly important going farther out from the diagnostic date.

Obesity was the top feature in the 3-year cohort, and asthma became important in that cohort as well.

None of the rectal cancer models tested performed well at identifying important features in the 5-year cohort.

The investigators are exploring hypotheses to explain how the identified features, especially the new ones such as hypertension and cough, might contribute to CRC carcinogenesis in young adults, according to Mr. Parker. As inclusion of older patients could confound associations, research restricted to those aged 50 years and younger may be necessary.

“We would like to validate these model findings in a second independent data set, and if they are validated, we would consider a prospective cohort study with those features,” Mr. Parker said. The team also plans to refine the models with the aim of improving their areas under the curve.

Thereafter, the team hopes to explore ways to implement the findings clinically to support screening, which will require consideration of the context, Mr. Parker concluded. “Should we use high-sensitivity or low-specificity models for screening, or do we use the balance of both? Also, different models may be suitable for different situations,” he said.

Mr. Parker and Mr. Quillen disclosed no conflicts of interest. The study did not receive specific funding.

Test could help patients with pancreatic cysts avoid unneeded surgery

Article Type
Changed
Fri, 02/19/2021 - 16:24

A test that uses machine learning may improve the management of patients with pancreatic cysts, sparing some of them unnecessary surgery, a cohort study suggests.

The test, called CompCyst, integrates clinical, imaging, and biomarker data. It proved more accurate than the current standard of care for correctly determining whether patients should be discharged from follow-up, immediately operated on, or monitored.

Rachel Karchin, PhD, of the Johns Hopkins Whiting School of Engineering in Baltimore, reported these results at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (Abstract IA-13).

“Preoperative diagnosis of pancreatic cysts and managing patients who present with a cyst are a clinical conundrum because pancreatic cancer is so deadly, while the decision to surgically resect a cyst is complicated by the danger of the surgery, which has high morbidity and mortality,” Dr. Karchin explained. “The challenge of the diagnostic test is to place patients into one of three groups: those who should be discharged, who should be operated on, and who should be monitored.”

High sensitivity is important for the operate and monitor groups to ensure identification of all patients needing these approaches, whereas higher specificity is important for the discharge group to avoid falsely classifying premalignant cysts, Dr. Karchin said.

She and her colleagues applied machine learning to this classification challenge, using data from 862 patients who had undergone resection of pancreatic cysts at 16 centers in the United States, Europe, and Asia. All patients had a known cyst histopathology, which served as the gold standard, and a known clinical management strategy (discharge, operate, or monitor).

The investigators used a multivariate organization of combinatorial alterations algorithm that integrates clinical features, imaging characteristics, cyst fluid genetics, and serum biomarkers to create classifiers. This algorithm can be trained to maximize sensitivity, maximize specificity, or balance these metrics, Dr. Karchin noted.

The resulting test, CompCyst, was trained using data from 436 of the patients and then validated in the remaining 426 patients.

In the validation cohort, for classifying patients who should be discharged from care, the test had a sensitivity of 46% and a specificity of 100%, according to results reported at the conference and published previously (Sci Transl Med. 2019 Jul 19. doi: 10.1126/scitranslmed.aav4772).

For immediate operation, CompCyst had a sensitivity of 91% and a specificity of 54%. For monitoring, the test had a sensitivity of 99% and a specificity of 30%.
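Sensitivity and specificity are computed separately for each management category, treating that category as "positive" and the other two as "negative." A minimal sketch of the arithmetic from confusion-matrix counts follows; the function name and counts are hypothetical and are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: patients who truly belong in the category, flagged / missed.
    tn/fp: patients who do not belong in the category, cleared / wrongly flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive-class and one negative-class patient")
    sensitivity = tp / (tp + fn)  # share of true members correctly identified
    specificity = tn / (tn + fp)  # share of non-members correctly excluded
    return sensitivity, specificity


# Toy counts only (not study data).
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=6, fp=4)
```

A classifier tuned for perfect specificity, such as the discharge classifier described here, accepts lower sensitivity in exchange for producing no false positives.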

When compared against the standard of care based on conventional clinical and imaging criteria alone, CompCyst was more accurate. It correctly identified larger shares of patients who should have been discharged (60% vs. 19%) and who should have been monitored (49% vs. 34%), and it identified a similar share of patients who should have immediately had an operation (91% vs. 89%).

“The takeaway from this is that standard of care is sending too many patients unnecessarily to surgery,” Dr. Karchin commented. “The CompCyst test, with application of the three classifiers sequentially – discharge, operate, or monitor – could reduce unnecessary surgery by 60% or more based on our calculations.”
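Applying classifiers sequentially can be pictured as a cascade: each patient is tested against one classifier at a time, in a fixed order, and is assigned to the first category whose classifier returns a positive result. The sketch below shows only that control flow under stated assumptions; the order, the fallback category, the predicate functions, and the patient fields are all hypothetical and do not represent the CompCyst classifiers.

```python
def triage(patient, ordered_classifiers, fallback):
    """Assign the first category whose classifier is positive, else the fallback.

    ordered_classifiers: list of (category, predicate) pairs, tried in order.
    """
    for category, is_positive in ordered_classifiers:
        if is_positive(patient):
            return category
    return fallback


# Hypothetical stand-in predicates reading made-up boolean fields.
cascade = [
    ("discharge", lambda p: p["meets_discharge_rule"]),
    ("operate", lambda p: p["meets_operate_rule"]),
]

patient_a = {"meets_discharge_rule": True, "meets_operate_rule": False}
patient_b = {"meets_discharge_rule": False, "meets_operate_rule": False}

decision_a = triage(patient_a, cascade, fallback="monitor")
decision_b = triage(patient_b, cascade, fallback="monitor")
```

In a cascade of this kind, the operating point of each earlier classifier determines which patients the later ones ever see, which is one reason the sensitivity and specificity of each step are tuned separately.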

“While our study was retrospective, it shows promising results in reducing unnecessary surgeries, compared to current standard of care,” she said, adding that a prospective study is planned next.

“In 10-12 weeks, this CompCyst diagnostic test is going to be available at Johns Hopkins for patients. I’m very excited about that,” Dr. Karchin concluded. “We hope that our study shows the potential of combining clinical, imaging, and genetic features with machine learning to improve clinical judgment about many diseases.”

Dr. Karchin disclosed no conflicts of interest. The study was supported by the Lustgarten Foundation for Pancreatic Cancer Research, the Virginia and D.K. Ludwig Fund for Cancer Research, the Sol Goldman Pancreatic Cancer Research Center, the Michael Rolfe Pancreatic Cancer Research Foundation, the Benjamin Baker Scholarship, and the National Institutes of Health.

Help your patients understand pancreatitis testing and treatment options, symptoms and complications by sharing AGA’s patient education from the GI Patient Center: www.gastro.org/pancreatitis.

A test that uses machine learning may improve the management of patients with pancreatic cysts, sparing some of them unnecessary surgery, a cohort study suggests.

Karchin_Rachel_MD_web.TIF
Dr. Rachel Karchin

The test, called CompCyst, integrates clinical, imaging, and biomarker data. It proved more accurate than the current standard of care for correctly determining whether patients should be discharged from follow-up, immediately operated on, or monitored.

Rachel Karchin, PhD, of the Johns Hopkins Whiting School of Engineering in Baltimore, reported these results at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (Abstract IA-13).

“Preoperative diagnosis of pancreatic cysts and managing patients who present with a cyst are a clinical conundrum because pancreatic cancer is so deadly, while the decision to surgically resect a cyst is complicated by the danger of the surgery, which has high morbidity and mortality,” Dr. Karchin explained. “The challenge of the diagnostic test is to place patients into one of three groups: those who should be discharged, who should be operated on, and who should be monitored.”

High sensitivity is important for the operate and monitor groups to ensure identification of all patients needing these approaches, whereas higher specificity is important for the discharge group to avoid falsely classifying premalignant cysts, Dr. Karchin said.

She and her colleagues applied machine learning to this classification challenge, using data from 862 patients who had undergone resection of pancreatic cysts at 16 centers in the United States, Europe, and Asia. All patients had a known cyst histopathology, which served as the gold standard, and a known clinical management strategy (discharge, operate, or monitor).

The investigators used a multivariate organization of combinatorial alterations algorithm that integrates clinical features, imaging characteristics, cyst fluid genetics, and serum biomarkers to create classifiers. This algorithm can be trained to maximize sensitivity, maximize specificity, or balance these metrics, Dr. Karchin noted.

The resulting test, CompCyst, was trained using data from 436 of the patients and then validated in the remaining 426 patients.

In the validation cohort, for classifying patients who should be discharged from care, the test had a sensitivity of 46% and a specificity of 100%, according to results reported at the conference and published previously (Sci Transl Med. 2019 Jul 19. doi: 10.1126/scitranslmed.aav4772).

For identifying patients who should undergo immediate surgery, CompCyst had a sensitivity of 91% and a specificity of 54%. For identifying patients who should be monitored, the test had a sensitivity of 99% and a specificity of 30%.

When CompCyst was compared with the standard of care, which is based on conventional clinical and imaging criteria alone, CompCyst was more accurate. It correctly identified larger shares of patients who should have been discharged (60% vs. 19%) and who should have been monitored (49% vs. 34%), and a similar share of patients who should have had an immediate operation (91% vs. 89%).

“The takeaway from this is that standard of care is sending too many patients unnecessarily to surgery,” Dr. Karchin commented. “The CompCyst test, with application of the three classifiers sequentially – discharge, operate, or monitor – could reduce unnecessary surgery by 60% or more based on our calculations.”

“While our study was retrospective, it shows promising results in reducing unnecessary surgeries, compared to current standard of care,” she said, adding that a prospective study is planned next.

“In 10-12 weeks, this CompCyst diagnostic test is going to be available at Johns Hopkins for patients. I’m very excited about that,” Dr. Karchin concluded. “We hope that our study shows the potential of combining clinical, imaging, and genetic features with machine learning to improve clinical judgment about many diseases.”

Dr. Karchin disclosed no conflicts of interest. The study was supported by the Lustgarten Foundation for Pancreatic Cancer Research, the Virginia and D.K. Ludwig Fund for Cancer Research, the Sol Goldman Pancreatic Cancer Research Center, the Michael Rolfe Pancreatic Cancer Research Foundation, the Benjamin Baker Scholarship, and the National Institutes of Health.

FROM AACR: AI, DIAGNOSIS, AND IMAGING 2021

Model predicts acute kidney injury in cancer patients a month in advance

Article Type
Changed
Wed, 01/04/2023 - 16:41

A model that crunches data from routine blood tests can accurately identify cancer patients who will develop acute kidney injury (AKI) up to a month before it happens, according to a cohort study.

The algorithm spotted nearly 74% of the patients who went on to develop AKI within 30 days, providing a window for intervention and possibly prevention, according to investigators.

These results were reported at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (abstract PR-11).

“Cancer patients are a high-risk population for AKI due to the nature of their treatment and illness,” said presenter Lauren A. Scanlon, PhD, a data scientist at The Christie NHS Foundation Trust in Huddersfield, England. “AKI causes a huge disruption in treatment and distress for the patient, so it would be amazing if we could, say, predict the AKI before it occurs and prevent it from even happening.”

U.K. health care providers are already using an algorithm to monitor patients’ creatinine levels, comparing new values against historic ones, Dr. Scanlon explained. When that algorithm detects AKI, it issues an alert that triggers implementation of an AKI care bundle, including measures such as fluid monitoring and medication review, within 24 hours.

Taking this concept further, Dr. Scanlon and colleagues developed a random forest model, a type of machine learning algorithm, that incorporates other markers from blood tests routinely obtained for all patients, with the aim of predicting AKI up to 30 days in advance.

“Using routinely collected blood test results will ensure that the model is applicable to all our patients and can be implemented in an automated manner,” Dr. Scanlon noted.

The investigators developed and trained the model using 597,403 blood test results from 48,865 patients undergoing cancer treatment between January 2017 and May 2020.

The model assigns patients to five categories of risk for AKI in the next 30 days: very low, low, medium, high, and very high.
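As a rough illustration of how a predicted probability could be turned into five ordered categories, the sketch below bins a 0-1 score and checks it against an alerting threshold of medium risk or higher. The cut points, and the names `CUT_POINTS`, `risk_category`, and `is_flagged`, are hypothetical; the article does not report the thresholds the investigators used.

```python
from bisect import bisect_right

# Hypothetical cut points on a 0-1 predicted probability; the article does
# not report the thresholds the investigators used.
CUT_POINTS = [0.05, 0.15, 0.30, 0.60]
CATEGORIES = ["very low", "low", "medium", "high", "very high"]


def risk_category(probability: float) -> str:
    """Map a predicted 30-day AKI probability to one of five ordered categories."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right counts how many cut points are <= probability,
    # which is exactly the index of the matching category.
    return CATEGORIES[bisect_right(CUT_POINTS, probability)]


def is_flagged(category: str, threshold: str = "medium") -> bool:
    """Return True when the category is at or above the alerting threshold."""
    return CATEGORIES.index(category) >= CATEGORIES.index(threshold)


for p in (0.02, 0.20, 0.75):
    category = risk_category(p)
    print(p, category, is_flagged(category))
```

Reporting an ordered category rather than a raw probability keeps the output easy to read alongside clinical judgment, which is the use the investigators describe.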

“We wanted the model to output in this way so that it could be used by clinicians alongside their own insight and knowledge on a case-by-case basis,” Dr. Scanlon explained.

The investigators then prospectively validated the model and its risk categories in another 9,913 patients who underwent cancer treatment between June and August 2020.

Using a model threshold of medium risk or higher, the model correctly predicted AKI in 330 (73.8%) of the 447 patients in the validation cohort who ultimately developed AKI.

“This is pretty amazing and shows that this model really is working and can correctly detect these AKIs up to 30 days before they occur, giving a huge window to put in place preventive strategies,” Dr. Scanlon said.

Among the 154 patients in whom the model incorrectly predicted AKI, 9 patients had only a single follow-up blood test and 17 patients did not have any, leaving their actual outcomes unclear.
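The reported counts allow a back-of-the-envelope check of the model's validation performance. The sketch below reproduces the 73.8% sensitivity and derives an approximate specificity and positive predictive value. It assumes that every patient in the validation cohort received a prediction and that the 154 patients are all of the false positives; because 26 of them had unclear outcomes, the derived figures are approximations, not results reported by the investigators.

```python
# Counts reported in the article for the validation cohort.
cohort_size = 9_913      # patients in the prospective validation cohort
developed_aki = 447      # patients who ultimately developed AKI
true_positives = 330     # AKI cases flagged at medium risk or higher
false_positives = 154    # flagged patients without a confirmed AKI

# Derived counts (assumption: every cohort member received a prediction).
false_negatives = developed_aki - true_positives
no_aki = cohort_size - developed_aki
true_negatives = no_aki - false_positives

sensitivity = true_positives / developed_aki
specificity = true_negatives / no_aki
ppv = true_positives / (true_positives + false_positives)

print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
print(f"PPV         = {ppv:.1%}")
```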

“Given that AKI detection uses blood tests, an AKI in these patients was never confirmed,” Dr. Scanlon noted. “So this could give a potential benefit of the model that we never intended: It could reduce undiagnosed AKI by flagging those who are at risk.”

“Our next steps are to test the model through a technology clinical trial to see if putting intervention strategies in place does prevent these AKIs from taking place,” Dr. Scanlon concluded. “We are also going to move to ongoing monitoring of the model performance.”

Dr. Scanlon disclosed no conflicts of interest. The study did not receive specific funding.

FROM AACR: AI, DIAGNOSIS, AND IMAGING 2021

Test could help patients with pancreatic cysts avoid unneeded surgery

Article Type
Changed
Wed, 05/26/2021 - 13:41

A test that uses machine learning may improve the management of patients with pancreatic cysts, sparing some of them unnecessary surgery, a cohort study suggests.

The test, called CompCyst, integrates clinical, imaging, and biomarker data. It proved more accurate than the current standard of care for correctly determining whether patients should be discharged from follow-up, immediately operated on, or monitored.

Rachel Karchin, PhD, of the Johns Hopkins Whiting School of Engineering in Baltimore, reported these results at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (Abstract IA-13).

“Preoperative diagnosis of pancreatic cysts and managing patients who present with a cyst are a clinical conundrum because pancreatic cancer is so deadly, while the decision to surgically resect a cyst is complicated by the danger of the surgery, which has high morbidity and mortality,” Dr. Karchin explained. “The challenge of the diagnostic test is to place patients into one of three groups: those who should be discharged, who should be operated on, and who should be monitored.”

High sensitivity is important for the operate and monitor groups to ensure identification of all patients needing these approaches, whereas higher specificity is important for the discharge group to avoid falsely classifying premalignant cysts, Dr. Karchin said.
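For readers less familiar with the two metrics, the sketch below computes sensitivity and specificity from paired true and predicted labels. The function name and the toy data are illustrative only and are not taken from the study.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    Assumes at least one actual positive and one actual negative.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)          # correctly called positive
    fn = sum(a and not p for a, p in pairs)      # missed positives
    tn = sum(not a and not p for a, p in pairs)  # correctly called negative
    fp = sum(not a and p for a, p in pairs)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: a cautious classifier that never raises a false alarm
# but misses half of the true positives.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, False, False, False, False, False, False]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A classifier with this profile, perfect specificity at the cost of sensitivity, is the kind of trade-off described for the discharge group.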

She and her colleagues applied machine learning to this classification challenge, using data from 862 patients who had undergone resection of pancreatic cysts at 16 centers in the United States, Europe, and Asia. All patients had a known cyst histopathology, which served as the gold standard, and a known clinical management strategy (discharge, operate, or monitor).

The investigators used a multivariate organization of combinatorial alterations algorithm that integrates clinical features, imaging characteristics, cyst fluid genetics, and serum biomarkers to create classifiers. This algorithm can be trained to maximize sensitivity, maximize specificity, or balance these metrics, Dr. Karchin noted.
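The trade-off between the two metrics can be illustrated with a generic cutoff sweep over a single numeric score. This is not the investigators' algorithm, whose internals the article does not describe; `pick_threshold`, the scores, and the labels below are all hypothetical.

```python
def pick_threshold(scores, labels, min_specificity=1.0):
    """Pick the cutoff with the highest sensitivity whose specificity meets a floor.

    A case is called positive when score >= cutoff. Returns
    (cutoff, sensitivity, specificity), or None if no cutoff qualifies.
    Assumes the labels contain at least one positive and one negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(s >= cutoff and y for s, y in zip(scores, labels))
        fp = sum(s >= cutoff and not y for s, y in zip(scores, labels))
        sens = tp / positives
        spec = (negatives - fp) / negatives
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (cutoff, sens, spec)
    return best


# Invented scores and ground-truth labels, for illustration only.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [False, False, True, False, True, True, False, True]

print(pick_threshold(scores, labels, min_specificity=1.0))
print(pick_threshold(scores, labels, min_specificity=0.5))
```

Demanding perfect specificity pushes the cutoff up and lowers sensitivity; relaxing the floor does the opposite.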

The resulting test, CompCyst, was trained using data from 436 of the patients and then validated in the remaining 426 patients.

In the validation cohort, for classifying patients who should be discharged from care, the test had a sensitivity of 46% and a specificity of 100%, according to results reported at the conference and published previously (Sci Transl Med. 2019 Jul 19. doi: 10.1126/scitranslmed.aav4772).

For identifying patients who should undergo immediate surgery, CompCyst had a sensitivity of 91% and a specificity of 54%. For identifying patients who should be monitored, the test had a sensitivity of 99% and a specificity of 30%.

When CompCyst was compared with the standard of care, which is based on conventional clinical and imaging criteria alone, CompCyst was more accurate. It correctly identified larger shares of patients who should have been discharged (60% vs. 19%) and who should have been monitored (49% vs. 34%), and a similar share of patients who should have had an immediate operation (91% vs. 89%).

“The takeaway from this is that standard of care is sending too many patients unnecessarily to surgery,” Dr. Karchin commented. “The CompCyst test, with application of the three classifiers sequentially – discharge, operate, or monitor – could reduce unnecessary surgery by 60% or more based on our calculations.”
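Sequential application of classifiers can be pictured as a simple cascade in which the first classifier to accept a patient determines the group. The sketch below follows the order quoted above. The toy classifiers, feature names, cutoffs, and fallback rule are all invented for illustration and have no clinical meaning; the trained CompCyst classifiers are not described in the article.

```python
from typing import Callable, Mapping, Sequence, Tuple

Patient = Mapping[str, float]           # hypothetical feature record
Classifier = Callable[[Patient], bool]  # True when the patient fits the group


def triage(
    patient: Patient,
    steps: Sequence[Tuple[str, Classifier]],
    fallback: str = "monitor",
) -> str:
    """Return the group of the first classifier that accepts the patient."""
    for group, fits in steps:
        if fits(patient):
            return group
    # Assumption: the article does not say what happens when no classifier
    # accepts a patient; continued monitoring is used here as a placeholder.
    return fallback


# Toy stand-ins for trained classifiers; the keys and cutoffs are invented.
steps = [
    ("discharge", lambda p: p["benign_score"] >= 0.9),
    ("operate", lambda p: p["malignancy_score"] >= 0.5),
    ("monitor", lambda p: p["mucinous_score"] >= 0.2),
]

patient_a = {"benign_score": 0.95, "malignancy_score": 0.1, "mucinous_score": 0.1}
patient_b = {"benign_score": 0.20, "malignancy_score": 0.7, "mucinous_score": 0.6}

print(triage(patient_a, steps))
print(triage(patient_b, steps))
```

Placing a high-specificity discharge step first means a patient leaves follow-up only when that classifier is confident, and everyone else continues down the cascade.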

“While our study was retrospective, it shows promising results in reducing unnecessary surgeries, compared to current standard of care,” she said, adding that a prospective study is planned next.

“In 10-12 weeks, this CompCyst diagnostic test is going to be available at Johns Hopkins for patients. I’m very excited about that,” Dr. Karchin concluded. “We hope that our study shows the potential of combining clinical, imaging, and genetic features with machine learning to improve clinical judgment about many diseases.”

Dr. Karchin disclosed no conflicts of interest. The study was supported by the Lustgarten Foundation for Pancreatic Cancer Research, the Virginia and D.K. Ludwig Fund for Cancer Research, the Sol Goldman Pancreatic Cancer Research Center, the Michael Rolfe Pancreatic Cancer Research Foundation, the Benjamin Baker Scholarship, and the National Institutes of Health.

FROM AACR: AI, DIAGNOSIS, AND IMAGING 2021

AI can identify biomarkers and potentially guide therapy in NSCLC

Article Type
Changed
Thu, 01/28/2021 - 15:03

The molecular biomarkers of advanced non–small cell lung cancer (NSCLC) – and hence the best treatment option – may soon be identified in real time from scans, thanks to a new decision support tool that uses artificial intelligence (AI).

Researchers developed deep learning models that could accurately predict a patient’s PD-L1 and EGFR mutation status without the need for a biopsy. If these models are validated in prospective trials, they could guide treatment decisions in patients with NSCLC, according to the researchers.

Wei Mu, PhD, of Moffitt Cancer Center and Research Institute in Tampa, Fla., described this research at the AACR Virtual Special Conference: Artificial Intelligence, Diagnosis, and Imaging (abstract PR-03).

Rationale

Guidelines from the National Comprehensive Cancer Network (NCCN) endorse tailored treatment for patients with NSCLC; namely, immune checkpoint inhibitors for those with PD-L1-positive tumors and EGFR tyrosine kinase inhibitors for patients with tumors harboring a mutation in EGFR.

However, the conventional approach to ascertaining tumor status for these biomarkers has disadvantages, Dr. Mu noted.

“Both require biopsy, which may fail due to insufficient quality of the tissue and, particularly for NSCLC, may increase the chance of morbidity,” Dr. Mu said.

In addition, there is room for improvement in the rigor of the biomarker assays, and there can be substantial wait times for results.

To address these issues, Dr. Mu and colleagues explored an AI radiomics approach using PET/CT scans.

“We know that EGFR mutation and positive PD-L1 expression may change the metabolism of the peritumor and intratumor microenvironment,” Dr. Mu explained. “Therefore, we had the hypothesis that they can be captured by the FDG-PET/CT images.”

Results

The investigators used FDG-PET/CT images from 837 patients with advanced NSCLC treated at four institutions. The team developed AI deep learning models that generated one score for PD-L1 positivity and another score for presence of an EGFR mutation, as well as an associated algorithm that would direct patients to the appropriate treatments depending on the scores.
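A score-based routing rule of this general kind might look like the sketch below. The cutoffs, the order in which the two scores are checked, and the function name `suggest_treatment` are assumptions made for illustration; the article does not describe the investigators' actual algorithm, and nothing here is clinical guidance.

```python
def suggest_treatment(egfr_score, pdl1_score, egfr_cutoff=0.5, pdl1_cutoff=0.5):
    """Map two deep learning scores (0-1) to a treatment class.

    Assumptions: both cutoffs are placeholders, and the EGFR score is
    checked before the PD-L1 score.
    """
    if egfr_score >= egfr_cutoff:
        return "EGFR tyrosine kinase inhibitor"
    if pdl1_score >= pdl1_cutoff:
        return "immune checkpoint inhibitor"
    return "no biomarker-directed suggestion"


print(suggest_treatment(egfr_score=0.8, pdl1_score=0.3))
print(suggest_treatment(egfr_score=0.2, pdl1_score=0.7))
print(suggest_treatment(egfr_score=0.1, pdl1_score=0.1))
```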

Results for the PD-L1 deep learning score showed good accuracy in predicting positivity for this ligand, with an area under the curve of 0.89 in the training cohort, 0.84 in the validation cohort, and 0.82 in an external test cohort, Dr. Mu reported. All exceeded the corresponding areas under the curve for maximal standardized uptake values.
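The area under the curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way on invented data; the function name `auc` and the toy scores are not from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented scores and biomarker labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]
labels = [True, True, False, True, False, False]

print(f"AUC = {auc(scores, labels):.2f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported values of 0.82 to 0.89 in context.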

Moreover, the score was prognostic and statistically indistinguishable from PD-L1 status determined by immunohistochemistry in predicting progression-free survival.

Similarly, the EGFR deep learning score showed good accuracy in predicting mutational status, with an area under the curve of 0.86 in the training cohort, 0.83 in the validation cohort, and 0.81 in an external test cohort. It outperformed a clinical score based on sex, smoking status, tumor histology, and maximal standardized uptake value in each cohort.

The EGFR deep learning score was prognostic and statistically indistinguishable from EGFR mutational status determined by polymerase chain reaction in predicting progression-free survival.

The models showed good stability when the size of the input region of interest was varied and when different radiologists delineated the region of interest, with an intraclass correlation coefficient of 0.91.

“We developed deep learning models to predict PD-L1 status and EGFR mutation with high accuracy. Using the generated deep learning scores, we obtained a noninvasive treatment decision support tool, which may be useful as a clinical decision support tool pending validation of its clinical utility in a large prospective trial,” Dr. Mu summarized. “Using our tool, NSCLC patients could be directly offered a treatment decision without the need of biopsy.”

“In the future, we will perform a prospective observational trial to compare the results of our noninvasive treatment decision tool with molecular biomarker–based NCCN guidelines,” she said.

The investigators plan to add ALK rearrangement status and prediction of serious adverse events and cachexia to the decision support tool.

Dr. Mu disclosed no conflicts of interest. The study did not have specific funding.

FROM AACR: AI, DIAGNOSIS, AND IMAGING 2021
