Identifying Progression-free Survival in Veterans with Diffuse Large B-Cell Lymphoma Using Electronic Health Care Records
Purpose: To establish a gold-standard methodology for accurately extracting progression-free survival (PFS) following Diffuse Large B-Cell Lymphoma (DLBCL) treatment using real-world electronic healthcare record (EHR) data.
Background: Randomized controlled trials using response evaluation criteria have long served as the gold standard for assessing response to therapy and PFS. However, the characteristics of clinical trial participants do not reflect the overall patient population, and formal response evaluation criteria are not used in real-world practice. Furthermore, real-world data are often unstructured, which prevents accurate comparison of PFS derived from structured clinical trial data with PFS derived from real-world data, and existing approaches define PFS inconsistently. Despite the importance of assessing PFS in patients outside controlled clinical trials, no gold-standard method for collecting and validating PFS from real-world evidence has been established.
Methods: Clinicians, programmers, and data scientists collaborated to develop an R Shiny application using Veterans Affairs Corporate Data Warehouse data from the EHRs of 352 DLBCL patients. The application takes unstructured data such as clinical notes and facilitates the capture, annotation, and tagging of key words or phrases indicative of progression, allowing accurate determination of the date on which a treating clinician first identified progression.
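The phrase-tagging step described above can be illustrated with a small keyword scan over dated notes. This is a minimal sketch, not the study's R Shiny application: the phrase list, the `(note_date, note_text)` input shape, and the function name are hypothetical. It only surfaces candidate mentions, earliest first, so that an annotator can confirm which one marks the first clinician-identified progression.

```python
import re
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical progression phrases for illustration only; the study's own
# phrase dictionary is not reproduced in the abstract.
PROGRESSION_PATTERNS = [
    r"\bprogress(?:ion|ive|ed|ing)?\b",
    r"\brelapsed?\b",
    r"\brecurren(?:ce|t)\b",
]
PROGRESSION_RE = re.compile("|".join(PROGRESSION_PATTERNS), re.IGNORECASE)


def candidate_progression_mentions(
    notes: Iterable[Tuple[date, str]],
) -> List[Tuple[date, str]]:
    """Return (note_date, matched_phrase) pairs, earliest first.

    Matches are candidates for annotator review, not confirmed progression:
    negated phrases such as "no evidence of progression" also match.
    """
    hits = []
    for note_date, text in notes:
        for match in PROGRESSION_RE.finditer(text):
            hits.append((note_date, match.group(0)))
    return sorted(hits)
```

Because a plain keyword scan cannot tell "progression" from "no evidence of progression", the sketch leaves the final date decision to a human reviewer, which mirrors the annotation-centered design the abstract describes.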
Data Analysis: To refine data-collection techniques and evaluate whether the application can enable calculation of real-world PFS, we conducted an adaptive, iterative process of reviewing EHR documents and capturing and annotating data until a consistent schema and methodology were established. To validate the annotation schema and methodology, 2 annotators annotated 50 patient records, and their annotations were assessed for concordance.
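The abstract does not name the concordance statistic used for the 2 annotators. Cohen's kappa is one common choice for 2 raters assigning categorical labels, so the sketch below assumes each annotator assigns one label per item (for example, a response category); the labels in the test are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical category throughout
    return (observed - expected) / (1 - expected)
```

For example, with 5 items where the annotators agree on 4, observed agreement is 0.8; if chance agreement from the label frequencies is 0.36, kappa is (0.8 - 0.36) / 0.64 = 0.6875.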
Results: We produced an R Shiny application that can capture, annotate, and transform unstructured EHR data into structured data ready for PFS analysis, specifically treatment lines, cycles, and response criteria with corresponding dates. We also developed an annotation schema for capturing real-world data. Mapping common phrases used by clinicians in real-world practice to response criteria produced a dictionary of these phrases.
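Once treatment lines and progression dates are structured, PFS follows the conventional definition: time from the start of a treatment line to progression or death, whichever comes first, with patients who have neither event censored at last follow-up. The sketch below assumes a hypothetical record layout (`TreatmentLine` and its fields are illustrative, not the application's actual export format).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class TreatmentLine:
    # Hypothetical structured record; field names are illustrative only.
    start: date
    progression: Optional[date]
    death: Optional[date]
    last_follow_up: date


def pfs(line: TreatmentLine) -> Tuple[int, bool]:
    """Return (PFS in days, event flag).

    The event is the earlier of progression or death; if neither occurred,
    the patient is censored at last follow-up and the flag is False.
    """
    events = [d for d in (line.progression, line.death) if d is not None]
    if events:
        return (min(events) - line.start).days, True
    return (line.last_follow_up - line.start).days, False
```

The (duration, event) pairs are the usual input for Kaplan-Meier or Cox models, which is the downstream analysis the structured output is meant to support.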
Implications: These efforts show that EHR content can be reliably converted into analyzable data such as PFS. Further work will aim to establish a gold-standard methodology.
Survival Rates of Black and White Veterans With Metastatic Castration-Resistant Prostate Cancer
Rationale: To understand the survival outcomes of black and white Veterans with metastatic castration-resistant prostate cancer (mCRPC).
Background: Black men have a higher incidence of prostate cancer than white men and are more likely to be diagnosed at an earlier age, to present with more aggressive disease, and to experience worse clinical outcomes. Evidence has shown that, at diagnosis, race is associated with disease progression; however, few studies have examined whether race is associated with survival once men reach an advanced stage of the disease. This study examines survival outcomes among black and white men with mCRPC treated in the Veterans Health Administration (VHA).
Methods: Patient information from the Veterans Affairs (VA) Central Cancer Registry and the VA Corporate Data Warehouse was used to identify patients who were diagnosed with prostate cancer and later developed mCRPC, defined as (1) radiologic evidence of metastasis, obtained from radiology reports using a natural language processing algorithm; (2) evidence of rising prostate-specific antigen (PSA); and (3) evidence of ongoing androgen deprivation, consisting of a serum testosterone level < 50 ng/dL. Patient demographics, disease characteristics, treatment practices, and survival outcomes were extracted.
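The three-part mCRPC definition above can be expressed as a conjunction of per-patient rules. This is a minimal sketch under stated assumptions: the NLP metastasis result arrives as a boolean, ongoing androgen deprivation is read as serum testosterone below 50 ng/dL (the conventional castrate level), and "rising PSA" is operationalized by a hypothetical rule of 2 consecutive increases, since the abstract does not give the study's exact PSA criterion.

```python
from datetime import date
from typing import List, Tuple

CASTRATE_TESTOSTERONE_NG_DL = 50.0


def psa_rising(psa_series: List[Tuple[date, float]], rises_required: int = 2) -> bool:
    """Hypothetical rule: True if PSA rose on `rises_required` consecutive draws."""
    values = [value for _, value in sorted(psa_series)]  # order draws by date
    streak = 0
    for prev, cur in zip(values, values[1:]):
        streak = streak + 1 if cur > prev else 0
        if streak >= rises_required:
            return True
    return False


def meets_mcrpc_definition(
    nlp_metastasis_flag: bool,
    psa_series: List[Tuple[date, float]],
    testosterone_ng_dl: float,
) -> bool:
    """All three criteria must hold: metastasis, rising PSA, castrate testosterone."""
    return bool(
        nlp_metastasis_flag
        and psa_rising(psa_series)
        and testosterone_ng_dl < CASTRATE_TESTOSTERONE_NG_DL
    )
```

In practice each criterion would also need a time window relative to the others; the sketch omits that to keep the logical structure of the definition visible.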
Results: From 2006 to 2015, 120,374 patients were diagnosed with and treated for prostate cancer in the VHA, of whom 3,637 developed mCRPC. Of these 3,637 patients, 2,429 (67%) were white, 1,066 (29%) were black, and 142 (4%) were reported as other race at diagnosis. Compared with white men, black men were younger (66 vs 69 years) and had a higher PSA (92 vs 41 ng/mL), although there were no differences in disease characteristics (Gleason score and stage) or early treatments (radiation, prostatectomy, surgical orchiectomy, castration by agonists, castration by agonists/androgen deprivation). There was no significant difference in overall survival between black and white men with mCRPC in either univariable (HR, .95; P = .203) or multivariable (HR, 1.0; P = .971) analysis.
Conclusions: Consistent with prior reports, black men are more likely to develop mCRPC than white men, although once black men progress to advanced disease, a multivariable analysis suggests that race is not associated with overall survival.
The Role of Academic Affiliation in the Treatment of Metastatic Castrate-Resistant Prostate Cancer in the Veterans Health Administration
Background: Cancer care in academically affiliated settings such as teaching hospitals has been associated with improved clinical outcomes. Historically, Veterans Affairs (VA) medical centers have partnered with academic affiliates; however, few studies have examined how this partnership affects clinical care in the Veterans Health Administration (VHA). We therefore examined variation in first-line therapy (1L) for patients with metastatic castrate-resistant prostate cancer (mCRPC) in the VHA by degree of academic affiliation.
Methods: Information from the VA Central Cancer Registry was linked to clinical data from the VA Corporate Data Warehouse to identify incident cases of mCRPC, defined as first incidence of radiologic evidence of metastasis and castrate resistance in patients with prostate cancer. Patient demographics, disease characteristics and treatment practices were extracted. The degree of academic affiliation of the treating facility was calculated using the Herfindahl-Hirschman Index (HHI), which reflects how dispersed medical residents are among different specialties and how many specialties are available within a given VA facility.
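The Herfindahl-Hirschman Index is, in its textbook form, the sum of squared shares across categories. The sketch below assumes the inputs are resident counts per specialty at one facility, with shares expressed as fractions so that the index runs from near 0 (residents spread evenly over many specialties) to 1 (all residents in one specialty). The abstract does not give the study's exact inputs or any rescaling, so this may differ from the index actually used.

```python
from typing import Mapping


def herfindahl_hirschman_index(residents_by_specialty: Mapping[str, int]) -> float:
    """HHI = sum of squared shares, with shares expressed as fractions of 1."""
    total = sum(residents_by_specialty.values())
    if total <= 0:
        raise ValueError("need at least one resident")
    return sum((count / total) ** 2 for count in residents_by_specialty.values())
```

For instance, a hypothetical facility with residents split 6/2/2 across 3 specialties has shares 0.6, 0.2, and 0.2, giving an HHI of 0.36 + 0.04 + 0.04 = 0.44.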
Results: From 2006 to 2015, 3,637 patients received an mCRPC diagnosis and were treated in 123 VA facilities. The median HHI of treating facilities was 0.374. Of these patients, 1,723 (47%) were treated in a facility with higher academic affiliation (HAA; HHI ≥ 0.374) and 1,914 (53%) in a facility with lower academic affiliation (LAA; HHI ≤ 0.373). Patient and disease characteristics did not differ by academic affiliation: patients at HAA and LAA facilities had comparable Gleason scores, stage of disease at diagnosis, primary local therapy, age, and median PSA levels at diagnosis. Patients with mCRPC at HAA facilities were more likely to receive 1L (59% vs 55%, P = .015). The most frequently used 1L regimens were comparable: HAA, docetaxel (29%), abiraterone (22%), and enzalutamide (6%); LAA, docetaxel (25%), abiraterone (21%), and enzalutamide (7%).
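The abstract does not state which test produced P = .015 for 59% vs 55%. As a worked check, a pooled two-proportion z-test (equivalent to an uncorrected 2×2 chi-square test) can be applied to counts back-calculated from the rounded percentages, roughly 1,017 of 1,723 and 1,053 of 1,914. Those counts are approximations, not figures reported by the authors.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Counts back-calculated from the rounded percentages (approximate).
z, p = two_proportion_z_test(round(0.59 * 1723), 1723, round(0.55 * 1914), 1914)
```

With these approximate counts, z is about 2.4 and the two-sided p-value is about .015, consistent with the reported result.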
Conclusions: Patients with mCRPC had a small but significant increase in likelihood of receiving 1L if treated in HAA vs LAA facilities. Further study will focus on identifying patient, prescriber and facility factors that are associated with the likelihood of initiating 1L and the choice of 1L regimen.
Examining Methods for Systematically Identifying Cytogenetic Testing Among Chronic Lymphocytic Leukemia Patients
Purpose: To evaluate data extraction methods for identifying cytogenetic and fluorescence in situ hybridization (FISH) testing among chronic lymphocytic leukemia (CLL) patients in the Veterans Health Administration (VHA).
Background: Cytogenetic/FISH testing is increasingly important for assessing risk and guiding therapy in patients with CLL. Administrative health data are frequently used to study testing practices; however, they are limited in sensitivity and reliability. Increasing adoption of electronic health records (EHRs) presents an opportunity to describe clinical practices in large patient populations. We compare three different EHR extraction methods for identifying cytogenetic/FISH testing in a cohort of CLL patients treated within the VHA.
Methods: CLL patients were identified using the VA Clinical Cancer Registry. Testing information was extracted from time of diagnosis to time of first treatment using three methods: (1) Current Procedural Terminology (CPT) codes; (2) Text mining of healthcare provider orders (HPO); (3) Clinical Lab Information Retrieval (CLIR), a previously validated conceptual framework that incorporates LOINC codes and test names that are then validated using test result information.
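Method 2 (text mining of healthcare provider orders) can be sketched as a case-insensitive pattern search over order text, collecting the patients with at least one match. The search terms and the `(patient_id, order_text)` input shape below are hypothetical; the abstract does not list the study's terms. A bare term such as "FISH" will also match unrelated orders (for example, fish oil), one illustration of why validating against test results, as CLIR does, can matter.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical order-text patterns; the study's actual search terms are not given.
CYTO_FISH_RE = re.compile(
    r"\b(?:fish|fluorescen\w*\s+in\s+situ|cytogenetic\w*|karyotyp\w*|chromosome\s+analysis)\b",
    re.IGNORECASE,
)


def patients_with_testing_orders(orders: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return IDs of patients with at least one matching order.

    `orders` is assumed to be already restricted to the window from
    diagnosis to first treatment.
    """
    return {patient_id for patient_id, text in orders if CYTO_FISH_RE.search(text)}
```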
Results: Between 2008 and 2016, 1,363 CLL patients were diagnosed at the VHA and followed until their first line of therapy: 635 (47%) had evidence of testing by text mining of HPO, 554 (41%) by CPT, and 399 (29%) by CLIR. Compared with the combined CLIR+HPO reference, CPT extraction had a sensitivity of 52.8%, a precision of 73.1%, and an F-measure of 0.613. Cytogenetic/FISH testing increased nearly two-fold from 2008 to 2016 regardless of extraction method: HPO text mining (25% to 51%), CPT (20% to 54%), and CLIR (19% to 32%).
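The CPT comparison can be reproduced from patient-level sets: the reference is the union of patients flagged by CLIR or HPO, and sensitivity, precision, and F-measure (their harmonic mean) follow from the overlap. The set contents in the test are toy data; the final check confirms that the reported sensitivity of 52.8% and precision of 73.1% yield an F-measure of 0.613.

```python
from typing import Set, Tuple


def f_measure(sensitivity: float, precision: float) -> float:
    """Harmonic mean of sensitivity (recall) and precision."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


def compare_to_reference(test_ids: Set[str], reference_ids: Set[str]) -> Tuple[float, float, float]:
    """Return (sensitivity, precision, F-measure) of test_ids against reference_ids."""
    tp = len(test_ids & reference_ids)   # flagged by both
    fp = len(test_ids - reference_ids)   # flagged only by the test method
    fn = len(reference_ids - test_ids)   # missed by the test method
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision, f_measure(sensitivity, precision)
```

Usage: with hypothetical sets `cpt`, `clir`, and `hpo`, the comparison is `compare_to_reference(cpt, clir | hpo)`.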
Conclusions: Advanced EHR extraction methods offer a more granular description of testing practices than administrative data alone because they examine multiple components of the EHR, including the ordering, processing, and results of tests. The results suggest a slow increase in the number of CLL patients undergoing cytogenetic/FISH testing over the past decade, comparable to reports of testing practices outside the VHA; however, approximately half of all CLL patients are not undergoing testing despite established clinical guideline recommendations.
Using Natural Language Processing in Radiology Reports to Identify the Presence of Metastatic Disease in Veterans With Prostate Cancer
Background: Radiographic imaging is important for the diagnosis and management of cancer. Radiology reports contain a wealth of information but are typically formatted as unstructured text, making large-scale information extraction challenging. We validated a natural language processing (NLP) algorithm to identify the presence of metastatic disease in radiographic imaging reports.
Methods: Using the VA Clinical Cancer Registry and Corporate Data Warehouse, we identified approximately 3 million radiology reports for 120,374 patients receiving care for prostate cancer in the VA from 2006 to 2015. We focused on the impression section of CT, PET/CT, X-ray, bone scan, and MRI reports. We extended the “ConText” algorithm of Chapman et al. to identify the presence of metastatic disease: (1) using UMLS, we identified terms compatible with “metastasis”; (2) report impressions were preprocessed and tokenized at the sentence level and within sentences; (3) positive and negative trigger phrases were implemented as a series of regular expressions, which were refined iteratively using 2 training batches of 600 reports each, allowing us to extend trigger identification to a larger set of phrases. The final algorithm was validated using an independent sample of 2,000 reports annotated by a domain expert.
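The sentence-level trigger approach above can be sketched as: split the impression into sentences, look for a metastasis term, and treat it as negated if a negation trigger precedes it in the same sentence. This is a heavily simplified ConText/NegEx-style illustration, not the study's algorithm: the term and trigger lists are hypothetical stand-ins for the UMLS-derived vocabulary and refined regular expressions, and real ConText also handles termination terms, post-negation triggers, and other contexts. The 3 output classes match the 3 accuracy categories reported in the Results.

```python
import re

# Hypothetical term and trigger lists; the study's vocabulary is not reproduced.
METASTASIS_RE = re.compile(r"\b(?:metasta\w*|mets)\b", re.IGNORECASE)
NEGATION_RE = re.compile(r"\b(?:no|without|negative for|free of)\b", re.IGNORECASE)
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def classify_impression(impression: str) -> str:
    """Return 'no_mention', 'negated', or 'affirmed' for a report impression."""
    found_negated = False
    for sentence in SENTENCE_SPLIT_RE.split(impression):
        match = METASTASIS_RE.search(sentence)
        if not match:
            continue
        # Simplified scope: a negation trigger anywhere before the first
        # metastasis term in the same sentence negates that mention.
        if NEGATION_RE.search(sentence[: match.start()]):
            found_negated = True
        else:
            return "affirmed"  # any affirmed mention outweighs negated ones
    return "negated" if found_negated else "no_mention"
```

Restricting negation scope to the current sentence is what lets "No acute findings. Diffuse osseous metastases." be classified as affirmed rather than negated.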
Results: On the first training set of 600 radiology reports, the algorithm achieved an accuracy of 94% for reports with no mention of metastasis, 85% for negated mentions of metastasis, and 74% for mentions of metastasis without negation. Errors were reviewed, resulting in vocabulary expansion and improved regular expressions to capture the expanded trigger phrases. The modified algorithm was tested on a new set of 600 reports, and accuracy increased to 96% for no mention of metastasis, 90% for negated mentions, and 89% for mentions without negation. After further modifications, the revised algorithm was validated on an independent sample of 2,000 reports, with an accuracy of 96% (Cohen’s kappa ~1), a precision of 98%, and a sensitivity of 98%.
Conclusions: Detecting the presence of metastatic disease in radiology reports is feasible with NLP.
References: (1) Sarkar S, Das S. A review of imaging methods for prostate cancer detection. Biomed Eng Comput Biol. 2016;7(Suppl 1):1-15. (2) Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001;34(5):301-310. (3) Harkema H, Dowling JN, Thornblade T. ConText: An algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009;42(5):839-851.
The Clinical Lab Information Retrieval (CLIR) Framework—An R Framework for CDW Clinical Lab Data Extraction and Retrieval
Purpose: To extract, retrieve, and validate clinical lab information from the VA Corporate Data Warehouse (CDW).
Background: CDW clinical lab information provides a unique opportunity to assess the effectiveness and safety of real-world cancer treatment with greater granularity and validity than administrative data allow. Unfortunately, the way this information is encoded varies substantially across time and geography. Various efforts have been made to clean these data and provide a consistent and reliable mapping. However, the availability and validity of these efforts also vary across lab concepts. This heterogeneity is a significant barrier to using CDW clinical lab information in comparative effectiveness research.
Methods: We defined a conceptual framework for retrieving lab information based on 5 features: Logical Observation Identifiers Names and Codes (LOINC) codes, test names, topography, units, and unit reference ranges. We then implemented this framework in R as 7 discrete modules, each corresponding to a defined task in the conceptual framework: concept → LOINC/test name → cleaned LOINC/test name → LOINC/test name internal identifier → fact information retrieval → topography selection → unit and reference range cleaning and harmonization. Each module has a defined input and output, which makes the implementation transparent, reproducible, and flexible.
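The Python sketch below expresses the module chain as a sequence of functions, each with a single input and output. They run in order, and the sketch logs how many items survive each step. CLIR itself is written in R, so this sketch only loosely mirrors the 7 steps. The concept map, internal identifiers, lab fact records, and unit conversion table are all hypothetical toy data, and the sketch omits reference-range cleaning.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical toy lookup tables standing in for CDW dimension tables.
CONCEPT_MAP: Dict[str, List[str]] = {"total_wbc": ["6690-2", "WBC", "White Blood Count "]}
TERM_TO_ID: Dict[str, int] = {"6690-2": 101, "wbc": 102, "white blood count": 102}

# Hypothetical lab fact records.
LAB_FACTS: List[Dict[str, Any]] = [
    {"test_id": 101, "topography": "BLOOD", "value": 6.1, "unit": "K/uL"},
    {"test_id": 102, "topography": "SERUM", "value": 5.0, "unit": "K/uL"},
    {"test_id": 102, "topography": "BLOOD", "value": 7200.0, "unit": "cells/uL"},
    {"test_id": 999, "topography": "BLOOD", "value": 4.0, "unit": "K/uL"},
]

# Hypothetical divisors converting each source unit to 10^3/uL.
UNIT_DIVISORS: Dict[str, float] = {"k/ul": 1.0, "10^3/ul": 1.0, "cells/ul": 1000.0}


def define_concept(concept: str) -> str:
    # Module 1: normalize the requested concept and check that it is known.
    key = concept.strip().lower()
    if key not in CONCEPT_MAP:
        raise KeyError(f"unknown concept: {concept!r}")
    return key


def concept_to_terms(concept: str) -> List[str]:
    # Module 2: candidate LOINC codes and test names for the concept.
    return CONCEPT_MAP[concept]


def clean_terms(terms: List[str]) -> List[str]:
    # Module 3: trim whitespace, lowercase, and deduplicate.
    return sorted({t.strip().lower() for t in terms})


def terms_to_ids(terms: List[str]) -> List[int]:
    # Module 4: map cleaned terms to internal test identifiers.
    return sorted({TERM_TO_ID[t] for t in terms if t in TERM_TO_ID})


def retrieve_facts(ids: List[int]) -> List[Dict[str, Any]]:
    # Module 5: pull lab results whose test identifier was selected.
    wanted = set(ids)
    return [f for f in LAB_FACTS if f["test_id"] in wanted]


def select_topography(facts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Module 6: keep peripheral blood specimens only.
    return [f for f in facts if f["topography"] == "BLOOD"]


def harmonize_units(facts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Module 7: convert to a common unit; drop results whose unit is unmapped.
    out = []
    for f in facts:
        divisor = UNIT_DIVISORS.get(f["unit"].lower())
        if divisor is not None:
            out.append({**f, "value": f["value"] / divisor, "unit": "10^3/uL"})
    return out


PIPELINE: List[Tuple[str, Callable[[Any], Any]]] = [
    ("concept", define_concept),
    ("terms", concept_to_terms),
    ("cleaned_terms", clean_terms),
    ("internal_ids", terms_to_ids),
    ("facts", retrieve_facts),
    ("topography", select_topography),
    ("harmonized", harmonize_units),
]


def run_pipeline(concept: str):
    # Apply each module in order and record the size of its output.
    data: Any = concept
    audit = []
    for name, module in PIPELINE:
        data = module(data)
        audit.append((name, len(data) if isinstance(data, list) else 1))
    return data, audit


results, audit = run_pipeline("total_wbc")
print(audit)
```

Keeping each module's input and output explicit makes it easy to inspect intermediate results and to replace a single step without touching the others.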
Results: Using the CLIR framework, we retrieved peripheral blood total white blood cell (WBC) counts for patients with hematologic malignancies. The cohort included about 300,000 patients diagnosed with and/or treated for a hematologic malignancy in the VHA between 2001 and 2016. In this cohort, we identified ~11 × 10^6 potential total WBC count results based on LOINC codes and lab test names. Of these, ~9 × 10^6 were mappable to the correct topography, and the overwhelming majority of those (99%) were mappable to a harmonized unit and reference range.
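Taking the rounded counts above at face value, retention through the topography and unit steps works out roughly as follows. The results are approximate because the inputs are themselves rounded.

```latex
\frac{9 \times 10^{6}}{11 \times 10^{6}} \approx 0.82,
\qquad
0.99 \times \left(9 \times 10^{6}\right) \approx 8.9 \times 10^{6}
```

So roughly 82% of candidate results passed topography selection, and about 8.9 million results ended with a harmonized unit and reference range.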
Conclusion: CLIR provides a conceptual framework and an R implementation for retrieving clinical lab information from the VA CDW. Future work will refine the methodology across multiple data domains. It will also compare CLIR output with other ongoing efforts to clean and harmonize clinical lab data in the CDW.
Breast Cancer Treatment Among Rural and Urban Women at the Veterans Health Administration
Purpose: Women with breast cancer are increasingly being diagnosed and cared for within the VA. Breast cancer specialists are available only at large VA hospitals in urban regions, which may affect outcomes for rural women. The health outcomes of rural women at the VA have not been well described and are a current research priority. We described differences in demographics and breast cancer characteristics between urban and rural women. We then compared urban and rural women with nonmetastatic breast cancer on type of lymph node biopsy, type of breast surgery, adjuvant radiation, adjuvant chemotherapy, and hormone therapy.
Methods: Following IRB approval, 4,025 women with nonmetastatic breast cancer diagnosed from 1995 to 2012 were identified from the Veterans Affairs Central Cancer Registry (VACCR). This dataset contained diagnosis date, histology, tumor size, tumor grade, lymph node status, and estrogen receptor status. The VACCR also recorded type of lymph node surgery, type of breast surgery, adjuvant radiation, adjuvant chemotherapy, and adjuvant hormone therapy. Patient-specific data included date of birth, ethnicity, and zip code of residence at the time of diagnosis. Rural-Urban Commuting Area (RUCA) codes, version 2.0, were used to define rural status, which was further collapsed into 3 categories: urban, large rural, and small rural. Stata statistical software was used to organize and analyze the data.

Ordinal logistic regression was used to assess associations between the 3 rural/urban categories and diagnosis year, age, ethnicity, histology, and tumor grade. Tumor size was compared using the rank-sum test. Lymph node and estrogen receptor status were compared using logistic regression, and lymph node sampling methods were compared using multinomial regression. All other treatments were compared between small rural and urban women using logistic regression. These comparisons were then adjusted for factors that could influence treatment choice: diagnosis year, age, ethnicity, tumor size and grade, lymph node status, and estrogen receptor status.
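The Python sketch below shows one way to collapse RUCA 2.0 codes into the 3 categories. The grouping is an assumption, because the abstract does not state which codes the authors assigned to each category. It is modeled on the 4-category grouping used by the WWAMI Rural Health Research Center (urban, large rural city/town, small rural town, isolated small rural town) and merges the last 2 categories into "small rural." The zip-code-to-RUCA lookup is not shown.

```python
from collections import Counter
from typing import Iterable

# Assumed grouping of RUCA 2.0 codes, modeled on the four-category WWAMI scheme.
URBAN = {"1.0", "1.1", "2.0", "2.1", "3.0", "4.1", "5.1", "7.1", "8.1", "10.1"}
LARGE_RURAL = {"4.0", "4.2", "5.0", "5.2", "6.0", "6.1"}
SMALL_RURAL = {"7.0", "7.2", "7.3", "7.4", "8.0", "8.2", "8.3", "8.4", "9.0", "9.1", "9.2"}
ISOLATED_SMALL_RURAL = {"10.0", "10.2", "10.3", "10.4", "10.5", "10.6"}


def ruca_category(code: str) -> str:
    """Map a RUCA 2.0 code such as '4.1' or '7' to urban / large rural / small rural."""
    normalized = f"{float(code):.1f}"  # '7' -> '7.0', ' 4.1 ' -> '4.1'
    if normalized in URBAN:
        return "urban"
    if normalized in LARGE_RURAL:
        return "large rural"
    # Isolated small rural towns are merged into small rural (assumption).
    if normalized in SMALL_RURAL or normalized in ISOLATED_SMALL_RURAL:
        return "small rural"
    raise ValueError(f"unrecognized RUCA 2.0 code: {code!r}")


def tally(codes: Iterable[str]) -> Counter:
    """Count patients per category from their RUCA codes."""
    return Counter(ruca_category(c) for c in codes)


print(tally(["1.0", "4.0", "9.2", "10.1"]))
```

Under this grouping, a secondary code such as 4.1 or 10.1 counts as urban because it denotes substantial commuting to an urbanized area, even though its primary code is rural.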
Results: Most women with nonmetastatic breast cancer (n = 3,192) resided in urban regions; 423 resided in large rural regions and 410 in small rural regions. The number of women living in urban and rural regions did not shift significantly over time (P = .48). The age distributions of rural and urban women did not differ. Women with breast cancer in rural regions were more likely to be white (P ≤ .001; white: 69% urban vs 90% small rural; black: 24% urban vs 6% small rural). Tumor histology, size, grade, lymph node status, and estrogen receptor status did not differ significantly between rural and urban women.

Mastectomy was more common among rural women in unadjusted comparisons. After adjustment for patient demographics and breast cancer characteristics, however, urban and rural women received mastectomies in similar proportions. After adjustment, urban and rural women also received equivalent breast cancer surgery, adjuvant radiation, and adjuvant hormone therapy. However, after controlling for confounding factors, a disproportionate number of urban women received no lymph node biopsy (P = .05). Additionally, women from large rural regions were significantly more likely to receive adjuvant chemotherapy (P = .04), although chemotherapy administration did not differ significantly between women from urban and small rural regions (P = .7).
Conclusions: Most women diagnosed with breast cancer at the VA from 1995 to 2012 resided in urban areas. Rural women were much more likely to be white, but age at diagnosis did not differ. Breast cancer characteristics were similar between rural and urban women. Women living in large rural regions were more likely to receive adjuvant chemotherapy than were women from urban or small rural regions; however, differences in reporting should be considered as a possible explanation. A higher proportion of urban women received no lymph node biopsy, which merits further investigation. Breast conservation therapy was administered consistently among rural and urban women veterans.