Rheumatologists and their staff have been dutifully recording disease activity and patient-reported outcomes for decades. Now all that drudgery is beginning to pay off: artificial intelligence (AI) and natural language processing systems can mine electronic health records (EHRs) for nuggets of research gold and accurately predict short-term rheumatoid arthritis (RA) outcomes.
“I think we have learned from our very early experiments that longitudinal deep learning models can forecast rheumatoid arthritis [RA] outcomes with actually surprising efficiency, with fewer patients than we assumed would be needed,” said Jinoos Yazdany, MD, MPH, chief of rheumatology at Zuckerberg San Francisco General Hospital and Trauma Center, and codirector of the University of California San Francisco (UCSF) Quality and Informatics Lab.
At the 2024 Rheumatoid Arthritis Research Summit (RA Summit 2024), presented by the Arthritis Foundation and the Hospital for Special Surgery in New York City, Dr. Yazdany discussed why rheumatologists are well positioned to take advantage of predictive analytics and how natural language processing systems can be used to extract previously hard-to-find data from EHRs, which can then be applied to RA prognostics and research.
Data Galore
EHR data can be particularly useful for RA research for several reasons: the sheer volume of information, rich clinical data such as notes and imaging, less selection bias than data sources such as cohorts or randomized controlled trials, real-time access, and the fact that many records contain longitudinal data from follow-up visits.
However, EHR data may have gaps or inaccurate coding, and data such as text and images may require significant processing and scrubbing before they can be used to advance research. In addition, EHR data are subject to patient privacy and security concerns, can be plagued by incompatibility across different systems, and may not represent patients who have less access to care, Dr. Yazdany said.
She noted that most rheumatologists record some measure of RA disease activity and patient physical function, and that patient-reported outcomes have been routinely incorporated into clinical records, especially since the 1980 introduction of the Health Assessment Questionnaire.
“In rheumatology, by achieving consensus and building a national quality measurement program, we have a cohesive national RA outcome measure selection strategy. RA outcomes are available for a majority of patients seen by rheumatologists, and that’s a critical strength of EHR data,” she said.
Spinning Text Into Analytics
The challenge for investigators who want to use this treasure trove of RA data is that more than 80% of the data are in the form of text, which raises the question of how best to extract outcomes data and drug dosing information from the written record.
As described in an article published online in Arthritis Care & Research on February 14, 2023, Dr. Yazdany and colleagues at UCSF and Stanford University developed a natural language processing “pipeline” designed to extract RA outcomes from clinical notes on all patients included in the American College of Rheumatology’s Rheumatology Informatics System for Effectiveness (RISE) registry.
The model used expert-curated terms and a text processing tool to identify patterns and numerical scores linked to outcome measures in the records.
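The article does not reproduce the pipeline’s term lists or code, but the general technique of matching expert-curated terms in note text and capturing the numerical scores that accompany them can be sketched in a few lines of Python. The measure names and patterns below are illustrative assumptions, not the RISE pipeline’s actual rules:

```python
import re

# Hypothetical expert-curated patterns for two RA outcome measures;
# the real pipeline's terms and tooling are not shown in the article.
OUTCOME_PATTERNS = {
    # Matches e.g. "CDAI: 14.5" or "CDAI score of 22"
    "CDAI": re.compile(r"\bCDAI\b\D{0,20}?(\d{1,2}(?:\.\d)?)", re.IGNORECASE),
    # Matches e.g. "RAPID3 = 4.3" or "RAPID 3 today 12"
    "RAPID3": re.compile(r"\bRAPID\s*3\b\D{0,20}?(\d{1,2}(?:\.\d)?)", re.IGNORECASE),
}

def extract_outcomes(note_text: str) -> dict:
    """Return the first numerical score found for each outcome measure."""
    scores = {}
    for measure, pattern in OUTCOME_PATTERNS.items():
        match = pattern.search(note_text)
        if match:
            scores[measure] = float(match.group(1))
    return scores

print(extract_outcomes("Assessment: RA, active disease. CDAI today: 14.5."))
# -> {'CDAI': 14.5}
```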
“This was an enormously difficult and ambitious project because we had many, many sites, the data was very messy, we had very complicated [institutional review board] procedures, and we actually had to go through de-identification procedures because we were using this data for research, so we learned a lot,” Dr. Yazdany said.
The model processed 34 million notes on 854,628 patients across 158 practices and 24 different EHR systems.
In internal validation studies, the models had 95% sensitivity, 87% positive predictive value (PPV), and an F1 score (the harmonic mean of sensitivity and PPV, a summary measure of predictive performance) of 91%. Applied to an EHR from a large, non-RISE health system for external validation, the natural language processing pipeline had 92% sensitivity, 69% PPV, and an F1 score of 79%.
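Those F1 figures follow directly from the harmonic-mean definition, which is easy to verify from the reported sensitivity and PPV values:

```python
def f1(sensitivity: float, ppv: float) -> float:
    # F1 is the harmonic mean of sensitivity (recall) and PPV (precision)
    return 2 * sensitivity * ppv / (sensitivity + ppv)

print(f"{f1(0.95, 0.87):.0%}")  # internal validation: 91%
print(f"{f1(0.92, 0.69):.0%}")  # external validation: 79%
```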
The investigators also explored using OpenAI large language models, including GPT-3.5 and GPT-4, to interpret complex prescription orders and found that, after training with 100 examples, GPT-4 correctly interpreted 95.6% of orders. But the approach came at a high computational and financial cost, with a single experiment running north of $3000, Dr. Yazdany cautioned.
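The article does not include the team’s prompts or code; as a rough illustration only, a few-shot approach to structuring a free-text prescription order with the OpenAI Python client might look like the following. The prompt, example orders, output schema, and model name are assumptions, not the study’s actual setup:

```python
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY

client = OpenAI()

# Few-shot prompt: the study reportedly used 100 training examples;
# the two shown here are invented for illustration.
messages = [
    {"role": "system",
     "content": "Extract drug, dose, route, and frequency from the "
                "prescription order. Answer with JSON only."},
    {"role": "user", "content": "MTX 15 mg po weekly"},
    {"role": "assistant",
     "content": '{"drug": "methotrexate", "dose_mg": 15, '
                '"route": "oral", "frequency": "weekly"}'},
    {"role": "user", "content": "prednisone 5mg daily by mouth, taper per plan"},
]

# Each call is billed per token, which is where the costs Dr. Yazdany
# described can accumulate across hundreds of thousands of orders.
response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```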