Feature

Massive databases unleash discovery, but not so much in the U.S.

Publish date: October 26, 2023

Which conditions are caused by infection? Though it may seem like an amateur concern in the era of advanced microscopy, some culprits evade conventional methods of detection. Large medical databases hold the power to unlock answers.

A recent study from Sweden and Denmark meticulously traced the lives and medical histories of nearly one million men and women in those countries who had received blood transfusions over nearly five decades. Some of these patients later experienced brain bleeds. The inescapable question: Could a virus found in some donor blood have caused the hemorrhages?

Traditionally, brain bleeds have been thought to strike at random. But the new study, published in JAMA, points toward an infection that causes or, at the very least, is linked to the condition. The researchers used a large databank to make the discovery.

“As health data becomes more available and easier to analyze, we’ll see all kinds of cases like this,” said Jingcheng Zhao, MD, of the clinical epidemiology division of Sweden’s Karolinska Institutet in Solna and lead author of the study.

Scientists say the field of medical research is on the cusp of a revolution as immense health databases guide discovery and improve clinical care.

“If you can aggregate data, you have the statistical power to identify associations,” said David R. Crosslin, PhD, professor in the division of biomedical informatics and genomics at Tulane University in New Orleans. “It opens up the world for understanding diseases.”

With access to the large database, Dr. Zhao and his team found that some blood donors later experienced brain bleeds. And it turned out that the recipients of blood from those same donors carried the highest risk of experiencing a brain bleed later in life. Meanwhile, patients whose donors remained bleed-free had the lowest risk.

Not so fast in the United States

In Nordic countries, all hospitals, clinics, and pharmacies report data on diagnoses and health care visits to the government, tracking that began with paper and pen in the 1960s. But the United States health care system is too fragmented to replicate such efforts, with several brands of electronic medical records operating across different systems. Data sharing across institutions is minimal.

Most comparable health data in the United States comes from reimbursement information collected by the Centers for Medicare & Medicaid Services on government-sponsored insurance programs.

“We would need all the health care systems in the country to operate within the same IT system or use the same data model,” said Euan Ashley, MD, PhD, professor of genomics at Stanford (Calif.) University. “It’s an exciting prospect. But I think [the United States] is one of the last countries where it’ll happen.”

States, meanwhile, collect health data on specific areas like sexually transmitted infection cases and rates. Some states also have registries, like the Connecticut Tumor Registry, which was established in 1941 and is the oldest population-based cancer registry in the world.

But all of these efforts are ad hoc, and no equivalent exists for heart disease and other conditions.

Health data companies have recently entered the U.S. data industry mainly through partnerships with health systems and insurance companies, using deidentified information from patient charts.

The large databases have yielded important findings that randomized clinical trials simply cannot, according to Dr. Ashley.

For instance, a study found that a heavily lauded immunotherapy treatment did not provide meaningful outcomes for patients aged 75 years or older, but it did for younger patients.

This sort of analysis might enable clinicians to administer treatments based on how effective they are for patients with particular demographics, according to Cary Gross, MD, professor at Yale University in New Haven, Conn.

“From a bedside standpoint, these large databases can identify who benefits from what,” Dr. Gross said. “Precision medicine is not just about genetic tailoring.”

These large datasets also provide insight into genetic and environmental variables that contribute to disease.

For instance, the UK Biobank has more than 500,000 participants paired with their medical records and scans of their body and brain. Researchers perform cognitive tests on participants and extract DNA from blood samples over their lifetime, allowing examination of interactions between risk factors.

A similar effort underway in the United States, called the All of Us Research Program, has enrolled more than 650,000 people. Relative to each country's population, however, that is less than one-third the size of the UK Biobank. The goal of the program is to provide insights into prevention and treatment of chronic disease among a diverse set of at least one million participants. The database includes information on sexual orientation, which is a fairly new datapoint collected by researchers in an effort to study health outcomes and inequities among the LGBTQ+ community.

Dr. Crosslin and his colleagues are writing a grant proposal to use the All of Us database to identify genetic risks for preeclampsia. People with certain genetic profiles may be predisposed to the life-threatening condition, and researchers may discover that lifestyle changes could decrease risk, Dr. Crosslin said.
