According to IBM, over 2 quintillion bytes of data are generated every day (that’s a 2 with 18 zeros!), with over 90% of the data in the world today generated in the past 2 years alone.
In our private lives, much of this information is generated through online shopping, web surfing, and popular websites such as Facebook and Twitter. Companies are making incredible efforts to collect these data and to use them to improve how they relate to customers and, ultimately, to make more money. For example, companies like Google, Amazon, Facebook, and Netflix collect enormous amounts of data and then use algorithms to provide real-time suggestions for what their customers might want to rent, buy, or click on. These algorithms, which companies use for anything from predicting customer behavior to facial recognition, were developed in the field of machine learning, a branch of computer science that focuses on how to learn from data.
Big data and critical care
Although the “big data” revolution has proliferated across the private sector, medicine has been slow to utilize the data we painstakingly collect in hospitals every day in order to improve patient care.
Clinicians typically rely on their intuition, along with the few clinical trials that might have enrolled patients like theirs, to make decisions, and evidence-based clinical decision support tools are often unavailable or unused. The tools and scores we have at our disposal are often oversimplified so that they can be calculated by hand, and they usually rely on the clinician to manually gather information from the electronic health record (EHR) to calculate the score. However, this is starting to change. From partnerships between IBM Watson and hospitals, to groups developing and implementing clinical decision support tools in the EHR, it is clear that hospitals are becoming increasingly interested in learning from and using the enormous amount of data that are just sitting in the hospital records.
Although there are many areas in medicine that stand to benefit from harnessing the data available in the EHR to improve patient care, critical care should be one of the specialties that benefits the most. With the variety and frequency of monitoring that critically ill patients receive, there are large swaths of data available to collect, analyze, and harness to improve patient care. The current glut of information results in data overload and alarm fatigue for today’s clinicians, but intelligent use of these data holds promise for making care safer and more efficient and effective.
Groups have already begun using these data to develop tools to identify patients with ARDS (Herasevich V, et al. Intensive Care Med. 2009;35[6]:1018-23), patients at risk of adverse drug reactions (Harinstein LM, et al. J Crit Care. 2012;27[3]:242-9), and those with sepsis (Tafelski S, et al. J Int Med Res. 2010;38:1605-16).
Furthermore, groups have begun “crowdsourcing” critical care problems by making large datasets publicly available, such as the Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) database, which now holds clinical data from over 40,000 ICU stays from Beth Israel Deaconess Medical Center. Continued efforts to utilize data from patients in the ICU have the potential to revolutionize the care in hospitals today.
An important area of critical care that has seen a rapid rise in the use of EHR data to create decision support tools is the early detection of critical illness. Given that many in-hospital cardiac arrests occur outside the ICU and delays in transferring critically ill patients to the ICU increase morbidity and mortality (Churpek MM, et al. J Hosp Med. 2016;11[11]:757-62), detecting critical illness early is incredibly important.
For millennia, clinicians have relied on their intuition and experience to determine which patients have a poor prognosis or need increased levels of care. In the 1990s, rapid response teams (RRTs) were developed, with the goal of identifying and treating critical illness earlier. Along with them came early warning scores, which are objective tools that typically use vital sign abnormalities to detect patients at high risk of clinical deterioration. RRTs and the early warning scores used to activate them have proliferated around the world, including in the United States, and scores like the Modified Early Warning Score (MEWS) are available for automatic calculation in the EHR.
However, taking a tool such as the MEWS that can easily be calculated by hand and making our expensive EHRs calculate it is a lot like buying a Ferrari just to drive it around the parking lot. There is no reason to limit our decision support tools to simple algorithms with only a few variables, especially when patients’ lives are at stake.
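To make concrete just how simple a hand-calculable score like the MEWS is, the sketch below implements one commonly published version of it. The exact thresholds and point values vary by institution and publication, so treat these cutoffs as illustrative rather than authoritative:

```python
def mews(sbp, hr, rr, temp_c, avpu):
    """Modified Early Warning Score from five bedside observations.

    sbp: systolic blood pressure (mm Hg); hr: heart rate (beats/min);
    rr: respiratory rate (breaths/min); temp_c: temperature (Celsius);
    avpu: level of consciousness ("A"lert, responds to "V"oice,
    responds to "P"ain, "U"nresponsive).
    Thresholds follow one widely published version; boundary handling
    and cutoffs differ across institutions.
    """
    score = 0

    # systolic blood pressure
    if sbp <= 70:
        score += 3
    elif sbp <= 80:
        score += 2
    elif sbp <= 100:
        score += 1
    elif sbp >= 200:
        score += 2

    # heart rate
    if hr <= 40:
        score += 2
    elif hr <= 50:
        score += 1
    elif hr <= 100:
        pass
    elif hr <= 110:
        score += 1
    elif hr <= 129:
        score += 2
    else:
        score += 3

    # respiratory rate
    if rr < 9:
        score += 2
    elif rr <= 14:
        pass
    elif rr <= 20:
        score += 1
    elif rr <= 29:
        score += 2
    else:
        score += 3

    # temperature
    if temp_c < 35.0 or temp_c >= 38.5:
        score += 2

    # level of consciousness
    score += {"A": 0, "V": 1, "P": 2, "U": 3}[avpu]
    return score
```

A normotensive, alert patient with normal vitals scores 0, while a patient who is febrile, tachycardic, tachypneic, and mildly hypotensive quickly accumulates points. The entire tool is a handful of threshold comparisons on five variables, which is exactly why running only this on a modern EHR leaves so much capability unused.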
Several groups around the country have, therefore, begun to utilize other variables in the EHR, such as laboratory values, to create integrated decision support tools for the early identification of critical illness. For example, Kollef and colleagues developed a statistical model to identify critical illness and implemented it on the wards to activate their RRT, which resulted in decreased lengths of stay in the intervention group (Kollef MH, et al. J Hosp Med. 2014;9[7]:424-9).
Escobar et al. developed a model to predict ICU transfer or non-DNR deaths in the Kaiser system and found it to be more accurate than the MEWS in a validation cohort (Escobar GJ, et al. J Hosp Med. 2012;7[5]:388-95). A clinical trial of their system is ongoing.
Finally, our group developed a model called eCART in a multicenter study of over 250,000 patients and has since implemented it in our hospital. An early “black-box” study found that eCART detected more patients who went on to experience a cardiac arrest or ICU transfer than our usual care RRT and it did so 24 hours earlier (Kang MA, et al. Crit Care Med. 2016;44[8]:1468-73). These scores and many more will likely become commonplace in hospitals to provide an objective and accurate way to identify critically ill patients earlier, which may result in decreased preventable morbidity and mortality.
Future directions
There are several important future directions at the intersection of big data and critical care.
First, efforts to collect, store, and share the highly granular data in the ICU are paramount for successful and generalizable research collaborations. Although there are often institutional barriers to data sharing to surmount, efforts such as the MIMIC database provide a roadmap for how ICU data can be shared and problems “crowdsourced” in order to allow researchers access to these data for high quality research.
Second, efforts to fuse randomized controlled trials with big data, such as randomized, embedded, multifactorial, adaptive platform (REMAP) trials, have the potential to greatly enhance the way trials are done in the future. REMAP trials would be embedded in the EHR, provide the ability to study multiple therapies at once, and adapt the randomization scheme to ensure that patients are not harmed by interventions that are clearly detrimental while the study is ongoing (Angus DC. JAMA. 2015;314[8]:767-8).
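One way to build the adaptive randomization that REMAP trials describe is Thompson sampling: each arm's response rate gets a Beta posterior, and each new patient is assigned to the arm with the highest posterior draw, so allocation drifts toward the better-performing therapy as evidence accumulates. The sketch below simulates this with two hypothetical therapies and made-up true response rates (all names and numbers here are illustrative, not from any actual trial):

```python
import random

random.seed(42)

# hypothetical true response rates for two therapies (unknown to the trial)
true_rates = {"A": 0.30, "B": 0.45}

successes = {arm: 0 for arm in true_rates}
failures = {arm: 0 for arm in true_rates}
allocations = {arm: 0 for arm in true_rates}

for patient in range(1000):
    # draw a plausible response rate for each arm from its Beta posterior
    # (uniform Beta(1, 1) prior before any outcomes are observed)
    draws = {
        arm: random.betavariate(successes[arm] + 1, failures[arm] + 1)
        for arm in true_rates
    }
    # assign this patient to the arm whose draw is highest
    arm = max(draws, key=draws.get)
    allocations[arm] += 1

    # observe the (simulated) outcome and update that arm's posterior
    if random.random() < true_rates[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

# as outcomes accumulate, randomization shifts toward the better arm,
# so fewer patients are exposed to the clearly inferior therapy
```

After 1,000 simulated patients, the large majority have been allocated to the better arm, which illustrates the ethical appeal of the design: the trial itself limits patients' exposure to an intervention that is clearly losing.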
Finally, it is important that we move beyond the classic statistical methods that are commonly used to develop decision support tools and increase our use of the more modern machine learning techniques that companies in the private sector use every day. For example, our group found that classic regression methods were the least accurate of all the methods we studied for detecting clinical deterioration on the wards (Churpek MM, et al. Crit Care Med. 2016;44[2]:368-74). In the future, methods such as random forests and neural networks should become commonplace in the critical care literature.
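A toy illustration of why flexible methods can beat regression for vital-sign data: risk is often U-shaped (a heart rate of 35 and a heart rate of 150 are both alarming), but a single linear score can only say "higher is worse" or "lower is worse." A tree-style rule, by contrast, can split on both ends. The data and rules below are synthetic and deliberately simplified:

```python
# synthetic labels: deterioration (1) when heart rate is very low OR very high,
# a U-shaped risk pattern common in physiologic data
data = [(hr, 1 if hr < 50 or hr > 120 else 0) for hr in range(30, 165, 5)]

def accuracy(rule):
    """Fraction of synthetic patients the rule classifies correctly."""
    return sum(rule(hr) == label for hr, label in data) / len(data)

# the best a single monotone threshold can do -- the decision shape a
# one-variable linear score is limited to ("above t is risky" or vice versa)
linear_acc = max(
    accuracy(lambda hr, t=t, above=above: int((hr > t) == above))
    for t in range(30, 165, 5)
    for above in (True, False)
)

# a two-split, tree-style rule captures both risky extremes exactly
tree_acc = accuracy(lambda hr: int(hr < 50 or hr > 120))
```

On this toy data the tree-style rule is perfect while the best single threshold is forced to misclassify one tail. Real models like random forests generalize this idea across dozens of variables and interactions, which is consistent with the accuracy gap our group observed.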
The big data revolution is here, both in our private lives and in the hospital. The future will bring continued efforts to use data to identify critical illness earlier, improve the care of patients in the ICU, and implement smarter and more efficient clinical trials. This should rapidly increase the generation and utilization of new knowledge and will have a profound impact on the way we care for critically ill patients.
Dr. Churpek is assistant professor, section of pulmonary and critical care medicine, department of medicine at University of Chicago.
Editor’s comment
Why should busy ICU clinicians bother with big data? Isn’t this simply a “flash in the pan” phenomenon that has sprung up in the aftermath of the electronic medical records (EMRs) mandated by the Affordable Care Act? Are concerns valid that clinical data–based algorithms will lead to an endless stream of alerts akin to the ubiquitous pop-up ads for mortgage refinancing, herbal Viagra, and online gambling that have resulted from commercial data mining?
In this Critical Care Commentary, Dr. Matthew Churpek convincingly outlines the potential inherent in the big data generated by our collective ICUs. These benefits are manifesting themselves not just in the data populated within the EMR – but also in the novel ways we can now design and execute studies. And for those who aren’t yet convinced, recall that payers already use the treasure trove of information within our EMRs against us in the forms of self-serving quality metrics, punitive reimbursement, and unvalidated hospital comparison sites.
Lee E. Morrow, MD, FCCP, is the editor of the Critical Care Commentary section of CHEST Physician.