From the Editor-in-Chief
Can Big Data Launch Cancer Research?
By William G. Nelson, MD, PhD
In his 2016 State of the Union address, President Barack Obama announced he would put Vice President Joe Biden in charge of “mission control” for a new “moonshot” initiative that will “make America the country that cures cancer once and for all.”
This bold challenge was met with excitement and skepticism. There was great enthusiasm that the vice president, who lost his son to cancer, would put his heart and soul into beefing up support for cancer research. By contrast, cynics warned that cancer is a complex collection of disorders resistant to cures, that similar initiatives had failed in the past and that Congress might not allocate sufficient funds.
I favor the moonshot metaphor because cancer research, like space exploration, has been to the moon and has the “rocks” to show for it: We can cure several cancer types, even at advanced stages, for many people. Nonetheless, we have yet to colonize the moon. More than half a million Americans die of cancer each year.
One key priority in the vice president’s cancer plan is the collection, sharing and analysis of big data. Advances in information science and technology have created an astonishing capacity to gather, transfer, store and analyze large and diverse collections of data, measured in petabytes (10^15 bytes). Is cancer research ready for big data?
Information technology analyst Doug Laney describes big data in three dimensions: volume, where new data storage technologies have pushed capacity barriers; velocity, where torrents of data arrive increasingly in real time; and variety, where many different types of data are created and collected. Big data was envisioned initially for e-commerce to explain trends and inform management decisions, but now it influences thinking in fields as diverse as astronomy, geology and biomedicine.
Large-scale analyses of big data in cancer medicine promise to yield insights into disease and treatment that could transform care. For this promise to be realized, however, the electronic medical record (EMR) needs to be reimagined. Traditional EMR software was created for health care billing. Newer EMR tools provide medical decision support and offer prompts for laboratory and radiologic testing to improve outcomes while controlling costs. To exploit big-data analytics, the EMR must evolve to provide a quantitative medical portrait of an individual. For cancer, this would include genomic data, imaging data and patient-reported symptom data, in addition to the conventional narrative history and physical examination results.
Moving forward, all of the information deposited in the EMR should reflect the best measurement science principles, with special attention to the accuracy, reproducibility and consistency of data. Verbal descriptions of skin lesions might be replaced by photography. Descriptions of heart murmurs heard through a stethoscope might be replaced by recordings of the actual sounds. A test for a cancer biomarker in a biopsy, scored on a numbered scale, might be replaced by a scanned image of the stained slide.
Once data collection and storage have been optimized for patient care, new approaches to analyzing big data can be unleashed, creating a “learning health care system” that incorporates innovation in real time and assesses its impact seamlessly. However, safeguards introduced to protect patients participating in biomedical research and to secure the privacy of health information may be significant barriers to accessing big data. In a 2014 piece in the New England Journal of Medicine, Ruth Faden, Tom Beauchamp and Nancy Kass argue that biomedical research ethics can be modernized to address these concerns. Their Common Purpose Framework for ethics aims to respect the rights and dignity of patients, incorporate physician judgment, deliver optimal care, avoid undue risks and burdens, reduce inequities and foster outcome improvement.
The guidance computer used on the Apollo spacecraft sent to the moon had 4,096 bytes of random-access memory and 73,728 bytes of read-only core rope memory. If EMRs can be upgraded and a new ethical framework governing access to health care data can be created, we will see if computers able to manage petabytes of information can deliver on the promise of big data for a cancer moonshot.
WILLIAM G. NELSON, MD, PhD, is the director of the Johns Hopkins Kimmel Cancer Center in Baltimore.