Harnessing Big Data to Support Precision Medicine


During his 2015 State of the Union address, President Obama announced his Precision Medicine Initiative (PMI) - a research effort designed to develop tailor-made treatments based on an individual's genetics and strain of disease. This campaign coincides with the government’s Cancer Moonshot plan, which will work to “accelerate research efforts and break down barriers” as the industry collaborates to develop innovative, personalized cancer treatments using data insights.
One year after Obama’s PMI announcement, one crucial concept was touched on at the White House Precision Medicine Initiative (PMI) Summit: Big Data. Specifically, how the collecting and deciphering of Big Data is the skeleton key that will unlock the development of personalized medicine.

However, this idea goes hand in hand with some important questions: How can you comb through vast amounts of data to reach relevant insights that can be applied to the creation of revolutionary and effective treatments? Where do we even start looking? To address these issues, let’s take a step back and examine how Big Data is currently being used to draw conclusions in the oncology clinical research stage, the processes used to scale down and interpret massive amounts of information, and how to translate statistical findings into real-world settings.

Current state of Big Data in oncology clinical research

Genomics, the study of an organism's DNA, is not comprehensive enough on its own to thoroughly understand cancer behaviors and identify the vital biomarkers that help determine which treatments patients will respond to. Predictive diagnostics is where Big Data comes into play – coupling genomic data with other forms of diagnostic testing (e.g., MRIs, blood tests, tissue imaging) to obtain a holistic picture of patient outcomes. If you look at large populations within clinical trial studies, you can make predictive correlations about which treatments will be effective based on certain genetic factors. Big Data analysis in precision medicine requires a sizeable study of patients so that researchers can understand what effects are occurring across a controlled population.
It is difficult to draw these same conclusions accurately from smaller sample sizes; smaller populations leave more room for error and deviation. A popular misconception about Big Data is that scientists just like having a lot of information. That is not the case – it is about statistical correlation, and the more data you have, the more confidence you can place in the correlations you find.
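The link between sample size and confidence can be made concrete. As an illustrative sketch (the formula and patient counts below are my own, not from the article), the width of a correlation's confidence interval shrinks roughly with the square root of the sample size, which is why a correlation observed in thousands of patients is far more trustworthy than the same correlation observed in fifty:

```python
import math

def fisher_ci_width(r, n, z=1.96):
    """Approximate 95% confidence-interval width for a Pearson
    correlation r estimated from n samples (Fisher z-transform)."""
    zr = math.atanh(r)            # transform r into z-space
    se = 1.0 / math.sqrt(n - 3)   # standard error shrinks as n grows
    lo = math.tanh(zr - z * se)   # back-transform the interval bounds
    hi = math.tanh(zr + z * se)
    return hi - lo

# The same observed correlation (0.30) at two hypothetical study sizes:
small_study = fisher_ci_width(0.30, 50)    # wide, unreliable interval
large_study = fisher_ci_width(0.30, 5000)  # narrow, reliable interval
```

With 50 patients the interval spans roughly half the correlation scale; with 5,000 it narrows to a few hundredths, which is the statistical point the paragraph is making.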
On a business level, oncology stakeholders are starting to look at hospitals and research facilities across the country that are administering innovative, specialized treatments developed from Big Data analysis, and rewarding the facilities that produce effective patient outcomes.
Ralf Huss, M.D.
Chief Medical Officer, Definiens

Big Data scale-down processes

When researchers first begin to look at correlations discovered within clinical trial populations, it is difficult to determine which pieces are most important, and without a condensing process they end up pulling and analyzing an overwhelming amount of information. From a storage and data-manipulation standpoint alone, it is ineffective and daunting to deal with millions of repeating cells in Excel. Data reduction is a key process that allows researchers to scale information (e.g., genome sequences, tissue DNA) down into comprehensible patterns and formats in order to diagnose and develop personalized treatments for patients. It helps narrow down and identify which factors are most vital for achieving desired outcomes.
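As a toy illustration of what data reduction looks like in practice (the feature names and readings below are hypothetical, not from the article), a long stream of raw measurements per patient can be collapsed into a handful of summary features that are easy to store, compare, and feed into later analysis:

```python
from statistics import mean, stdev

def reduce_readings(readings):
    """Collapse a long list of raw readings into a few summary
    features that capture the overall pattern of the signal."""
    return {
        "mean": mean(readings),     # central tendency
        "spread": stdev(readings),  # variability across readings
        "peak": max(readings),      # strongest single signal
    }

# Hypothetical patient: four readings stand in for millions,
# reduced to three comparable numbers.
features = reduce_readings([2.1, 2.4, 9.8, 2.2])
```

The same idea scales: whatever the raw volume per patient, each patient ends up represented by a small, fixed set of numbers that downstream statistics can handle.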

For instance, a clinical trial of 1,000 people could easily produce 40 terabytes of data – roughly 40 gigabytes per patient. For context on how massive that is, a terabyte is 2^40 bytes, or approximately a trillion bytes. From there, researchers might create a heat map to identify trends and commonalities. They may also apply the Random Forest technique, which looks at correlations between predictive factors, such as biomarkers, and pinpoints which pieces of the data are most relevant and predictive. Random Forest also reduces the effective size of the data, making the information easier to store and manage.
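As a sketch of how the Random Forest step might look in code (the library choice, synthetic data, and marker names here are illustrative assumptions, not the article's actual pipeline), a forest trained on labeled patient data can rank which biomarkers carry predictive signal via its feature importances:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic cohort: one truly predictive marker plus four noise markers.
biomarker_a = rng.normal(size=n)
noise_markers = rng.normal(size=(n, 4))
X = np.column_stack([biomarker_a, noise_markers])

# Outcome driven mostly by biomarker A (plus a little noise).
y = (biomarker_a + 0.3 * rng.normal(size=n) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = forest.feature_importances_  # one importance score per marker
```

Here `ranking` concentrates its weight on the first column – the forest has "pinpointed" biomarker A as the relevant piece of the data, which is exactly the pruning role the paragraph describes.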

Once the most valuable factors are classified, clinicians can begin a statistical analysis to determine overall population results and trends, such as prognosis and survival rate. Clinicians may then be able to say, “If a patient has biomarker A and is given therapy B, they are likely to survive longer than if given therapy C.” This is the goal of harnessing Big Data to support personalized medicine: it all starts with identifying predictive information and then mapping those points against the status of the patient’s disease.
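To make the therapy-B-versus-therapy-C comparison concrete, here is a minimal sketch with invented numbers (all survival figures are illustrative, not real trial data): stratify biomarker-A-positive patients by the therapy they received and compare median survival:

```python
from statistics import median

# Hypothetical survival times (months) for biomarker-A-positive
# patients, split by which therapy they received.
survival = {
    "therapy_B": [18, 22, 25, 30, 27],
    "therapy_C": [9, 12, 11, 14, 10],
}

# Median survival per therapy arm.
by_therapy = {t: median(times) for t, times in survival.items()}

# The arm with the longer median survival backs the clinical rule:
# "biomarker A + this therapy => longer expected survival."
better = max(by_therapy, key=by_therapy.get)
```

Real analyses would use proper survival statistics (censoring, significance testing), but the shape of the conclusion – a therapy recommendation conditioned on a biomarker – is the same.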

Translating findings into the real world

Oncologists spend their days crunching through findings and making predictions, but at the end of the day, they are not statisticians. When oncologists need to make decisions, they do not just want to look at a data table; they need to see how predictive markers play out in a clinical setting. The real question for them is how to actually use the data that pertains to their patients. In the clinical trial phase, this is less of an issue because many trusted parties are involved in the decision-making. In a real-world hospital setting, however, doctors need to get their patients on the right drug, and fast. They need to be trained to make these data-driven decisions on their feet.

Unfortunately, physicians-in-training receive little education on bioinformatics and data analysis in medical school. As a result, when researchers demonstrate a statistical correlation to a doctor, the doctor may not have a strong basis for understanding it, nor the time to examine it. For lab-generated successes to translate to the hospital, the industry needs to reevaluate the required education for oncologists in this era of Big Data-based innovation.


Big Data gets its name for a reason; information sets are so vast and complex that even with reduction techniques, it can be overwhelming for practitioners and researchers to feel they have a handle on the figures at their fingertips. While it is possible to interpret Big Data without scaling it down, it is highly complex and significantly increases the odds of inaccuracies. At the end of the day, patterns and trends need to be identified, which can only be effectively accomplished via data reduction techniques.

We are living in the information age, so it is imperative that we embrace the technologies and insights that can help us decode the intricacies of cancer. Federal plans like the Precision Medicine Initiative and Cancer Moonshot are huge strides in the right direction for reaping the greatest benefits of Big Data – scaling it down to develop individualized treatments as the “one size fits all” approach continues to prove ineffective.

Courtesy of Ralf Huss, M.D.