Using data science to forecast clinical trial outcomes may help biomedical stakeholders de-risk their portfolios


MIT Sloan and CSAIL researchers apply artificial intelligence techniques to one of the largest datasets of clinical trial outcomes to handicap the drug and device approval process

Cambridge, MA, July 8, 2019 – A new study published today in the inaugural issue of the Harvard Data Science Review by researchers from the Massachusetts Institute of Technology applies machine-learning and statistical techniques to predict the outcomes of randomized clinical trials for new drug and device candidates. In addition to the publication, the software used in the study will be made publicly available under an open-source license.

The research is part of an ongoing collaboration between the MIT Laboratory for Financial Engineering (LFE) and Informa Pharma Intelligence, named Project ALPHA (Analytics for Life-sciences Professionals and Healthcare Advocates). Project ALPHA leverages Informa datasets from Citeline and machine learning to train and validate its predictive models, with the goal of providing timely and more accurate estimates of the risks and rewards of clinical trials to the entire biopharma ecosystem. The ultimate goal of the project is to help patients and their families by developing analytics that allow investors, biopharma professionals, regulators, and patient advocates to better manage the tremendous risks of drug and device development.

“Everyone is affected by the risk of a drug failing in its clinical trial process,” says Andrew W. Lo, the study’s senior author, director of MIT’s LFE, and a principal investigator at the Computer Science and Artificial Intelligence Laboratory (CSAIL). Kien Wei Siah and Chi Heem Wong, two LFE/CSAIL Ph.D. students who co-authored the publication, observed, “You can’t manage what you don’t measure, so this is a new tool for measuring the risk of clinical trials more accurately, allowing all stakeholders to plan more effectively for these risks.”

This study uses the largest dataset to date for analyzing the success or failure of clinical trials, and it combines machine-learning techniques with statistical methods to account for missing data. In many machine-learning applications, missing data is handled by deleting every record that has a missing value, which can discard large sections of the dataset. Such deletions lead to a loss of information and, more critically, to a distortion of it, because the reporting and recording of data are selective. A carefully designed statistical imputation technique can mitigate these problems by estimating the missing values along with other model parameters, such as the probability of success, and can therefore forecast more accurately than the common deletion method.
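The contrast between deletion and imputation can be illustrated with a small sketch. It is not the study's method or software: the toy records, the `duration` field, and the helper names are hypothetical, and simple mean imputation stands in for the more carefully designed statistical technique described above. The records are constructed so that failed trials are the ones that leave duration unreported, which is the kind of selective reporting that makes deletion distort the picture.

```python
# Hypothetical toy data: each record is one trial. "duration" (in months)
# is sometimes unreported (None); here only failed trials omit it.
TRIALS = [
    {"duration": 12.0, "success": 1},
    {"duration": 18.0, "success": 1},
    {"duration": 24.0, "success": 1},
    {"duration": 30.0, "success": 1},
    {"duration": 36.0, "success": 0},
    {"duration": 40.0, "success": 0},
    {"duration": None, "success": 0},
    {"duration": None, "success": 0},
    {"duration": None, "success": 0},
    {"duration": None, "success": 0},
]


def listwise_delete(rows):
    """Drop every record that has a missing duration (the common deletion method)."""
    return [r for r in rows if r["duration"] is not None]


def mean_impute(rows):
    """Fill missing durations with the mean of the observed ones.

    Mean imputation is an illustrative stand-in only, not the study's technique.
    """
    observed = [r["duration"] for r in rows if r["duration"] is not None]
    fill = sum(observed) / len(observed)
    return [
        dict(r, duration=fill if r["duration"] is None else r["duration"])
        for r in rows
    ]


def success_rate(rows):
    """Fraction of records whose trial succeeded."""
    return sum(r["success"] for r in rows) / len(rows)


deleted = listwise_delete(TRIALS)
imputed = mean_impute(TRIALS)

print("records kept after deletion:", len(deleted), "of", len(TRIALS))
print("success rate after deletion:", round(success_rate(deleted), 3))
print("success rate after imputation:", round(success_rate(imputed), 3))
```

In this toy case deletion keeps only six of the ten records and overstates the success rate, because the discarded records were all failures, while imputation keeps every record available for modeling.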

A prior study published by the authors reported historical probabilities of success for clinical trials without using any additional information; these “unconditional” estimates are now updated quarterly and available at the Project ALPHA website. Building on that foundation, the new publication uses over 140 features, including trial outcome, trial status, trial accrual rates, duration, prior approval for another indication, and sponsor track record, to forecast clinical-trial outcomes. The more information that is used appropriately in making forecasts, the more accurate the estimates are likely to be.
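The difference between an “unconditional” estimate and one that conditions on trial features can be sketched in a few lines. The example below is an assumption-laden illustration, not the study's model: the ten historical records are invented, a single hypothetical feature (prior approval for another indication) stands in for the 140-plus features, and plain frequency counts stand in for the machine-learning models.

```python
from collections import defaultdict

# Hypothetical history: (prior_approval_for_another_indication, outcome),
# where outcome is 1 for success and 0 for failure.
HISTORY = [
    (True, 1), (True, 1), (True, 1), (True, 0),
    (False, 1), (False, 0), (False, 0),
    (False, 0), (False, 0), (False, 0),
]


def unconditional_rate(history):
    """Historical success frequency, ignoring all features."""
    return sum(outcome for _, outcome in history) / len(history)


def conditional_rates(history):
    """Historical success frequency for each value of the feature."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [successes, total]
    for feature, outcome in history:
        counts[feature][0] += outcome
        counts[feature][1] += 1
    return {value: s / n for value, (s, n) in counts.items()}


rates = conditional_rates(HISTORY)
print("unconditional:", unconditional_rate(HISTORY))
print("with prior approval:", rates[True])
print("without prior approval:", round(rates[False], 3))
```

In this invented sample, the single unconditional figure hides a large gap between the two groups, which is the kind of information a feature-based forecast can use.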

“Anyone involved in the clinical trials process, from researchers all the way down to the patient, can benefit from greater understanding of the landscape and use of new technologies evaluating what’s working and what’s not,” said Mark Gordon, EVP Corporate Development and Innovation in Informa’s business intelligence division.

“It’s the difference between looking back at historical wins and losses to predict the outcome of a horse race versus handicapping the likely winner based on multiple factors like the horse’s pedigree, track record, temperament, the training regimen, the condition of the track, the jockey’s skill, and so on,” Lo adds. “With more accurate measures of the risk of drug and device development, we hope to encourage greater investment at this unique inflection point in biomedicine.”

About the MIT Laboratory for Financial Engineering

The MIT Laboratory for Financial Engineering (LFE) is a research center focused on the quantitative analysis of financial markets and institutions using mathematical, statistical, and computational models and methods. The goal of the LFE is to support and promote academic advances in financial engineering and computational finance that can be directly applied for the betterment of the world. To do that, LFE faculty, students, and staff engage with industry professionals, regulators, policymakers, and other stakeholders to develop and apply new financial technologies to practical and socially important settings.

About Informa Pharma Intelligence

Informa Pharma Intelligence powers a full suite of analysis products (Datamonitor Healthcare, Sitetrove, Trialtrove, Pharmaprojects, Medtrack, Biomedtracker, Scrip, Pink Sheet, and In Vivo) to deliver the data needed by the pharmaceutical and biomedical industry to make decisions and create real-world opportunities for growth. With more than 500 analysts keeping their fingers on the pulse of the industry, every key disease, clinical trial, drug approval, and R&D project is covered through the breadth and depth of data available to customers.

For more information, contact Patricia Favreau, Associate Director, (617) 895-6025.