


MIT’s Sloan School and Computer Science and Artificial Intelligence Lab researchers together with leading pharma data scientists use crowdsourcing to better forecast drug approvals

Results highlight the power of crowdsourcing in developing new models that leverage human and artificial intelligence to help biomedical stakeholders de-risk their portfolios.

Cambridge, MA – July 20, 2021 – In late 2019, researchers at the MIT Laboratory for Financial Engineering (LFE) and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) worked with Novartis—a leading global medicines company—to launch an in-house Data Science and Artificial Intelligence (DSAI) challenge to beat MIT’s machine-learning models for predicting clinical trial outcomes. The results of this ambitious challenge are now available in an article in Patterns, a new open-access data science journal published by Cell Press.

The DSAI challenge built on the work of the MIT research team led by Professor Andrew Lo, director of the LFE and principal investigator at CSAIL, together with Kien Wei Siah and Chi Heem Wong, CSAIL students at the time. In 2019, the team published a paper on applications of machine learning for predicting clinical trial outcomes, using data provided by Informa Pharma Intelligence, which has one of the most comprehensive clinical trial intelligence solutions in the world. “Our goal in collaborating with Novartis was to validate key features previously found to be associated with regulatory approval, and to learn from industry experts about new features that can improve on our forecasts,” noted Prof. Lo.

Using the MIT model as a starting point, 50 teams composed of Novartis data scientists from around the world submitted their own models in a friendly competition. The winning team relied on handcrafted features that incorporated their own insights into drug development timelines and which data entries should be discarded. They found that one of the strongest predictors of approval was the phase 2 accrual relative to the disease average, and that prior approval for any indication, past approvals of other drugs for similar indications, and well-established mechanisms of action all improved the odds of approval. Strong indicators of failure, according to the team’s model, were whether a drug targeted a therapeutic area that has historically demonstrated a much lower probability of success in clinical development (e.g., cancer or Alzheimer’s disease), trial termination, poor patient enrollment, and the absence of an international nonproprietary name for a drug.
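As a rough illustration of what a handcrafted feature such as "phase 2 accrual relative to the disease average" might look like, the sketch below divides each drug's phase 2 enrollment by the mean enrollment for its disease. The ratio definition, the field names, and the sample records are all assumptions made for illustration; the winning team's actual feature construction is not described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; field names and values are illustrative only.
trials = [
    {"drug": "A", "disease": "asthma", "phase2_accrual": 300},
    {"drug": "B", "disease": "asthma", "phase2_accrual": 100},
    {"drug": "C", "disease": "melanoma", "phase2_accrual": 50},
    {"drug": "D", "disease": "melanoma", "phase2_accrual": 150},
]


def relative_accrual(records):
    """Map each drug to its phase 2 accrual divided by its disease's mean accrual."""
    # Group accrual counts by disease so each disease gets its own baseline.
    by_disease = defaultdict(list)
    for record in records:
        by_disease[record["disease"]].append(record["phase2_accrual"])
    disease_mean = {d: mean(values) for d, values in by_disease.items()}
    # Values above 1 mean the trial enrolled more patients than is typical
    # for that disease; values below 1 mean it enrolled fewer.
    return {
        record["drug"]: record["phase2_accrual"] / disease_mean[record["disease"]]
        for record in records
    }


features = relative_accrual(trials)
```

Normalizing by the disease average, rather than using the raw enrollment count, keeps trials in small-population indications from looking weak simply because they enroll fewer patients.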

The challenge drew more than 300 individuals, who submitted approximately 3,000 models over the two-month submission period. In addition to predictive power, the submissions were evaluated in a head-to-head competition based on their innovativeness and robustness, as well as the potential business value of their findings. Ultimately, two teams developed models that outperformed the baseline MIT model along all metrics—the winning team with biostatistics and drug development expertise, and the runner-up team with bio- and cheminformatics expertise from the Genomics Institute of the Novartis Research Foundation.

“All stakeholders are affected by the risk of drug development, so we were excited to have an opportunity to work with Novartis to better understand how artificial intelligence can be combined with human intelligence to lower this risk, as well as to lower the cost of capital to the biopharma industry,” said Siah.

“The DSAI challenge highlights the promise of crowdsourcing in developing new predictive models, as well as the opportunity to develop more accurate models with additional data and a broader pool of challenge participants,” added Prof. Lo. “We hope our experience can serve as a template for other universities and biopharma companies to collaborate on their own challenges.”

This research is part of the MIT LFE’s Project ALPHA (Analytics for Life-sciences Professionals and Healthcare Advocates), which leverages industry datasets such as Informa Pharma Intelligence’s Citeline suite of solutions to provide more timely and accurate estimates of the risks and rewards of clinical trials to the entire biopharma ecosystem. The ultimate goal of the project is to help patients by developing analytics that allow all biomedical stakeholders to better manage the tremendous risks of drug development.

About the MIT Laboratory for Financial Engineering

The MIT Laboratory for Financial Engineering (LFE) is a research center focused on the quantitative analysis of financial markets and institutions using mathematical, statistical, and computational models and methods. The goal of the LFE is to support and promote academic advances in financial engineering and computational finance that can be directly applied for the betterment of the world. To do that, LFE faculty, students, and staff engage with industry professionals, regulators, policymakers, and other stakeholders to develop and apply new financial technologies to practical and socially important settings.


About Informa Pharma Intelligence and Citeline

Informa Pharma Intelligence delivers data to key decision-makers across the pharmaceutical and biomedical industries in order to create real-world opportunities for growth. Citeline is part of Informa’s Pharma Intelligence vertical and is a comprehensive source of real-time R&D intelligence for the pharmaceutical industry.
