Credit: Mimi Phan / Julian Hochgesang

This new forecasting model is better than machine learning, researchers say

Betsy Vereckey

Sep 19, 2023

What if we told you there was a new financial forecasting model accurate enough to predict the outcome of the next U.S. presidential election or next season’s NBA draft prospects?

The approach — relevance-based prediction — relies on a mathematical measure to account for unusualness. In classical statistics, “we’re told to be skeptical of outliers” because they could be data errors, said senior lecturer in finance at MIT Sloan and a co-author of a paper outlining the approach.

But there’s a lot to be gained from emphasizing observations that are different from average, he said.

“Unusual events contain more information than common events,” said Kritzman, president and CEO of Windham Capital Management. “It’s almost like a controlled science experiment — when something really strange happens, that’s when relationships shine through.”

Relevance-based prediction originated in Kritzman’s research efforts in the late 1990s, when he was looking for ways to account for risk when constructing investment portfolios.

He and his colleagues developed a mathematical equation to measure unusualness in financial returns, then realized much later that they had independently arrived at a mathematical principle that already existed — the Mahalanobis distance, first used to analyze resemblances among human skulls in India. Kritzman believes they were the first to apply it to finance.

“We used it initially to measure market turbulence, and we found that it was very helpful,” he said.

In subsequent years, Kritzman and his colleagues found more ways to apply the Mahalanobis distance in finance. Then, during the pandemic, the researchers began exploring how it could be used to form predictions when there are complex relationships that traditional linear regression, another common forecasting method, is not great at handling.

“We kept playing with the math and got data and tested it across different settings,” Kritzman said.

The results of this exploration are summarized in “Relevance-Based Prediction: A Transparent and Adaptive Alternative to Machine Learning,” co-authored by Megan Czasonis and David Turkington of State Street Associates.

Better than machine learning and statistics

When it comes to forecasting, many people think of machine learning, which relies on vast quantities of data to detect patterns and make accurate predictions about the future.

What key factors impact investors' willingness to pay for data?

Retail investors lose big in options markets, research shows

Sustainable investing: 4 questions to ask

Machine learning is useful; it can help in manufacturing by detecting defects in products and in health care by improving the diagnosis of diseases. But it’s not perfect.

“We know that machine learning works, but it has its own baggage,” said Kritzman. Model-based machine learning is not grounded in theory, for example, and is not capable of adapting to new, unusual circumstances.

“Model-based machine learning looks at historical data to form a prediction, but if circumstances change in the future — if something unprecedented occurs — then that model is no longer good, and you have to start all over again,” Kritzman said.

Linear regression is still the dominant approach today for most simple prediction tasks, but it lacks transparency. It can’t provide insight on how an observation or an event informs a prediction. It also doesn’t provide information about the quality of a prediction.

The relevance-based prediction method, in contrast, solves for all of these factors and is a more reliable method of prediction because it:

Uses the Mahalanobis distance to mathematically demonstrate how important an observation is to a prediction by measuring both unusualness and similarity.

Tells statisticians how much confidence they should attach to a specific prediction.

Relies on a subsample of relevant observations to form predictions and tells researchers how big the sample should be in order to be optimal.

“What we saw is that if we could discard the nonrelevant observations and just use a subset of relevant observations, we could generate more reliable predictions,” Kritzman said.

Real-world applications of relevance-based prediction

So far, Kritzman and his co-authors have used relevance-based prediction to show how to predict stock-bond correlations and how to predict factor returns in the stock market. In the latter scenario, for example, the authors found that more data isn’t always better, even though it’s long been assumed that larger samples produce more reliable predictions.

In an earlier paper, “Addition by Subtraction: A Better Way to Forecast Factor Returns and Everything Else,” the authors used relevance-based prediction to determine how informative and similar a historical observation is to what’s currently being measured.

In doing so, they found that using the most relevant data points can better predict factor returns over traditional regression analysis, not unlike the way economists extrapolate from relevant historical events when doing their forecasting. Crafting a subset of data from the most relevant data points is what improves a model’s predictive power.

They’ve also applied it in politics. Their research correctly predicted the winner of the past six U.S. presidential elections based on the political, geopolitical, and economic circumstances of the election year, using a sample that included 35 presidents from 1876 to 2020.

“We wrote the paper prior to 2020, but we got 2016 correct, which nobody [else] got correct,” Kritzman said.

In addition, Kritzman is focusing on applying relevance-based prediction to sports. “We think that this would be very helpful to professional teams in predicting future outcomes of players they’re considering drafting,” he said, noting that a “massive amount of data” on NBA and college basketball players is continuously being accumulated.

Kritzman said that relevance-based prediction would also be helpful in predicting different lineups and strategies during a game. “It’s amazing how much data is being collected, but I don’t think [teams] have very sophisticated approaches for generating insights from that data,” Kritzman said. “That’s where I think relevance-based prediction could come in.”