Study: Industry now dominates AI research
Until the early 2000s, AI research in academia and in industry was split fairly evenly. But over the past decade, the balance has shifted significantly. Industry has the upper hand when it comes to computing power and access to data, which makes it easier for businesses to hire talent, develop industry-leading AI benchmarks, and continue to invest in research.
Industry is now taking the lead on — and influencing the direction of — basic AI research that has traditionally been the domain of academia, according to a new paper in the journal Science that was co-authored by MIT research scientist Neil Thompson, MIT postdoctoral associate Nur Ahmed, and Virginia Tech PhD student Muntasir Wahed. This trend raises concerns about the future of AI research that is in the public interest but may not be profitable.
“Deep learning is the form of AI that has powered the [AI] revolution of the last 10 years, and underlying that success has been an incredible allocation of resources,” Thompson said. “You might worry that academics would get priced out in this situation, and our research is saying that’s in fact happening.”
A concentration of resources and influence
Like the research process itself, the dominance of industry in AI research can be explained through inputs and outputs. In this case, the inputs are data, researchers working in the field, and accessible computing resources, while the outputs are the AI models and their quality.
Consider the inputs. Businesses have access to large data sets because their operations naturally produce a lot of data through interactions with users and devices. Talent-wise, roughly 70% of individuals with a PhD in artificial intelligence get jobs in private industry today, compared with 20% two decades ago. Since 2006, the number of research faculty members in academia has remained roughly flat, while hiring in industry has risen eightfold. Finally, industry models are 29 times larger on average than those in academia, which highlights the difference in computing power that is available to the two groups.
On the output side, the largest AI models developed in any given year now come from industry 96% of the time. (Thompson and Ahmed determined size based on the number of parameters.) Leading benchmarks, or models used to measure progress in different areas of AI, come from industry 91% of the time, while the number of published papers with industry co-authors has nearly doubled since 2000.
This is a stark contrast to industries such as pharmaceuticals, where there’s a roughly even split between academic and industry research. In that field, Thompson said, academia is well positioned to conduct basic research — identifying new drug targets, for example — at a reasonable cost, while industry has the resources to conduct applied research such as clinical trials. This natural division of labor hasn’t hindered the research process.
The challenge with AI is that “you’re not looking at a single module; you can’t break it apart in an easy way,” Thompson said. To create new versions of ChatGPT, for example, new functionality must be built on top of the existing model in its entirety.
Academic researchers simply lack the resources to do this work, both because industry owns the models and because the price tag for computing power is too high, the researchers said. In 2021, U.S. government agencies, aside from the Department of Defense, allocated $1.5 billion for academic funding for AI research. That’s the same amount a single company (Google) spent on a single AI research project (DeepMind) in a single year (2019).
In an effort to promote “responsible American innovation in artificial intelligence,” on May 4 the U.S. government announced $140 million in funding from the National Science Foundation to launch seven National AI Research Institutes.
Concerns about the future of AI research
Thompson and Ahmed expressed several concerns about the future of AI research if academia continues to have a limited role.
The continued flow of talent to industry is troubling, Ahmed said. For one thing, it leaves fewer academic researchers to train the next generation. It also means that benchmarks set by industry increasingly shape the overall research agenda for AI.
“If industry benchmarks win, then postdoctoral work is more likely to follow industry’s lead instead of thinking about things differently,” Ahmed said.
A research agenda driven by private industry could push to the sidelines work that’s in the public interest but not particularly profitable. This includes conducting research on topics like public health and ensuring that AI models are unbiased, equitable, and used for the public good. (This concern is part of what motivated technology leaders to call for a moratorium on AI development in order to create safeguards for society in general and consumers in particular.)
“If you want to audit an industry’s models for fairness, one way to do that is to create a different model to compare it to,” Thompson said. “Under those circumstances, you need resources.”
At a global level, the growing divide between academia and industry means that research is increasingly concentrated where technology firms are most capable of developing advanced systems. In today’s economy, that’s the United States and China and, to a lesser extent, Canada. The United States benefits in part from an influx of talent from other countries, while China benefits from the rapid rise of data-rich platforms such as WeChat.
Europe, on the other hand, runs the risk of falling further behind, though previous papers Thompson has co-authored have indicated that this phenomenon isn’t unique to AI and applies more broadly to supercomputing.
“Europe has the talent, but there’s less incentive to do the research there,” Ahmed said.
Finding the right balance
Industry has played a role in important AI research, Thompson and Ahmed noted. Machine learning frameworks such as PyTorch and TensorFlow were initially developed by companies (Meta and Google, respectively) and are now open source and widely used in academia.
“Industry being able to do AI research better isn’t something to vilify. It means cheaper and better products and services,” Thompson said. “There are lots of advantages the world gets.”
The argument isn’t that a set percentage of work should be done in academic settings, the researchers write. Instead, it’s about shifting the balance so academia has the capabilities to conduct AI research that’s aligned with the public interest.
Ahmed said international collaboration could help — something similar to CERN, the European Organization for Nuclear Research. Such an effort would allow for the creation of more representative AI models that are trained on larger and more diverse data sets, which would have the spillover effect of encouraging more research.
“There are talented people across the globe who are looking for opportunities but lack mentorship,” Ahmed said. “This would be a way to unlock talent and increase diversity in research.”