Credit: Mer_Studio / Shutterstock

Ideas Made to Matter

Artificial Intelligence

4 new studies about agentic AI from the MIT Initiative on the Digital Economy

By

Over time, artificial intelligence tools are being given more autonomy. Beyond serving as human assistants, they are being programmed to be agents themselves — negotiating contracts, making decisions, exploring legal arguments, and so on.

This evolution raises important questions about how well AI can perform the kinds of tasks that have historically depended on human judgment. As AI takes over some tasks from people, will it demonstrate the requisite reasoning and decision-making skills?

MIT Sloan professor of management, IT, and marketing Sinan Aral and postdoctoral fellow Harang Ju have been exploring these questions and more in several areas of new research that range from how AI agents negotiate to how they can be made more flexible in their interpretation of rules. Aral is the director of the MIT Initiative on the Digital Economy, where Ju is a member of the research team. 

“A lot of people in industry and computer science research are creating fancy agents, but very few are looking at the interactions between humans and these tools,” Ju said. “That’s where we come in. That’s the theme of our work.”

“We are already well into the Agentic Age [of AI],” Aral said. “Companies are developing and deploying autonomous, multimodal AI agents in a vast array of tasks. But our understanding of how to work with AI agents to maximize productivity and performance, as well as the societal implications of this dramatic turn toward agentic AI, is nascent, if not nonexistent.

“At the MIT Initiative on the Digital Economy,” he continued, “we have doubled down on analyzing rigorous, large-scale experiments to help managers and policymakers unlock the promise of agentic AI while avoiding its pitfalls.”

Below are four recent insights from this research program, which aims to more fully explore the frontiers of AI development.

AI can be taught to handle exceptions 

In a new paper co-authored by Matthew DosSantos DiSorbo, Aral and Ju presented  people and AI alike with a simple scenario: To bake a birthday cake for a friend, you are tasked with buying flour for $10 or less. When you arrive at the store, you find that flour sells for $10.01. What do you do?

Most humans (92%) went ahead with the purchase. Almost universally, across thousands of iterations, AI models did the opposite, citing the fact that the price was too high.

“With the status quo, you tell models what to do and they do it,” Ju said. “But we’re increasingly using this technology in ways where it encounters situations in which it can’t just do what you tell it to, or where just doing that isn’t always the right thing. Exceptions come into play.” Paying an extra cent for the flour for a friend’s cake, he noted, makes sense; paying an extra cent per item does not necessarily make sense when Walmart is ordering a large number of items from suppliers.

The researchers found that providing models with information about both how and why humans opted to purchase the flour — essentially giving them insight into human reasoning — corrected this problem, giving the models a degree of flexibility. The AI models then made decisions like people, justifying their choices with comments like “It’s only a penny more” and “One cent is not going to break the bank.” The models were able to generalize this flexibility of mind to cases beyond purchasing flour for a cake, like hiring, lending, university admissions, and customer service.   

Read the working paper: Teaching AI to Handle Exceptions 

The performance of human-AI pairs depends on how the AI is designed 

How does work change when people collaborate with AI instead of with other people? Does productivity increase? Does performance improve? Do processes change?

To tackle these questions, Aral and Ju developed a new experimental platform called Pairit (formerly MindMeld), which pairs people with either another person or an AI agent to perform collaborative tasks. In one situation documented in a recent paper, participants were asked to create marketing campaigns for a real organization’s year-end annual report, including generating ad images, writing copy, and editing headlines. The entire task unfolded in a controlled and observable environment.

“We believe the Pairit platform will revolutionize AI research,” Aral said. “It injects randomness into human-AI collaboration to discover causal drivers of productivity, performance, and quality improvements in human-AI teams.” 

Aral said the scientific community can use the platform to discover process, reskilling, and intangible investment strategies that unlock productivity gains from AI, and Aral and Ju plan to make the platform freely available to researchers to study AI agents across diverse settings. 

In their study, Aral and Ju found that human-AI pairs excelled at some tasks and underperformed human-human pairs on others. Humans paired with AI were better at creating text but worse at creating images, though campaigns from both groups performed equally well when deployed in real ads on social media site X. 

Looking beyond performance, the researchers found that the actual process of how people worked changed when they were paired with AI . Communication (as measured by messages sent between partners) increased for human-AI pairs, with less time spent on editing text and more time spent on generating text and visuals. Human-AI pairs sent far fewer social messages, such as those typically intended to build rapport.

“The human-AI teams focused more on the task at hand and, understandably, spent less time socializing, talking about emotions, and so on,” Ju said. “You don’t have to do that with agents, which leads directly to performance and productivity improvements.”

As a final part of the study, the researchers varied the assigned personality of the AI agents using the Big Five personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism.  

The AI personality pairing experiments revealed that programming AI personalities to complement human personalities greatly enhanced collaboration. For example, conscientious humans paired with “open” AI agents improved image quality, while extroverted humans paired with “conscientious” AI agents reduced the quality of text, images, and clicks. Men and women worked better with different types of AI personalities. While men were more productive and produced better-performing ads with “agreeable” AI, they were less productive and produced lower-quality work with “neurotic” AI. Women were more productive and produced better-quality work with “neurotic” AI but were not pushed to be their best with “agreeable” AI. 

Different AI personalities also worked better in different cultures. For example, working with “extroverted” AI boosted performance among Latin American workers but degraded performance with East Asian workers, Aral said. “Neurotic” AI boosted human performance in Asia but degraded performance in Latin America and the Middle East.

Aral and Ju said these effects were “so strong and so meaningful” that they built a company, Pairium AI, “designed to build the personalization layer of the Agentic Age.” Pairium AI is building technology, like the Pairit tool, that pairs humans with different types of AI to get the most out of both humans and the AI.

Read the working paper: Collaborating with AI agents 

Negotiating with AI bots requires novel approaches 

A new paper by Aral and Ju along with three other MIT researchers — professor , doctoral student Michelle Vaccaro, and doctoral student Michael Caosun — examines how to create the most effective AI negotiation bot.

Related Articles

Machine learning and generative AI in 2025
How generative AI affects highly skilled workers
These human capabilities complement AI’s shortcomings

For their study, the researchers developed an international competition, attracting “300 or 400 of the world’s top negotiation experts from companies and universities to iteratively design and refine prompts for a negotiation bot,” Ju said. “This allowed us to really efficiently explore the space of negotiation strategy using AI.”

They found that bots with killer instincts — those focused exclusively on taking as much of the pie as possible — were less effective than those that expressed warmth during negotiation; the latter type was more likely to keep counterparts at the table and thus more likely to reach a deal.

That said, to capture value in the process of negotiation, bots had to possess a degree of dominance alongside their warmth; warmth alone was a losing strategy. The most successful bot negotiators thus confirmed fundamental principles in existing negotiation theory.

The competition also revealed novel tactics that apply only to AI bots — things like prompt injection, in which one bot pushes another bot to reveal its negotiation strategy. Given this, the researchers noted that a new theory of negotiation that pertains specifically to AI must be developed alongside theory previously developed around how humans negotiate with each other.

Read the working paper: Advancing AI negotiations 

Trust varies in AI search results 

It is well known that generative AI sometimes “hallucinates” by inventing information in response to questions. Yet generative AI is an increasingly popular tool applied to internet searches. New research by Aral and MIT Sloan PhD student Haiwen Li studied how much trust people place in results returned by generative AI. They found that on average, people trust conventional search results more than those produced by generative AI — though these levels of trust vary by demographics. People with a college degree or higher, those who work in the tech sector, and Republicans tend to place more trust in generative AI.

The researchers also explored how different interventions affect this trust. When a generative AI search provides reference links for its results, people trust the tool more, even if those links have been fabricated. Offering information about how the models work boosts trust as well. However, the practice of “uncertainty highlighting,” where the model highlights information in different colors depending on its confidence in the result, decreases trust in results. 

Levels of trust, in turn, are related to a person’s willingness to share that information with others: More trust indicates a greater willingness to share.

Read the working paper: Human Trust in AI Search 

A lightbulb with the abbreviation "Ai" on it seems to be flying like a rocket ship

AI Executive Academy

In person at MIT Sloan

For more info Sara Brown Senior News Editor and Writer