- Research shows that the hardest work in deploying agentic AI in a clinical setting lies in the “sociotechnical” aspects of implementation, rather than in tasks like prompt engineering.
- For every hour spent perfecting a model, organizations should expect roughly four hours of implementation work.
- Researchers distilled their findings into five challenges for agentic AI deployment in any sector.
Agentic artificial intelligence is a promising approach for analyzing health data to improve patient outcomes, but new research shows a gap between where organizations focus their attention and where success with AI agents is determined.
Researchers from 10 institutions studied the deployment of an agentic AI system that looks for adverse events among cancer patients receiving immunotherapy treatment. Studying how the agent was practically implemented, the researchers, including an MIT Sloan professor who is one of the article’s lead authors, found that success requires focusing more on the “sociotechnical” aspects of implementation and infrastructure than on more expected tasks such as prompt engineering.
The research provides insights for using AI agents in clinical settings and in broader applications.
Building an AI agent for a clinical setting
Agentic AI describes a system of AI agents designed to independently complete multistep processes.
“Agentic workflows allow models to take on more autonomous and oftentimes more challenging tasks, which has a lot more potential benefit for organizations,” said Danielle Bitterman, an assistant professor at Harvard Medical School, clinical lead for data science and AI at Mass General Brigham and one of the article’s lead authors. “But it creates new complications and considerations, because the risk is higher if you’re moving humans a little further out of the loop.”
The new study focused on immunotherapy, a cancer treatment that can cause a broad spectrum of adverse events in patients. These events are often difficult to detect, in part because the patients are already quite sick, and also because the information that identifies such an event is buried in lengthy, unstructured electronic medical records.
The researchers trained AI agents to scan patients’ records to determine whether they were experiencing adverse events and the severity of each event. The AI creates reports based on this information and sends them to clinicians and clinical research coordinators for review.
Speed is a key benefit of the agentic system, which can process hundreds of notes in minutes instead of hours to days, said Kate Kellogg, a professor at the MIT Sloan School of Management and one of the article’s lead authors.
“Our system is also able to identify adverse events just as accurately as, and more consistently than, standard processes, which have clinical research coordinators reviewing charts,” she said.
The agentic system is scalable and helps care providers detect adverse events as soon as they appear in unstructured clinical notes, which are processed daily, without having to wait for episodic human review.
5 “heavy lifts” for implementing agentic AI systems
While algorithms and models often get the most attention, the researchers found that infrastructure and implementation were the most challenging aspects of using the AI agents. In fact, less than 20% of the deployment effort went to prompt engineering and model development; more than 80% was consumed by the sociotechnical work of implementing the system.
“For every hour spent perfecting a model, expect roughly four hours to make it work in the real world,” the researchers write.
The researchers distilled that work into five “heavy lifts” that are necessary for success in any setting. As Bitterman noted, “These five topics might not be surprising to people already working in AI, but agentic AI raises new challenges within each one.”
Data integration: Agentic AI can be pictured as an assembly line in which pieces of data are moving along the belt, Bitterman said. If the shape of the part is wrong — that is, if the data is not labeled and accessible, or its use is not clearly defined — then the machine jams. Given that agentic systems are ingesting dynamic data in real time, this process of integration becomes especially important.
“To be ready for agentic deployment, you need consistent data pipelines and serving infrastructure already in place,” said Jack Gallifant, the lead author of the paper and now an AI engineer at Phare Health. “This is nontrivial. It takes far more effort than most people expect, and is probably the most underappreciated part of the story.”
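The assembly-line analogy above can be sketched in code. The following is a hypothetical illustration, not the deployed system: a minimal validation step that rejects malformed records before they reach the agents, so bad data cannot jam the line downstream. The field names (`patient_id`, `note_date`, `text`) are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ClinicalNote:
    """Illustrative schema for one note flowing into the agentic pipeline."""
    patient_id: str
    note_date: date
    text: str


def validate_record(raw: dict) -> ClinicalNote:
    """Reject records with missing or empty required fields before ingestion."""
    for field in ("patient_id", "note_date", "text"):
        if not raw.get(field):
            raise ValueError(f"missing required field: {field}")
    return ClinicalNote(
        patient_id=str(raw["patient_id"]),
        note_date=date.fromisoformat(raw["note_date"]),
        text=raw["text"].strip(),
    )
```

A pipeline built this way fails loudly at the point of integration rather than silently corrupting the agents’ inputs.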
Model validation: Validating agentic AI models requires confirming not only that the output is what you want it to be but also that the agents are behaving as they should and following the rules as described. “We defined clear policies for the agents’ actions upfront, and we maintain audit logs for all steps to make sure that only the allowed tools and databases are being accessed,” Bitterman said.
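The policy-and-audit-log pattern Bitterman describes might look like the following sketch. Everything here is hypothetical (the tool names, `ALLOWED_TOOLS`, and `AuditLog` are invented for illustration); the point is that every agent action is checked against an explicit allowlist and recorded, whether or not it is permitted.

```python
import datetime

# Hypothetical allowlist of tools the agents may invoke.
ALLOWED_TOOLS = {"read_clinical_note", "query_adverse_event_db", "draft_report"}


class AuditLog:
    """Records every attempted agent action, allowed or not."""

    def __init__(self):
        self.entries = []

    def record(self, agent: str, tool: str, allowed: bool) -> None:
        self.entries.append({
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "agent": agent,
            "tool": tool,
            "allowed": allowed,
        })


def invoke_tool(agent: str, tool: str, log: AuditLog) -> bool:
    """Check the requested tool against the policy; log before acting."""
    allowed = tool in ALLOWED_TOOLS
    log.record(agent, tool, allowed)
    if not allowed:
        raise PermissionError(f"{agent} attempted disallowed tool: {tool}")
    return True
```

Logging before the permission check raises means disallowed attempts still appear in the audit trail, which is what makes after-the-fact validation of agent behavior possible.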
Ensuring economic value: Calculating return on investment for agentic AI solutions is not straightforward. Because agentic workflows are dynamic, costs are variable. One process may be more challenging than the next, triggering more reasoning from the model; more collaboration between agents could also lead to greater expense. To tackle this, the researchers empirically explored different possibilities at the outset, establishing a range of potential costs.
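The idea of bounding variable costs can be made concrete with a small model. This is a toy sketch, not the researchers’ method: the token counts, prices, and scenario names are made-up assumptions, chosen only to show how exploring easy and hard cases yields a cost range rather than a single number.

```python
# Assumed blended price per 1,000 tokens, USD (illustrative only).
PRICE_PER_1K_TOKENS = 0.01


def case_cost(reasoning_tokens: int, agent_handoffs: int,
              tokens_per_handoff: int = 500) -> float:
    """Cost grows with model reasoning and with agent-to-agent collaboration."""
    total_tokens = reasoning_tokens + agent_handoffs * tokens_per_handoff
    return total_tokens * PRICE_PER_1K_TOKENS / 1000


# Explore cheap and expensive scenarios empirically to bound the cost.
scenarios = {
    "simple_note": case_cost(reasoning_tokens=2_000, agent_handoffs=1),
    "complex_note": case_cost(reasoning_tokens=20_000, agent_handoffs=6),
}
low, high = min(scenarios.values()), max(scenarios.values())
```

Running the two scenarios gives a per-case range rather than a point estimate, which is the shape of answer a dynamic workflow’s ROI analysis needs.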
Monitoring for model or data drift: Some AI solutions can be monitored using predefined if-then rules and static thresholds. “But with agentic AI, these systems are reasoning, planning, and acting independently across multiple steps,” Kellogg said. “So we need to do what we call ‘adaptive monitoring,’ which is basically continuously tracking multiple dynamic metrics.”
In this case, the researchers designed a monitoring framework to look at whether the model or the data inputs were drifting from expectation, and whether their surveillance systems were responding to changes in the AI system’s behavior.
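One way to picture “adaptive monitoring” in code: instead of a static if-then threshold, track a rolling baseline per metric and flag values that stray too far from recent behavior. This is a generic illustration under assumed parameters (window size, z-score limit), not the researchers’ monitoring framework.

```python
from collections import deque
from statistics import mean, stdev


class DriftMonitor:
    """Flags drift relative to a rolling window instead of a fixed threshold."""

    def __init__(self, window: int = 50, z_limit: float = 3.0):
        self.history = deque(maxlen=window)  # recent metric values
        self.z_limit = z_limit               # how many std devs count as drift

    def observe(self, value: float) -> bool:
        """Return True if the new value looks like drift; then record it."""
        drifted = False
        if len(self.history) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:
                drifted = True
        self.history.append(value)
        return drifted
```

Because the baseline moves with the data, the monitor adapts as the system’s behavior evolves, rather than alarming on every departure from a threshold fixed at deployment time.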
Governance: To manage governance concerns, the researchers clarified risks at every point in the agentic process: Is every step that the system takes fully secure and legal? What key decisions is the model making, and who should be informed about them? Who is responsible if something goes wrong? “We looked at the system’s entire workflow and were very careful about accountability along the way,” Kellogg said.
Expanding beyond clinical settings
Kellogg noted that the use of agentic AI in health care is particularly high stakes, which means that each of the five considerations required careful attention. “In other areas, like retail, we might see a lighter touch with these things we’re calling heavy lifts,” she said.
But there’s no doubt that organizations in any sector that choose to deploy agentic AI need playbooks that speak to each of these five topics, Kellogg said.
“The hardest work isn’t in deploying the model or writing smarter algorithms but in transforming the organization to support these things,” she said.
Read the paper: “A Field Guide to Deploying AI Agents in Clinical Practice”
This article is based on the research paper and a discussion with the three lead authors.
Kate Kellogg is the David J. McGrath Jr. Professor of Management and Innovation at the MIT Sloan School of Management. Her research focuses on helping knowledge workers and organizations develop and implement predictive and generative AI products to improve decision-making, collaboration, and learning.
Danielle Bitterman is an assistant professor of radiation oncology at Harvard Medical School and clinical lead for data science/AI at Mass General Brigham Digital. She is a physician-scientist whose research specializes in AI analysis of clinical data to transform medical research and clinical care, and AI evaluation and monitoring for safe and sustainable translation of AI into clinical settings.
Jack Gallifant is an AI engineer at Phare Health and a former post-doctoral fellow at Harvard Medical School, where he conducted the research. He is focused on building frontier AI agents for health care.