Ideas Made to Matter

Artificial Intelligence

To help improve the accuracy of generative AI, add speed bumps

For all the enthusiasm over generative artificial intelligence, there’s legitimate concern about the potential for bias or inaccuracies, even with some level of human intervention.

These concerns amplify as the technology becomes more accessible and use cases proliferate. In one instance, a 2023 analysis of more than 5,000 images produced by Stable Diffusion based on job title- and crime-related prompts found that the generative AI tool significantly amplified gender and racial stereotypes. More recently, OpenAI’s ChatGPT and Microsoft’s Copilot chatbots have been found to produce fabricated data (sometimes referred to as hallucinations) that appears to be authentic. 

Keeping a human in the loop is one widely touted approach to overseeing AI in hopes of maintaining trust and mitigating risk. But most people aren’t as good at recognizing errors as they think they are, and they tend to anchor on AI-generated content even when they know there’s a possibility of error, according to MIT Sloan senior lecturer and research scientist Renée Richardson Gosline.

“It’s hard to put the genie back in the bottle,” said Gosline, the Human-First AI group research lead at the MIT Initiative on the Digital Economy. But “when you’re talking about trillions of dollars being invested, the potential impact on people’s livelihoods, and the scale and proliferation of potential error and bias, it’s a motivator to take a hard look at what’s happening.”

Typically, people designing digital experiences are hyperfocused on reducing friction to make things easier to do. Gosline teamed up with a group at Accenture to pressure-test her theory that auditing AI systems for touch points and introducing the right kind of “targeted friction,” or beneficial friction (cognitive and procedural speed bumps), at those points can improve overall accuracy and reduce uncritical adoption.

The researchers found that friction should not be universally viewed as bad in the context of AI but rather can serve as a deliberate tool for promoting more responsible and successful generative AI use.

“Friction is a more thoughtful approach to moving beyond pilot programs and getting the value and scale you expect from the adoption of the technology,” said Arnab Chakraborty, Accenture’s chief responsible AI officer.

A test case for beneficial friction

In MIT and Accenture’s experimental use case, targeted friction was added to large language model outputs to interrupt the automatic nature of AI-human engagement. The goal was to encourage users to engage in a more conscious and deliberate mode of cognitive processing, known as System 2 thinking, when performing generative AI-enabled tasks, without dramatically slowing or upending the end-to-end process.

Study participants were asked to use generative AI to create a pair of executive summaries of company profiles within a 70-hour time frame, referencing available sources. The participants used an experimental tool designed to augment people’s use of generative AI by nudging users’ attention toward potential errors and omissions in LLM content. The tool used color-coded highlighting to convey different kinds of information, requiring users to do some cognitive processing rather than uncritically adopting the generative AI output.

Purple highlighted text matched terms used in the prompt as well as internal databases and public information. Orange highlighted text indicated potentially untrue statements that should be considered for removal or replacement. Text included in the prompt but omitted from the output appeared in blue below the generated response. 
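The article doesn’t detail how the tool decided what to highlight. A minimal sketch of what such reviewing logic might look like, assuming upstream checks have already produced lists of verified terms and suspect sentences; all names here are hypothetical, not the actual MIT-Accenture tool:

```python
from dataclasses import dataclass

@dataclass
class Cue:
    text: str
    color: str  # "purple" = verified, "orange" = suspect, "blue" = omitted
    note: str

def review_cues(output_sentences, verified_terms, suspect_sentences, prompt_terms):
    """Tag generated text for human review instead of letting it pass unexamined."""
    cues = []
    for sentence in output_sentences:
        if sentence in suspect_sentences:
            cues.append(Cue(sentence, "orange", "potentially untrue; consider removing"))
        elif any(term in sentence for term in verified_terms):
            cues.append(Cue(sentence, "purple", "matches prompt, database, or public sources"))
    # Prompt terms that never appear in the output surface in blue below it.
    generated = " ".join(output_sentences)
    for term in prompt_terms:
        if term not in generated:
            cues.append(Cue(term, "blue", "in the prompt but missing from the output"))
    return cues
```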

Participants were randomly assigned to one of three conditions, each of which injected varying levels of speed bumps in the form of highlighting.

  • The full-friction condition imposed all three kinds of highlighting on the generated content.
  • The medium-friction condition contained two kinds of highlighting on the generated content.
  • The no-friction control condition contained no highlighting at all, reflecting the current generative AI user experience.
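A sketch of how those conditions might map to the cues each participant sees, reusing the Cue records from the sketch above. The article doesn’t specify which two kinds of highlighting the medium condition kept, so that pairing is a placeholder:

```python
import random

CONDITIONS = {
    "full_friction":   {"purple", "orange", "blue"},  # all three kinds of highlighting
    "medium_friction": {"orange", "blue"},            # two kinds (assumed pairing)
    "no_friction":     set(),                         # control: plain output
}

def assign_condition(rng=random):
    """Randomly assign a participant to one of the three conditions."""
    return rng.choice(sorted(CONDITIONS))

def visible_cues(cues, condition):
    """Show only the cues the participant's condition enables."""
    return [c for c in cues if c.color in CONDITIONS[condition]]
```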

The researchers then analyzed the results, comparing the number of inaccuracies and omissions users found in the AI-generated text and the time spent on the task. 

The researchers found that the medium-friction condition pushed users to more carefully scrutinize generated text to catch inaccuracies and omissions, without being a significant drag on the time it took to complete a task. Introducing moderate levels of friction in the form of two kinds of highlighting created an optimal balance between accuracy and efficiency, the researchers concluded.
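As a rough illustration of that comparison, here is one way per-participant results might be aggregated; the record schema is an assumption, not the study’s actual data format:

```python
from collections import defaultdict
from statistics import mean

def summarize_by_condition(records):
    """Average errors caught and time spent per condition.

    `records` holds one dict per participant, e.g.
    {"condition": "medium_friction", "errors_caught": 7, "minutes": 45};
    the field names and values are illustrative only.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["condition"]].append(record)
    return {
        condition: {
            "avg_errors_caught": mean(r["errors_caught"] for r in rows),
            "avg_minutes": mean(r["minutes"] for r in rows),
        }
        for condition, rows in grouped.items()
    }
```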

“AI tools allow us to take protracted System 2 processes, like writing and editing, and turn them into System 1 processes that are super-fast and intuitive,” Gosline said, noting that this change can lead to errors. “We wanted to push back on the idea that AI should be used to turn everything into System 1 processes. We want to use models to shave time off work, but we don’t want to leave users open to risk.” 

Putting theory into action

Beyond its experimental collaboration with MIT, Accenture is putting the concept of beneficial friction into action with its own AI-related business processes. The firm takes a very deliberate, risk-based approach to AI, Chakraborty said. When an AI project owner begins an AI-related job, they are required to answer four questions to help determine the risk level of the effort. Guidelines and best practices are provided, and projects flagged as higher risk automatically trigger additional processes and oversight steps designed to assess potential issues and challenges along the way. 

“We created this as part of our governance processes and cultural enablement,” Chakraborty said. “It creates a level of trust and confidence in Accenture systems and for our clients. It also shows that friction has an overall net-positive benefit.”
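The article doesn’t disclose Accenture’s four questions or its risk tiers, but a hypothetical sketch can show the shape of such a risk-gated intake; the questions, thresholds, and oversight steps below are invented placeholders:

```python
INTAKE_QUESTIONS = [
    "Does the system make or influence decisions about people?",
    "Does it use sensitive or regulated data?",
    "Is the output customer- or public-facing?",
    "Could errors cause material harm?",
]

def risk_level(yes_count: int) -> str:
    """Map 'yes' answers to a coarse risk tier (assumed rule)."""
    if yes_count >= 3:
        return "high"
    return "medium" if yes_count >= 1 else "low"

def required_oversight(level: str) -> list[str]:
    """Higher-risk projects automatically trigger additional process steps."""
    steps = ["follow responsible-AI guidelines and best practices"]
    if level in ("medium", "high"):
        steps.append("document intended use and known limitations")
    if level == "high":
        steps += ["independent review sign-off", "ongoing incident monitoring"]
    return steps
```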

Gosline and Accenture leaders have the following recommendations for organizations looking to create more guardrails and governance to rein in unfettered use of generative AI, including making use of beneficial friction:

Evaluate organizational readiness and maturity. Organizations need to first understand their maturity level when it comes to responsible AI, including their ability to comply with standards and regulations. That lens will inform what kind of processes need to be implemented before large-scale AI implementation.

Assess AI system risk. Not all AI systems require the same level of control. When using tools like targeted friction, it’s important to tailor speed bumps to when and where they’re necessary in the context of overall risk.

Embrace systematic, structured enablement. Individual solutions won’t thoroughly address accuracy and bias concerns, because users may overestimate their ability to identify AI-generated errors. “Much of the conversation now has evolved into putting humans in the loop to solve the problems of inaccuracy and bias, but when it comes to generative AI, we’re finding this is not enough,” Gosline said. “Though users benefited from speed bumps, they were not more likely to self-report that the speed bumps helped them be more accurate. This suggests an overconfidence bias, where users may think they are more able to detect AI-generated errors than they are.

“Beware of individual-level solutions for structural problems,” she added. “Look at structural, systemic solutions like adding beneficial friction to use as a tool or business process.”

Encourage a culture of experimentation. Before AI tools and models are deployed, test how workers interact with them, including any possible impacts on accuracy, speed, and trust. Experimentation provides key insights into how to elevate the role of employees in human-in-the-loop systems, including when the application of targeted friction makes the most sense.

Operationalize continuous monitoring. AI models are dynamic systems, and once they’re in production, data and outputs can drift, causing inaccuracies and hallucinations over time. Oversight and monitoring systems need to be in place to constantly evaluate systems, identify potential incidents and problems, and create and orchestrate the right interventions.
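As one illustration, a monitoring system might compare a tracked output-quality metric against its baseline and raise an alert on drift; the metric, inputs, and threshold below are assumptions, not a specific product’s API:

```python
from statistics import mean

def drift_alert(baseline_scores, recent_scores, tolerance=0.10):
    """Flag when a quality metric logged per batch of production outputs
    (e.g., a fact-check pass rate) falls well below its baseline."""
    baseline = mean(baseline_scores)
    return (baseline - mean(recent_scores)) > tolerance * baseline
```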

Make education and training a priority. As AI use escalates, workers need to be brought along, especially since the technology is changing so rapidly. When it comes to generative AI, education on the role and implementation of prompt engineering is particularly important because it’s a prime area for potential bias, Gosline said. “One of the most important points for friction is at the generation of a prompt,” she said. “To address bias, we want users to be deliberately and consciously thinking about what they’re trying to accomplish and what they’re using the output for.”
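A minimal sketch of what a prompt-time speed bump in that spirit might look like; the function and logging format are hypothetical:

```python
def submit_prompt(prompt: str, send):
    """Require a stated goal and intended use before generation,
    and log them for later auditing."""
    goal = input("In one sentence, what are you trying to accomplish? ").strip()
    use = input("What will the output be used for? ").strip()
    if not goal or not use:
        raise ValueError("State a goal and intended use before generating.")
    print(f"[audit] goal={goal!r} use={use!r} prompt={prompt!r}")
    return send(prompt)  # `send` is whatever LLM call the workflow uses
```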

Beneficial friction isn’t the only remedy for reducing AI inaccuracies and bias. Gosline encourages organizations to test and learn, widening the net of experimentation as they ramp up the number of AI use cases. She also cautions that AI doesn’t have to be extremely easy to use for people to deem it useful: with the highlights and labels used in the beneficial friction experiment, users still saw benefits and organizations still had a clear path to achieving ROI.

“There are going to be failures and bumps along the way,” she said. “But beneficial friction is a far superior way of deploying AI than cleaning up a mess or creating societal shifts because you put an unchecked model out there at scale and it caused bias.”
