There’s a lot you can ask generative artificial intelligence to do: compile an agenda for a meeting, write an email, transcribe notes, generate computer code. The harder question to answer is, should you?
“It’s not a trivial task, learning how to work well with a machine,” said John Horton, an MIT Sloan associate professor who specializes in AI labor and online marketplace research. “There’s still this task of figuring out how to ask good questions or how to make good requests.”
Since last fall’s release of ChatGPT, the powerful AI tool that can answer questions, chat with humans, and generate text, businesses and consumers alike have been experimenting with generative AI. Whether they want to save money or increase productivity, there’s a lot to consider. What questions should employers ask before swapping out human labor for AI?
“When we are thinking about ‘Should AI do a task?’ it’s not really a question of ‘Could an AI do a task?’” Horton said. “It’s whether the process of combining AI with human capabilities is worth the effort.”
A lot has to go right for a human-AI interaction to be worthwhile. Humans have to ask the right questions and be able to evaluate the quality of the system’s answers in a timely manner. “Is that going to be more efficient than just having the person do the task directly?” Horton said.
How much time does the task require without AI assistance?
AI is often lauded for its ability to do tasks in a fraction of the amount of time that a human takes. Therefore, “if you have a task that takes a lot of time normally, that’s going to be a candidate that’s ripe for substitution,” Horton said.
What’s more, the tasks AI excels at are common across “lots of jobs,” he said, meaning there’s ample opportunity for it to be used. “That might be writing an agenda for a meeting or writing some software that builds a web app that does X, Y, and Z,” Horton said. “It’s important to note how big of an impact generative AI is going to have,” though the size of that impact will depend on how many tasks AI can do, how common those tasks are, and how important they are in labor markets.
How highly paid are the people who perform this task?
Done right, artificial intelligence can save businesses money. In fact, a recent report from McKinsey estimates that generative AI’s impact on productivity could add $2.6 trillion to $4.4 trillion annually to the global economy.
If AI can be used to replace a task that’s done by a highly paid person, “this is obviously a place where substitution would be more attractive,” Horton said. “If you think about where we might see a lot of R&D and entrepreneurial focus, I think it’ll be tasks that tend to be common across a lot of jobs done by highly paid people.”
Once they’re automated, these expensive tasks can then be done by a broader range of employees, he said.
How capable is the AI of completing the task correctly?
“The capability of the AI matters a lot,” Horton said. “A lot of the excitement about what’s happening now is a lot of things that we thought would be very, very hard for AI to do now seem to be much more feasible than we had thought.”
That said, when swapping in AI, a human still has to ask the right questions to get the right result, and that’s not always easy. “There’s a bit of an art to writing generative AI prompts,” Horton said.
Horton shared an anecdote where he tried to use ChatGPT to program in a language called Perl, in which he was a total novice. ChatGPT failed to do what he was asking on the first try, but he tried again, modifying his question. This time, “it nails it,” demonstrating that when you ask the right question, you overcome one of the biggest hurdles, he said.
“You can get the right outcome even without knowing how to do a task yourself, which is something pretty new and exciting,” Horton said.
How easy is it for humans to determine whether the AI output is accurate?
Being able to quickly evaluate results from AI is crucial, and it’s sometimes easily done — for example, when you’ve used AI to create an agenda for a meeting.
Other times, “you could imagine tasks where it’s very hard to know the results are actually acceptable,” Horton said. In those cases, “I would have to do just as much work confirming that it worked,” he added, noting that this “evaluation cost” is hugely important.
If you’re a programmer who uses AI to write code, for example, you still need to do the administrative step of copying and pasting the code and testing that it works. If that testing step takes too long or is too difficult for a human to do, then using AI for the task isn’t worth it.
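To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is an illustration, not a model Horton proposed: the `Task` fields, the function names, and every number in the two examples are hypothetical. It assumes the same person writes the prompt and checks the output, that each attempt succeeds independently with the same probability, and that the person retries until the result is acceptable.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical inputs for one task; all values are illustrative."""
    human_minutes: float     # time to do the task with no AI help
    hourly_wage: float       # pay of the person who does the task
    prompt_minutes: float    # time to write one prompt
    evaluate_minutes: float  # time to check one AI output ("evaluation cost")
    success_rate: float      # chance a single attempt is acceptable, 0 < p <= 1


def human_cost(task: Task) -> float:
    """Labor cost of doing the task directly."""
    return task.human_minutes / 60 * task.hourly_wage


def ai_assisted_cost(task: Task) -> float:
    """Expected labor cost of prompting and checking until one attempt works."""
    if not 0 < task.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # With independent attempts, the expected number of tries is 1 / p.
    expected_attempts = 1 / task.success_rate
    minutes = expected_attempts * (task.prompt_minutes + task.evaluate_minutes)
    return minutes / 60 * task.hourly_wage


def worth_using_ai(task: Task) -> bool:
    """True when the AI-assisted route is expected to cost less."""
    return ai_assisted_cost(task) < human_cost(task)


# Made-up example 1: a meeting agenda is quick to prompt for and quick to check.
agenda = Task(human_minutes=20, hourly_wage=60, prompt_minutes=2,
              evaluate_minutes=2, success_rate=0.8)

# Made-up example 2: a will is slow and hard to verify, so checking dominates.
will_draft = Task(human_minutes=120, hourly_wage=200, prompt_minutes=10,
                  evaluate_minutes=150, success_rate=0.7)

for name, task in [("agenda", agenda), ("will", will_draft)]:
    print(name, round(human_cost(task), 2), round(ai_assisted_cost(task), 2),
          worth_using_ai(task))
```

Under these invented numbers the agenda comes out in favor of AI and the will does not, because the time spent evaluating the will outweighs the time saved drafting it. The sketch leaves out anything that is not labor time, such as the price of the AI tool or the cost of an error that slips past the check.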
More broadly, it will take a considerable amount of time to learn how good ChatGPT actually is at particular tasks. You can use it to draft a will, for example, but you won’t know until you or your loved one dies how thorough a job it has done. “For the time being, it’s probably better to stick with a human lawyer,” Horton said.
In the future, Horton sees AI technology improving not just through the models themselves but in two additional ways: prompting people to ask AI the right questions via a more user-friendly interface, and evaluating results with methods that augment human judgment so that people can quickly tell whether the output meets their needs.
In particular, “it’s not hard to imagine that kind of thing becoming more integrated to where the evaluation step is a lot simpler and self-automated,” Horton said. “Even these two prompting and evaluation tasks are themselves tasks that artificial intelligence could potentially augment.”