Credit: Rob Dobi
In March 2019, the CEO of a U.K.-based energy firm listened over the phone as his boss, the leader of the firm’s German parent company, ordered the transfer of €220,000 to a supplier in Hungary.
News reports would later detail that the CEO recognized the “slight German accent and the melody” of his chief’s voice and followed the order to transfer the money [equivalent to about $243,000] within an hour. The caller tried several other times to get a second round of money, but by then the U.K. executive had grown suspicious and did not make any more transfers.
The €220,000 was moved to Mexico and channeled to other accounts, and the energy firm — which was not identified — reported the incident to its insurance company, Euler Hermes Group SA. An official with Euler Hermes said the thieves used artificial intelligence to create a deepfake of the German executive’s voice, though reports have since questioned the lack of supporting evidence.
What’s for certain, however, is that the technology for this type of crime does exist, and it’s only a matter of when the next attack will happen and who will be the target.
“It’s a time to be more wary,” said Halsey Burgund, a fellow in the MIT Open Documentary Lab. “One should think of everything one puts out on the internet freely as potential training data for somebody to do something with.”
The manipulation of data is not new. Ancient Romans chiseled names and portraits off stone, permanently deleting a person’s identity and history. Soviet leader Joseph Stalin used censorship and image editing to control his persona and government in the early to mid-20th century. The advent of the computer age meant a few clicks of a mouse could shrink a waistline or erase someone from a photograph. Data manipulation today still relies on computers, but as the incident with the energy firm shows, the human voice and, increasingly, video clips are being used to convince someone that what they’re hearing or seeing is real.
And while there might be an argument for using deepfakes for good, experts warn that, unless people understand them, deepfakes can wreak havoc on someone’s personal and professional life.
What is a deepfake?
A deepfake refers to a specific kind of synthetic media where a person in an image or video is swapped with another person's likeness.
The term “deepfake” was coined in late 2017 by a Reddit user of the same name. This user created a space on the online news and aggregation site where they shared pornographic videos made with open source face-swapping technology.
The term has since expanded to include “synthetic media applications” that existed before the Reddit page and new creations like StyleGAN — “realistic-looking still images of people that don’t exist,” said Henry Ajder, head of threat intelligence at deepfake detection company Deeptrace.
In more recent examples, deepfakes can be a voice that sounds like your boss on the other end of a phone line, Facebook’s Mark Zuckerberg in an edited video touting how great it is to have billions of people’s data, or Belgium’s prime minister linking the coronavirus pandemic to climate change during a manipulated recorded speech.
“The term understandably has a negative connotation, but there are a number of potentially beneficial use cases for businesses, specifically applications in marketing and advertising that are already being utilized by well-known brands,” Ajder said.
That’s why a growing number of people in this space are instead using the term “artificial intelligence-generated synthetic media,” Ajder said. It’s broad enough to include the original definition of deepfake, but also specific enough to omit things like computer-generated images from movies or photoshopped images, both of which are technically examples of something that’s been modified.
How do you make a deepfake video?
To make a deepfake video, a creator swaps one person’s face for another’s, using a facial recognition algorithm and a deep learning computer network called a variational auto-encoder (VAE), said Matt Groh, a research assistant with the Affective Computing Group at the MIT Media Lab.
VAEs are trained to encode images into low-dimensional representations and then decode those representations back into images.
For example, if you wanted to transform any video into a deepfake with Oscar-winning movie star Nicolas Cage, you’d need two auto-encoders — one trained on images of the actor’s face, and one trained on images of a wide diversity of faces.
The images of faces used for both training sets can be curated by applying a facial recognition algorithm to video frames to capture different poses and lighting conditions that naturally occur.
Once this training is done, you combine the encoder trained on the diverse faces with the decoder trained on Nicolas Cage’s face, resulting in the actor’s face on someone else’s body.
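The recombination step described above can be sketched in a few lines of Python. This is a structural illustration only, not a working deepfake pipeline: the "auto-encoders" here are hand-written functions rather than trained neural networks, and the toy face layout (`[pose, lighting, identity...]`) along with the names `AutoEncoder`, `make_toy_autoencoder`, and `swap_face` are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class AutoEncoder:
    """Toy stand-in for a trained auto-encoder (not a real neural network)."""
    encode: Callable[[Vector], Vector]  # image -> low-dimensional code
    decode: Callable[[Vector], Vector]  # low-dimensional code -> image


def make_toy_autoencoder(identity: Vector) -> AutoEncoder:
    """Build a hand-written auto-encoder for a toy 'face' vector.

    Assumption for illustration: a face is [pose, lighting, *identity].
    """
    def encode(face: Vector) -> Vector:
        # Keep only pose and lighting: the low-dimensional representation.
        return face[:2]

    def decode(code: Vector) -> Vector:
        # Re-render a face using the identity this decoder was "trained" on.
        return list(code) + list(identity)

    return AutoEncoder(encode=encode, decode=decode)


def swap_face(frame: Vector, general: AutoEncoder, target: AutoEncoder) -> Vector:
    """Encode with the general-faces encoder, decode with the target's decoder."""
    return target.decode(general.encode(frame))


# Hypothetical models: one for a wide diversity of faces, one for the actor.
general_ae = make_toy_autoencoder(identity=[0.5, 0.5])
cage_ae = make_toy_autoencoder(identity=[0.0, 1.0])

# A frame showing someone else: pose 0.3, lighting 0.8, identity [1.0, 0.0].
frame = [0.3, 0.8, 1.0, 0.0]
print(swap_face(frame, general_ae, cage_ae))  # [0.3, 0.8, 0.0, 1.0]
```

The output keeps the original frame's pose and lighting but carries the target decoder's identity values, which is the toy equivalent of the actor's face appearing on someone else's body. In a real system both halves would be learned from thousands of curated video frames.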
Examples of deepfakes
Burgund and co-creator Francesca Panetta, XR creative director at the MIT Center for Advanced Virtuality, chose an arguably even more famous subject for their deepfake: former President Richard Nixon.
The pair built an art installation in 2019 that combined actual footage of Nixon’s resignation speech with the text of an in-memoriam draft speech that had been written by Nixon speechwriter Bill Safire in case of a failed moon landing. The result is a deepfake video that, despite the creators’ attempts to be transparent about the fabrication, still tricked some viewers into thinking it was an unaired version of the speech.
“The purpose of the project is to try and find a kind of creative and evocative way to show what deepfakes look like,” Panetta said. “And to give people awareness about their existence and how realistic they can be.”
Panetta and Burgund worked with an actor, Lewis D. Wheeler, to read aloud a variety of Nixon speeches as well as the contingency speech, to get the right “presidential” tone and cadence. Then the recording was sent to Respeecher, which specializes in synthetic voices — in this case, turning the actor’s voice into Nixon’s.
Canny AI was the company that used artificial intelligence — specifically video dialogue replacement — to change the area around Nixon’s mouth, the movement of his head and face, and his hands, to match what was being said.
“It certainly is far from ‘press button: create deepfake,’” Burgund said. “That is not at all what it is. There are things that can get 80% there that are very, very easy [to do] but we wanted to go as far as possible with the current technology to make it as believable as possible.”
At Modulate, a Cambridge, Massachusetts-based company, engineers are creating “voice skins” for use in online games and social platforms. Modulate’s clients are companies in the online experience platform (online gaming) industry. It maintains a growing inventory of artificially generated voices, which the gaming companies can purchase and then offer to their customers as part of the gaming experience.
“The idea is to give people the freedom to still sound authentically human, authentically emotive, still maintain all that control but effectively swap out their vocal cords so that the voice they're using is just an automatic match,” said Modulate CEO and co-founder Mike Pappas, SB ’14.
For example, if a player is attached to their character's in-game appearance — such as a grumbling dwarf or ethereal elf — they can choose a voice that allows them to sound like that character when they speak to other players.
In other cases, those who are subject to harassment, like women or kids, can use voice skins to make sure they're only sharing their gender or age if and when they're comfortable doing so.
In some cases, Pappas said, members of the transgender community will use voice skins that more accurately reflect their identity in an online forum.
How can you spot a deepfake?
While there isn’t a list of steps to take that will make someone completely immune to being fooled by a deepfake, there are some things to look for that can help in deciphering whether or not what you’re looking at is real.
Groh advised paying attention to the following:
- Face: Is someone blinking too much or too little? Do their eyebrows fit their face? Is someone’s hair in the wrong spot? Does their skin look airbrushed or, conversely, are there too many wrinkles?
- Audio: Does someone’s voice not match their appearance (e.g., a heavyset man with a higher-pitched feminine voice)?
- Lighting: What sort of reflection, if any, are a person’s glasses giving under a light? (Deepfakes often fail to fully represent the natural physics of lighting.)
The best way to inoculate people against deepfakes is exposure, Groh said. To support and study this idea, Groh and his colleagues created an online test as a resource for people to experience and learn from interacting with deepfakes.
But if you want to see a deepfake yourself, they’re not hard to find. In fact, Deeptrace’s Ajder explained, a lot of deepfake content is labeled as a deepfake, because creators are trying to show off their work.
Deeptrace was founded in late 2018 to provide capabilities (like software-as-a-service) for detecting deepfake images and videos. Deeptrace also monitors deepfake activity online and assists with taking down malicious deepfake videos targeting clients.
In 2019, the company published a report on the state of deepfakes. It found more than 14,000 deepfake videos online, a 100% increase over its 2018 count. The study found that 96% of deepfake videos are pornography, and nearly all of those involve women.
“This increase is supported by the growing commodification of tools and services that lower the barrier for non-experts to create deepfakes,” the report states. “Outside of politics, the weaponization of deepfakes and synthetic media is influencing the cybersecurity landscape, enhancing traditional cyber threats and enabling entirely new attack vectors.”
Who and what is at risk of a deepfake?
Viral videos show Texas Senator Ted Cruz with his face swapped for that of actor Paul Rudd, or actress Jennifer Lawrence answering questions at the Golden Globes with the face of actor Steve Buscemi. Watching them, it might seem like politics and Hollywood should be the focus areas for combating misleading videos. But as Deeptrace’s report showed, targets for manipulation are no longer limited to government leaders or famous actresses.
“It doesn’t have to be a politician to be a deepfake,” Panetta said in agreement. “It even might be your friend. It could be you that’s targeted. It doesn’t have to be someone who’s famous.”
For example, with scheduled, public quarterly earnings calls that are recorded, it could be possible to take a CFO’s voice recording and turn it into what sounds like an urgent directive to employees to share their bank information. Or imagine a similar recording but this time a CEO announces companywide layoffs; the market responds and stocks crash, all because of a deepfake.
“I'm not trying to sow paranoia here but we're trying to sort of be realistic about what could happen,” Burgund said. “No doubt there are people working on ways to figure out how to obfuscate in certain ways ... it's an arms race.”
Ajder said a big risk right now is defamation. Deepfake videos don’t even have to be that good, as long as the person is recognizable and the graphics are good enough for a viewer to identify the person and see they’re doing or saying something. That leaves an imprint, Ajder said, and can hurt someone’s reputation, especially if their name and face are part of a negative video or audio clip, whether real or a deepfake.
That’s another concern Ajder raised: plausible deniability. Deepfakes don’t just give someone the opportunity to disguise fake images or recordings as real, Ajder said; they also give people an opportunity to dismiss real events as fake.
What can a business leader do to protect their company and employees against deepfakes?
Deeptrace takes the approach championed by WITNESS Program Director Samuel Gregory: Don’t panic. Prepare.
“When it comes to securing business processes, you’ve got to identify the avenues where risks are most apparent,” Ajder said. “Maybe that is your telecom infrastructure in the company, maybe it’s the kind of video conferencing software you use.”
- Consider using semantic passwords for conversations, or a secret question you ask or answer at the start of a call.
- If you have a voice authentication service or biometric security features, ask those providers whether their tools are up to date.
- Educate your employees. Explain that deepfake attacks might become more frequent and that there is no magic formula for detecting them.
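The first suggestion in the list above, a shared passphrase checked at the start of a call, could be backed by a small internal tool. The following is a minimal sketch under stated assumptions, not a recommended security design: the function names `hash_phrase` and `phrase_matches`, the example phrase, and the iteration count are all hypothetical. It uses only Python's standard `hashlib` and `hmac` modules, storing a salted hash so the tool never keeps the phrase in plain text.

```python
import hashlib
import hmac
import os


def hash_phrase(phrase: str, salt: bytes) -> bytes:
    """Return a salted PBKDF2 hash of a passphrase.

    Case and extra whitespace are normalized first, since the phrase is
    spoken aloud and then typed in by whoever is verifying the caller.
    """
    normalized = " ".join(phrase.lower().split())
    # 100_000 iterations is an illustrative value, not a tuned recommendation.
    return hashlib.pbkdf2_hmac("sha256", normalized.encode("utf-8"), salt, 100_000)


def phrase_matches(spoken: str, salt: bytes, stored_hash: bytes) -> bool:
    """Compare in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(hash_phrase(spoken, salt), stored_hash)


# Hypothetical enrollment: agreed in person, stored only as salt + hash.
salt = os.urandom(16)
stored = hash_phrase("blue heron at noon", salt)

print(phrase_matches("Blue  Heron at noon", salt, stored))  # True
print(phrase_matches("blue heron at dusk", salt, stored))   # False
```

A check like this only confirms that the caller knows the phrase. It does not detect a synthetic voice, so it works best alongside the other measures in the list.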
“Interrogate your security infrastructure,” Ajder said. “Understand where weak spots may be, prepare and see where technological solutions can fit into that infrastructure to secure at critical points.”
In Pappas’ mind, it’s everyone’s responsibility to protect against malicious deepfakes.
“The social answer is we all build an immune system,” he said. “We start asking ourselves questions: Who is the person presenting this image to me? Where did it come from? What is evident, what is actually authentic? Having that general demeanor of asking these questions certainly helps.”
The MIT Media Lab’s Groh said people can defend themselves against deepfakes using their own intuition and intellect.
“You have to be a little skeptical, you have to double-check and be thoughtful,” Groh said. “It’s actually kind of nice: It forces us to become more human, because the only way to counteract these kinds of things is to really embrace who we are as people.”
Ready to go deeper?
Test your deepfake-spotting skills.
Experiment with the MIT Media Lab’s artificial intelligence tool Deep Angel.
Watch: In Event of Moon Disaster.
Read: ‘The biggest threat of deepfakes isn’t the deepfakes themselves’ at MIT Technology Review.
Read: The State of Deepfakes, a 2019 report from Deeptrace.
Read: Machine learning, explained.