A new method for rooting out social media bots

Borrowing a model from statistical physics, an MIT professor finds a better way to detect online bots.

By Dylan Walsh  |  July 10, 2018


Illustration: Rob Dobi

Why It Matters

A new algorithm distinguishes between online bots and people better and with less information than before. It’s a step toward countering the influence of bots in U.S. elections and politics.

First performed in 1921, the play Rossum’s Universal Robots both coined the term “robot” and solidified the Promethean plotline now standard in robot movies: Man invents robot, robot destroys man. But this formulation might be anachronistic. What if the robots don’t walk among us, but exist online? And what if they don’t enact physical violence, but psychological deception? 

A number of recent reports describe how foreign actors designed social media bots to manipulate U.S. elections. On both Twitter and Facebook, these bots shared and amplified politically polarizing content in an attempt to sow discord and promote certain agendas. In short, an invisible network of bots worked to undermine a democratic election.

As this deployment of social media bots becomes commonplace, efforts to detect and shut down their accounts strive to keep pace. “This is the newest arms race,” said Tauhid Zaman, associate professor of operations management at MIT Sloan. In a new working paper coauthored with MIT Operations Research Center graduate student Nicolas Guenon des Mesnards, Zaman proposes a new method that he believes is well suited for modern-day bot detection. 

Algorithms have traditionally been trained to find bots by screening the details of individual accounts: username, number and timing of tweets or posts, content, and so on. “You collect all sorts of information about the account and this tells you if it’s a bot,” Zaman said. He and Mesnards wondered instead whether patterns across a network might be used to signal a coordinated group of bots at once. 

They started by studying the behavior of bots and found a property that sets them apart: Nobody engages with them, not even other bots. “Humans talk to humans, bots talk to humans, and nobody talks to bots,” Zaman said. “So if you have a lot of active people who aren’t talking to each other, and other people aren’t talking to them, then you’re probably looking at bots.” 
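The engagement asymmetry Zaman describes can be illustrated with a toy sketch. The data and threshold below are hypothetical, invented purely for illustration: accounts that post heavily yet never receive a retweet or mention from anyone are flagged as bot-like.

```python
# Hypothetical interaction data. Each tuple is (source, target):
# source retweeted or mentioned target.
interactions = [
    ("alice", "bob"), ("bob", "alice"),    # humans talk to humans
    ("bot_1", "alice"), ("bot_2", "bob"),  # bots talk to humans
]
# Number of posts per account (activity level) -- invented numbers.
activity = {"alice": 40, "bob": 25, "bot_1": 500, "bot_2": 450}

def flag_suspects(interactions, activity, min_activity=100):
    """Flag accounts that are very active but that nobody engages with."""
    engaged = {target for _, target in interactions}
    return {acct for acct, n in activity.items()
            if n >= min_activity and acct not in engaged}

print(sorted(flag_suspects(interactions, activity)))  # -> ['bot_1', 'bot_2']
```

This captures only the one-account-at-a-time version of the idea; the actual method, described next, labels the whole network jointly.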

With this insight, and by co-opting the Ising model used to study magnets in statistical physics, Zaman and Mesnards created an algorithm that distinguishes between bots and people by looking at their interaction network (retweets and mentions). They then tested this algorithm on six historical events during which it's well known that social media bots injected themselves into the online conversation: the Black Lives Matter demonstrations of 2015 and 2016, the Pizzagate conspiracy, the first 2016 U.S. presidential debate, a scandal in the recent French elections, and elections in Hungary.
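To give a flavor of the Ising approach, here is a toy illustration — not the paper's actual formulation — of classifying accounts by minimizing an Ising-style energy over the interaction graph. Spin +1 means human, -1 means bot; receiving an interaction counts as evidence of humanity, while a small "field" term nudges every account toward the bot label so that unengaged accounts end up classified as bots. All names, edges, and parameter values are invented for the example.

```python
from itertools import product

accounts = ["alice", "bob", "bot_1", "bot_2"]
edges = [("alice", "bob"), ("bob", "alice"),
         ("bot_1", "alice"), ("bot_2", "bob")]

def energy(spins, field=0.5, coupling=1.0):
    # Field term: biases every spin toward -1 (bot).
    e = sum(field * spins[a] for a in accounts)
    # Coupling term: penalize any interaction directed at a "bot".
    e += sum(coupling for _, j in edges if spins[j] == -1)
    return e

# Brute-force search over all 2^4 labelings; the paper solves the
# equivalent problem efficiently on large graphs.
best = min((dict(zip(accounts, s)) for s in product([1, -1], repeat=4)),
           key=energy)
print({a: ("human" if s == 1 else "bot") for a, s in best.items()})
# -> {'alice': 'human', 'bob': 'human', 'bot_1': 'bot', 'bot_2': 'bot'}
```

The minimum-energy labeling marks the engaged accounts as human and the unengaged ones as bots, which is the joint, network-wide version of the engagement-asymmetry observation.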

When they compared their results to BotOrNot, a state-of-the-art algorithm for detecting individual bots, Zaman and Mesnards found that they could not only distinguish bots from humans more reliably, but also do so using much less information. Whereas BotOrNot would need detailed account information to separate the bots from the humans who discussed Pizzagate on Twitter, the Ising model did the same work more accurately using only the structure of the interaction network.

One further advantage of the model is that it is "language agnostic." Many algorithms that search for bots one at a time require specific linguistic and cultural knowledge of the accounts in question. Because the Ising model incorporates only network structure, it can be used regardless of language; it can also be used regardless of platform — it should work as well on Facebook or Reddit as it does on Twitter, Zaman said.

The one case where the Ising model algorithm was not able to outperform BotOrNot was in the Black Lives Matter demonstrations of 2015. Zaman believes this is because coordinated bots weren’t heavily involved in the movement that year, and so the network as a whole couldn’t provide enough useful information. “The pattern didn’t exist much in 2015,” he said. But then “something changed in the Twitterverse” and bots emerged as a strong presence in the Black Lives Matter demonstrations of 2016.

In a final inquiry to understand not just how to detect bots, but what they might be up to, Zaman created a word cloud of hashtags from two of the six events he and Mesnards investigated. What he found, according to the paper, is that “bots intend to internationalize an otherwise local controversy to reach a broader audience.” More generally, and more bluntly, “they have an agenda,” Zaman said. 

He recognizes that the value of this new approach has a shelf life. Soon enough, the people who build bots will devise a way to fool the algorithm. “All you need is for the bots to get some retweets,” he said. His algorithm will be more likely to assume bots are human if other people (or bots) engage with them. “But this is the challenge: we’re always going back and forth, back and forth. Right now, anyway, this is our newest weapon.”

This is the second in a three-part series examining new work about Twitter, influence, and bots by MIT Sloan associate professor Tauhid Zaman. Read part one, “Solving Twitter’s follow-back problem.”