Ideas Made to Matter
Fighting online extremists with machine learning
After roughly 15 years in the Army, Christopher Marks enrolled as a PhD student at MIT.
“He was interested in questions of national security,” said his adviser, Tauhid Zaman, the KDD Career Development Professor in Communications and Technology at MIT Sloan.
The two of them began discussing lines of inquiry that best integrated Marks’ research ideas with Zaman’s expertise in social networks.
“And then ISIS popped up.”
The Islamic State group’s dominance of national news, along with its heavy presence on Twitter, made it a natural subject of investigation. With input from Jytte Klausen, a specialist in Western jihadism at Brandeis University, Zaman and Marks started building a model to predict which Twitter users likely belonged to the Islamic State group, also known as ISIS.
“We wanted to know if you can look at an account and predict it is ISIS before they say anything,” Zaman said. “It was important to get to them before they tweeted anything, because, once they do, it’s usually something bad, like the address of a soldier.”
The researchers collected Twitter data on approximately 5,000 “seed users” who were either known Islamic State group members or were connected to known Islamic State group members. (They obtained the names of these users through news stories, blogs, and reports released by law enforcement agencies and think tanks.)
The information they gathered included the basics of an account, like location, screen name, and profile picture, along with the account’s unique ID number; Zaman and Marks did the same for the friends and followers of each seed user, eventually creating a dataset of more than 1.3 million users.
Alongside this data collection, they continuously monitored Twitter over a few months to see which accounts were suspended or shut down — about 60,000 in total. (Twitter takes an active approach to silencing ISIS propaganda.) Putting these two pieces together, Zaman and Marks were able to train a machine-learning model that linked account suspensions to profile characteristics, creating a system for identifying likely members of ISIS.
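The article does not publish the study's actual features or data, but the approach it describes — training a classifier to predict suspension from profile and network characteristics — can be sketched roughly as below. The features, synthetic labels, and random-forest choice here are all illustrative assumptions, not the researchers' actual pipeline.

```python
# Illustrative sketch: predict account suspension (a proxy for likely
# extremist activity) from profile/network features. All data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical per-account features: account age (days), follower count,
# friend count, and fraction of the account's friends already suspended.
X = np.column_stack([
    rng.integers(1, 3000, n),   # account age in days
    rng.integers(0, 5000, n),   # followers
    rng.integers(0, 5000, n),   # friends
    rng.random(n),              # fraction of suspended friends
])

# Synthetic labels: suspension is made more likely for young accounts
# with many suspended friends (purely for demonstration).
p = 1 / (1 + np.exp(-(5 * X[:, 3] - X[:, 0] / 1000 - 1)))
y = (rng.random(n) < p).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

On this synthetic data the classifier does well because the labels were built from the features; the real difficulty the researchers faced was assembling labeled data (the 60,000 observed suspensions) in the first place.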
“By doing that we got pretty good at predictive accuracy,” Zaman said.
They reached a point where the model could detect more than half of known ISIS accounts simply by studying characteristics of the profile and patterns in its network.
As a follow-up, they tackled the problem of users who create new accounts when their old ones are suspended. Is there a good way, they wondered, to discover those doppelgängers?
“When someone comes back he might change his name and his picture, but he’ll generally hang around the same neighborhood,” Zaman said; that is, when someone who has had his account shut down returns to Twitter, he’ll likely reconnect with the same network. “So we built another machine learning model using all the features of an account to predict when a user might return, whom he’ll reconnect with, and the probability of that reconnection.”
When somebody returned and created a recognizable network, Zaman and Marks’ algorithm tested the similarity of the new account’s name and profile picture against those of suspended accounts, gauging the likelihood that the person under scrutiny had previously operated one of them.
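The name-matching step can be sketched with ordinary string similarity. The screen names and the 0.8 threshold below are invented for illustration; the system described in the article also weighed profile pictures and network structure, which this toy comparison omits.

```python
# Illustrative sketch: score a new screen name against names of
# suspended accounts to flag a likely returning user.
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical screen names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical screen names of previously suspended accounts.
suspended = ["jihadi_fan_01", "desert_lion", "khalid_1999"]

def best_match(candidate: str, threshold: float = 0.8):
    """Return (name, score) for the closest suspended name, or None
    if nothing clears the threshold."""
    closest = max(suspended, key=lambda s: name_similarity(candidate, s))
    score = name_similarity(candidate, closest)
    return (closest, score) if score >= threshold else None

print(best_match("jihadi_fan_02"))   # near-duplicate of a suspended name
print(best_match("gardening_tips"))  # unrelated name
```

A near-duplicate like “jihadi_fan_02” clears the threshold against “jihadi_fan_01,” while an unrelated name does not; in practice such a score would be one signal among several, combined with the reconnection probabilities from the network model.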
In the end, the project provided “a coherent system to police these online ecosystems,” Zaman said. He noted that such policing isn’t strictly necessary in Twitter’s case: as a publicly held company, Twitter remains internally vigilant about curbing abuses of its platform.
“Twitter naturally doesn’t want to be known as the social network for terrorism and ISIS,” he said.
But if Islamic State group propagandists jump to other social networking sites, there is no guarantee that those companies will take it upon themselves to monitor their users; nor will companies necessarily share their data with external partners. These concerns highlight the value of the work: the model is agnostic to the kind of extremism under consideration — it needn’t be ISIS — as well as to the social network being used.
“It could be anti-Semitic propaganda or online bullying,” Zaman said, and it could take place on any social network since the basic currency of these websites is followers.
“You also don’t need the cooperation of the network to make this happen,” he said.
One concern raised by Zaman is that the technology itself is also agnostic.
“In the ideal case, it’s used by benevolent governments to protect people from these kinds of violent groups,” he said.
But it is easy to imagine an authoritarian government using the work to suppress dissent.
“It’s a tool that we developed as scientists, but in the end it’s up to the people in positions of power to use it responsibly,” he said.