Identifying the brains behind a terrorist attack or an infectious-disease primary source

August 11, 2012

The network of hijackers involved in the September 11, 2001 attack and their associates. Shortly after the incident, Mohamed Atta (in red) was identified by the authorities as the ringleader of the attack. Here, we use two observers (A. Ain and A. Zub, in green), whose communications are wiretapped. Based only on the timing and direction of arrival of the messages received by the two observers, the EPFL algorithm identified three possible sources (in red): M. Atta, A. Ani and A. Mzo. (Credit: EPFL)

Could a computer algorithm identify the source of a terrorist attack, a computer virus such as Gauss, or an epidemic?

Pedro Pinto, a researcher at the Swiss Federal Institute of Technology in Lausanne (EPFL), says his team has developed such an algorithm.

“Using our method, we can find the source of all kinds of things circulating in a network just by ‘listening’ to a limited number of members of that network,” he says.

For example, suppose you come across a rumor about yourself that has spread on Facebook and been sent to 500 people — your friends, or even friends of your friends. How do you find the person who started the rumor?

“By looking at the messages received by just 15–20 of your friends, and taking into account the time factor, our algorithm can trace the path of that information back and find the source,” Pinto says. This method can also be used to identify the origin of a spam message or a computer virus using only a limited number of sensors within the network.

The method would be useful in responding to terrorist attacks, such as the 1995 sarin gas attack in the Tokyo subway, in which poisonous gas released in the city’s subterranean tunnels killed 13 people and injured nearly 1,000 more. “Using this algorithm, it wouldn’t be necessary to equip every station with detectors. A sample would be sufficient to rapidly identify the origin of the attack, and action could be taken before it spreads too far,” says Pinto.

Computer simulations of the telephone conversations that could have occurred during the terrorist attacks on September 11, 2001, were used to test Pinto’s system. “By reconstructing the message exchange inside the 9/11 terrorist network extracted from publicly released news, our system spit out the names of three potential suspects — one of whom was found to be the mastermind of the attacks, according to the official inquiry.”

Model for assigning likelihood to a source. Left: locating a cellphone on a wireless network, based on signal travel time. Right: Two observers measure the arrival time of information; measurements are combined to generate a likelihood for each candidate source. (Credit: EPFL)

In the team’s model, individuals (or other entities) are imagined as points, or “nodes,” in a plane, connected by a network of lines, as APS Physics explains. Each node has several lines connecting it to other nodes, and each node can be either infected by a computer virus (or other condition), or uninfected.

To trace back to the source using data from a fraction of the nodes, Pinto and his colleagues adapted methods used in wireless communications networks. When three or more base stations receive a signal from one cell phone, the system can measure the difference in the signal’s arrival time at each base station to triangulate a user’s position. Similarly, the arrival times of an infection at a subset of “observer” nodes can be used to find the source.
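The idea of working backward from arrival times can be illustrated with a small sketch. This is not the EPFL team's algorithm, which assigns a statistical likelihood to each candidate; it is a deliberately simplified stand-in that assumes every hop between neighboring nodes takes exactly one time unit. Under that assumption, subtracting the hop count from each observer's arrival time should give the same start time for the true source, so the candidate with the smallest spread in those values fits best. The network, the node names, and the functions `hop_distances` and `rank_sources` are hypothetical and chosen only for illustration.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first search: number of hops from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_sources(graph, arrival_times):
    """Return (candidate, score) pairs sorted best first; lower is better.

    Simplifying assumption (not from the article): every hop takes exactly
    one time unit. If a candidate is the true source, arrival_time - hops
    is the same unknown start time at every observer, so the variance of
    those values measures how poorly the candidate fits.
    """
    scores = []
    for candidate in graph:
        dist = hop_distances(graph, candidate)
        if any(obs not in dist for obs in arrival_times):
            continue  # candidate cannot reach every observer
        starts = [t - dist[obs] for obs, t in arrival_times.items()]
        mean = sum(starts) / len(starts)
        variance = sum((s - mean) ** 2 for s in starts) / len(starts)
        scores.append((candidate, variance))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical seven-node network; node "b" starts spreading at time 5.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d", "e"],
    "c": ["a", "f", "g"],
    "d": ["b"],
    "e": ["b"],
    "f": ["c"],
    "g": ["c"],
}
arrival_times = {"d": 6, "e": 6, "f": 8}  # observer -> time the item arrived

ranked = rank_sources(graph, arrival_times)
print(ranked[0])  # best-fitting candidate and its score
```

With only a few observers, or observers placed poorly, several candidates can fit the arrival times equally well. That is in line with the 9/11 test described above, which returned three possible sources rather than one.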

Detecting the primary source of an epidemic, or the best viral blog

Locating the source of a cholera outbreak in the KwaZulu-Natal province in South Africa: a graphical model of the Thukela river basin. Nodes represent small communities and associated water reservoirs, in which the disease can be diffused and grow. (Credit: Pedro C. Pinto, Patrick Thiran, Martin Vetterli/PRL)

The algorithm can also be employed to find the primary source of an infectious disease, such as cholera. “We tested our method with data on an epidemic in South Africa provided by EPFL professor Andrea Rinaldo’s Ecohydrology Laboratory,” says Pinto. “By modeling water networks, river networks, and human transport networks, we were able to find the spot where the first cases of infection appeared by monitoring only a small fraction of the villages.”

According to Pinto, the algorithm could also be used preventatively — for example, to understand an outbreak before it gets out of control. “By carefully selecting points in the network to test, we could more rapidly detect the spread of an epidemic,” he points out.

It could also be a valuable tool for advertisers who use viral marketing strategies, leveraging the Internet and social networks to reach customers. For example, this algorithm would allow them to identify the specific Internet blogs that are the most influential for their target audience and to understand how these articles spread throughout the online community.