An online encyclopedia that writes itself

Machine reading effort builds dossiers on people and organizations from translated news sources
June 26, 2012 | Source: Technology Review

(Credit: DARPA)

They look a bit like communally written Wikipedia pages. But these articles — concise profiles of people and organizations, complete with lists of connected organizations, people, and events — were in fact written by computers, in a new bid by the Pentagon to build machines that can follow global news events and provide intelligence analysts with useful summaries in close to real time.

The prototype system is part of a nonpublic site built for intelligence agencies by Raytheon BBN in Cambridge, Massachusetts, and scheduled for delivery to the government later this year. It gathers information from 40 news websites written in English, Chinese, and Arabic, and eventually it will cover hundreds of news sites in all major languages. Ultimately the system will be linked with an existing TV broadcast monitoring network.

The system captures everything that appears on the news sites and continually and automatically adds new information, says Sean Colbath, a senior scientist at BBN Technologies who helped develop the technology.

It starts by detecting an “entity” (the name of a person or an organization, such as Boko Haram), accounting for a variety of spellings. Then it identifies other entities (events and people) that are connected to it, along with statements made by and about the subject. “It’s automatically extracting relationships between entities,” Colbath says. “Here the machine has learned, by being given examples, how to put these relationships together and fill in those slots for you.”
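To make the steps above concrete, here is a minimal Python sketch of the general idea: match variant spellings to one canonical entity, then link entities that appear in the same sentence and collect the sentences that mention them. It is a toy illustration, not BBN's method. The article says the real system learned from examples, whereas this sketch uses a hand-written alias table. The names `ALIASES`, `find_entities`, and `build_profiles`, the misspelling "Boko Harum", and the "Example Relief Agency" are all assumptions invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical alias table mapping variant spellings to a canonical name.
# A learned system would infer these; here they are written by hand.
ALIASES = {
    "boko haram": "Boko Haram",
    "boko harum": "Boko Haram",
    "example relief agency": "Example Relief Agency",  # invented entity
}


def find_entities(sentence):
    """Return the canonical names of known entities mentioned in a sentence."""
    lowered = sentence.lower()
    return {canonical for alias, canonical in ALIASES.items() if alias in lowered}


def build_profiles(text):
    """Fill two slots per entity: connected entities and mentioning sentences."""
    profiles = defaultdict(lambda: {"connected": set(), "mentions": []})
    # Naive sentence split on end punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        found = find_entities(sentence)
        for name in found:
            profiles[name]["mentions"].append(sentence)
            # Treat co-occurrence in a sentence as a (crude) relationship.
            profiles[name]["connected"] |= found - {name}
    return dict(profiles)
```

Sentence-level co-occurrence is a deliberately crude stand-in for relationship extraction: it finds that two entities are linked, but not how.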

The BBN project is the fruit of the Defense Advanced Research Projects Agency’s latest effort to build machines that read as humans do, a decades-old problem that has been the focus of increasing research in recent years. Under DARPA’s research program, prototypes have been built by SRI International and IBM as well as Raytheon BBN.