Can an algorithm write a better news story than a human reporter?

April 8, 2013

Every 30 seconds or so, an algorithm developed by Narrative Science produces a computer-written news story, Wired reports.

The articles run on the websites of respected publishers like Forbes, as well as other Internet media powers (many of which are keeping their identities private).

Niche news services hire Narrative Science to write updates for their subscribers, be they sports fans, small-cap investors, or fast-food franchise owners.

And the articles don’t read like robots wrote them:

Friona fell 10-8 to Boys Ranch in five innings on Monday at Friona despite racking up seven hits and eight runs. Friona was led by a flawless day at the dish by Hunter Sundre, who went 2-2 against Boys Ranch pitching. Sundre singled in the third inning and tripled in the fourth inning … Friona piled up the steals, swiping eight bags in all …

Narrative Science’s algorithms built the article using pitch-by-pitch game data that parents entered into an iPhone app called GameChanger. Last year the software produced nearly 400,000 accounts of Little League games. This year that number is expected to top 1.5 million.

For Narrative Science’s CTO and cofounder, Kristian Hammond, these stories are only the first step toward what will eventually become a news universe dominated by computer-generated stories. How dominant? “More than 90 percent,” says Hammond.

This robonews tsunami, he insists, will not wash away the remaining human reporters who still collect paychecks. Whew! Was getting worried there. — Editor.

Instead the universe of newswriting will expand dramatically, as computers mine vast troves of data to produce ultracheap, totally readable accounts of events, trends, and developments that no journalist is currently covering.

That’s not to say that computer-generated stories will remain in the margins, limited to producing more and more Little League write-ups and formulaic earnings previews. Hammond was recently asked for his reaction to a prediction that a computer would win a Pulitzer Prize within 20 years. He disagreed. It would happen, he said, in five.

Narrative Science’s writing engine requires several steps. First, it must amass high-quality data. Then the algorithms must fit that data into some broader understanding of the subject matter. (For instance, they must know that the team with more “runs” is declared the winner of a baseball game.) So Narrative Science’s engineers program a set of rules that govern each subject, be it corporate earnings or a sporting event.
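To make the idea of a subject-matter rule concrete, here is a minimal sketch in Python of the baseball example. It is only an illustration: the `GameResult` type and the `winner` function are hypothetical names, and nothing here reflects how Narrative Science's engine is actually built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameResult:
    """Hypothetical container for the final score of one game."""
    home_team: str
    away_team: str
    home_runs: int
    away_runs: int


def winner(game: GameResult) -> Optional[str]:
    """Domain rule: the team with more runs wins; None means a tie."""
    if game.home_runs > game.away_runs:
        return game.home_team
    if game.away_runs > game.home_runs:
        return game.away_team
    return None


# Score taken from the sample recap quoted earlier in the article.
game = GameResult("Friona", "Boys Ranch", home_runs=8, away_runs=10)
print(winner(game))
```

A rule like this turns raw numbers into a fact (who won) that later steps can write about.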

But how to turn that analysis into prose? The company has hired a team of “meta-writers,” trained journalists who have built a set of templates. Then comes the structure.
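A template, in its simplest form, is a sentence with slots that get filled from the analyzed data. The sketch below is a guess at the general idea, not the meta-writers' actual templates. The template text, the field names, and `render_recap` are all assumptions made for illustration.

```python
# Hypothetical one-sentence template; real templates would be far richer.
RECAP_TEMPLATE = (
    "{loser} fell {winner_runs}-{loser_runs} to {winner} "
    "in {innings} innings on {day}."
)


def render_recap(facts: dict) -> str:
    """Fill the template's slots with facts derived from game data."""
    return RECAP_TEMPLATE.format(**facts)


# Facts taken from the sample recap quoted earlier in the article.
facts = {
    "loser": "Friona",
    "winner": "Boys Ranch",
    "winner_runs": 10,
    "loser_runs": 8,
    "innings": "five",
    "day": "Monday",
}
print(render_recap(facts))
# prints: Friona fell 10-8 to Boys Ranch in five innings on Monday.
```

Choosing among many such templates, and varying the phrasing, is presumably where the journalists' judgment comes in.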

Once Narrative Science had mastered the art of telling sports and finance stories, the company realized that it could produce much more than journalism. Indeed, anyone who needed to translate and explain large sets of data could benefit from its services. Requests poured in from people who were buried in spreadsheets and charts.

It turned out that those people would pay to convert all that confusing information into a couple of readable paragraphs that hit the key points.

When the company was just getting started, meta-writers had to painstakingly educate the system every time it tackled a new subject. But before long they developed a platform that made it easier for the algorithm to learn about new domains.

Narrative Science’s main rival in automated story creation, a North Carolina company founded as Stat Sheet, has broadened its mission in similar fashion.

And the subject matter keeps getting more diverse. Narrative Science was hired by a fast-food company to write a monthly report for its franchise operators that analyzes sales figures, compares them to regional peers, and suggests particular menu items to push. What’s more, the low cost of transforming data into stories makes it practical to write even for an audience of one.

For now, though, journalism remains at the company’s core. And like any cub reporter, Narrative Science has dreams of glory — to identify and break big stories. To do that, it will have to invest in sophisticated machine-learning and data-mining technologies. It will also have to get deeper into the business of understanding natural language, which would allow it to access information and events that can’t be expressed in a spreadsheet.

Hammond believes that as Narrative Science grows, its stories will go higher up the journalism food chain — from commodity news to explanatory journalism and, ultimately, detailed long-form articles. Maybe at some point, humans and algorithms will collaborate, with each partner playing to its strength.