Google open-sources natural language understanding tools

May 11, 2016

Google has just released two powerful natural language understanding tools for free, open-source use by anyone. These tools allow machines to read and understand English text (such as text you type into a browser to do a Google search).

SyntaxNet is a “syntactic parser”: it allows machines to parse, or break down, sentences into their component words, tag each word's part of speech, and identify the grammatical relationships between words that underlie the sentence's meaning. Parsey McParseface is an English parser built with SyntaxNet; it learned from an annotated collection of old newswire stories from the Penn Treebank Project.
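To make the idea of a parse concrete, here is a minimal Python sketch of how a dependency parse of a short sentence might be represented. It is not SyntaxNet's API or output format: the `Token` class, the helper functions, and the tag and label names are hypothetical, and the parse of "Alice saw Bob" is written by hand for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a parsed sentence (hypothetical structure, not SyntaxNet's)."""
    index: int     # 1-based position in the sentence
    word: str
    pos: str       # part-of-speech tag
    head: int      # index of the governing word; 0 marks the root
    relation: str  # label of the dependency linking this word to its head


# Hand-annotated parse of "Alice saw Bob" (illustrative, not parser output).
SENTENCE = [
    Token(1, "Alice", "NOUN", 2, "nsubj"),
    Token(2, "saw", "VERB", 0, "root"),
    Token(3, "Bob", "NOUN", 2, "dobj"),
]


def root(tokens):
    """Return the token that has no governing word."""
    return next(t for t in tokens if t.head == 0)


def dependents(tokens, head_index):
    """Return the tokens directly governed by the word at head_index."""
    return [t for t in tokens if t.head == head_index]


def describe(tokens):
    """Render each dependency as 'word (POS) --relation--> head'."""
    by_index = {t.index: t for t in tokens}
    lines = []
    for t in tokens:
        governor = "ROOT" if t.head == 0 else by_index[t.head].word
        lines.append(f"{t.word} ({t.pos}) --{t.relation}--> {governor}")
    return lines


if __name__ == "__main__":
    for line in describe(SENTENCE):
        print(line)
```

In this representation the verb "saw" is the root of the sentence, and "Alice" and "Bob" attach to it as subject and object. Each word-to-head link is one "dependency" in the sense used when parsers are scored.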

[Figure: example of how SyntaxNet parses and analyzes an English sentence]

SyntaxNet uses deep neural networks and is implemented in Google’s TensorFlow (see “Google open-sources its TensorFlow machine learning system”).

So how well does it work?

On a standard benchmark consisting of randomly drawn English newswire sentences (“Penn Treebank”), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, Google says. “Linguists trained for this task agree in 96–97% of the cases. This suggests that we are approaching human performance, but only on well-formed text.”
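The accuracy figure can be read as the share of words whose governing word the parser identifies correctly. The Python sketch below computes such a score under that assumption. The article does not say exactly how Google scores the benchmark (for example, whether dependency labels or punctuation count), so the function name and the toy data are illustrative only.

```python
def attachment_score(gold_heads, predicted_heads):
    """Fraction of words whose predicted head matches the gold-standard head.

    Each list holds, for every word in order, the 1-based index of its
    governing word, with 0 marking the root. Labels are ignored here,
    which is an assumption about how the benchmark is scored.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted parses must cover the same words")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy five-word sentence: the parser attaches the fourth word to the wrong head.
gold = [2, 0, 2, 5, 3]
predicted = [2, 0, 2, 3, 3]
score = attachment_score(gold, predicted)  # 4 of 5 heads correct

if __name__ == "__main__":
    print(f"{score:.0%} of dependencies recovered")
```

By the same measure, the 96–97% agreement between trained linguists would be the score obtained when one annotator's parse is compared against another's.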

“Because Parsey McParseface is the most accurate such model in the world, we hope that it will be useful to developers and researchers interested in automatic extraction of information, translation, and other core applications of NLU,” says Google.