How predicting Shakespeare’s writing could improve our understanding of natural language

March 1, 2016

Google used the writings of 1,000 authors to train a deep neural network to predict writing patterns (credit: Martin Droeshout/Wikimedia Commons)

A Google natural language understanding research group led by Ray Kurzweil is building software systems that can understand natural language at a human level. The goal is to understand and interpret meanings of spoken or written language.

One key to achieving that understanding is establishing context, suggest researchers Chris Tar, Marc Pickett, PhD, and Brian Strope, PhD, on the Google Research Blog.

For example, take the phrase, “Great, ice cream for dinner!” If a six-year-old says it, it means something very different than if a parent says it, they point out.

That is, knowing the characteristics of the speaker (or writer) can narrow down the set of possible meanings of a phrase.

Similarly, the researchers suggested, a deep neural network (DNN) that takes into account a specific author’s style and “personality” should be able to predict (more accurately than a random guess) the next sentence that author would be likely to write in a book.

To test that idea, the researchers imported texts by 1,000 different authors from the Project Gutenberg website.

“The information our system derived is presumably representative of the author’s word choice, thinking, and style,” say the researchers. “We call these ‘Author vectors.’”

A section of a representation of “Author vectors” for some of the authors in the study. Note that contemporaries and influencers tend to be near each other (e.g., Marlowe and Shakespeare vs. Milton and Pope). The visualization uses the t-SNE algorithm. (credit: Google/Christopher Olah)

Essentially, the system is saying, “I’ve been told that this is Shakespeare, who tends to write like this, so I’ll take that into account when weighing which sentence is more likely to follow.” In effect, one can chat with a statistical representation of text written by Shakespeare, the researchers note.
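
The idea of letting an author's identity tip the balance between candidate sentences can be illustrated with a toy sketch. This is not the researchers' model: the three-dimensional vectors, the author names, the candidate sentences, and the `rank_candidates` helper are all made up for illustration, and plain cosine similarity stands in for whatever scoring the trained network actually learns.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(context_vec, author_vec, candidates):
    """Sort (sentence, vector) pairs by similarity to context + author.

    Hypothetical helper: adding the author vector to the context vector
    is an assumption made for this sketch, not the published method.
    """
    query = [c + a for c, a in zip(context_vec, author_vec)]
    return sorted(candidates, key=lambda item: cosine(query, item[1]), reverse=True)


# Made-up embeddings: dimension 0 ~ "archaic style", dimension 1 ~ "modern style".
context = [0.2, 0.2, 0.1]
authors = {
    "archaic_author": [0.9, 0.0, 0.1],
    "modern_author": [0.0, 0.9, 0.1],
}
candidates = [
    ("Thou art most welcome here.", [1.0, 0.1, 0.0]),
    ("You are very welcome here.", [0.1, 1.0, 0.0]),
]

for name, vec in authors.items():
    best_sentence, _ = rank_candidates(context, vec, candidates)[0]
    print(name, "->", best_sentence)
```

With the same context, swapping the author vector changes which candidate ranks first, which is the effect the researchers describe.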

(Or in the future, suggest completions to the unfinished works of Philip K. Dick?)

The system could enrich Google products through personalization, the researchers suggest. “For example, it could help provide more personalized response options for the recently introduced Smart Reply feature in Inbox by Gmail” (a system that could automatically determine if an email was answerable with a short reply, and compose a few suitable responses that a user could edit or send with just a tap).