Jeopardy!, IBM, and Wolfram|Alpha
February 2, 2011 by Stephen Wolfram
About a month before Wolfram|Alpha launched, I was on the phone with a group from IBM, talking about our vision for computable knowledge in Wolfram|Alpha. A few weeks later, the group announced that they were going to use what they had done in natural language processing to try to make a system to compete on Jeopardy!
I thought it was a brilliant way to showcase their work, and IBM’s capabilities in general. And now, a year and a half later, IBM has built an impressive level of anticipation for their upcoming Jeopardy! television event. Whatever happens (and IBM’s system certainly should be able to win), one thing is clear: what IBM is doing will have an important effect in changing people’s expectations for how they might be able to interact with computers.
When Wolfram|Alpha was launched, people at first kept on referring to it as a “new search engine” — because basically keyword search was the only model they had for how they might find information on a large scale. But IBM’s project gives a terrific example of another model: question answering. And when people internalize this model, they’ll be coming a lot closer to realizing what’s possible with what we’re building in Wolfram|Alpha.
So what really is the relation between Wolfram|Alpha and the IBM Jeopardy! project?
IBM’s basic approach has a long history, with a lineage in the field of information retrieval that is in many ways shared with search engines. The essential idea is to start with textual documents, and then to build a system to statistically match questions that are asked to answers that are represented in the documents. (The first step is to search for textual matches to a question — using thesaurus-like and other linguistic transformations. The harder work is then to take the list of potential answers, use a diversity of different methods to score them, and finally combine these scores to choose a top answer.)
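The retrieve, score, and combine steps described above can be illustrated with a toy sketch. This is not IBM’s implementation: the two scoring methods (word overlap with the clue, and the share of passages that mention a candidate), the fixed weights, and all function names are assumptions chosen only to make the idea concrete.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(clue, passage):
    """Fraction of distinct clue words that also occur in the passage."""
    clue_words = tokens(clue)
    if not clue_words:
        return 0.0
    return len(clue_words & tokens(passage)) / len(clue_words)

def rank_candidates(clue, passages, candidates, weights=(0.7, 0.3)):
    """Rank candidates by a weighted sum of two illustrative scores.

    Score 1: best clue overlap among passages mentioning the candidate.
    Score 2: share of all passages that mention the candidate.
    The weights are arbitrary assumptions, not tuned values.
    """
    ranked = []
    for cand in candidates:
        mentioning = [p for p in passages if cand.lower() in p.lower()]
        best_overlap = max((overlap_score(clue, p) for p in mentioning), default=0.0)
        support = len(mentioning) / len(passages) if passages else 0.0
        total = weights[0] * best_overlap + weights[1] * support
        ranked.append((total, cand))
    ranked.sort(reverse=True)  # highest combined score first
    return ranked

# Hypothetical clue and passages, for illustration only.
clue = "He was the first president of the United States"
passages = [
    "George Washington was the first president of the United States.",
    "Abraham Lincoln was the sixteenth president of the United States.",
    "George Washington led the Continental Army.",
]
candidates = ["George Washington", "Abraham Lincoln"]
top_score, top_answer = rank_candidates(clue, passages, candidates)[0]
print(top_answer)
```

A real system would use many more scoring methods and would learn how to weight them rather than fixing the weights by hand.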
Early versions of this approach go back nearly 50 years, to the first phase of artificial intelligence research. And incremental progress has been made, notably as tracked for the past 20 years in the annual TREC (Text Retrieval Conference) question answering competition. IBM’s Jeopardy! system is very much in this tradition, though with more sophisticated systems engineering, and with special features aimed at the particular (complex) task of competing on “Jeopardy!”
Wolfram|Alpha is a completely different kind of thing — something much more radical, based on a quite different paradigm. The key point is that Wolfram|Alpha is not dealing with documents, or anything derived from them. Instead, it is dealing directly with raw, precise, computable knowledge. And what’s inside it is not statistical representations of text, but actual representations of knowledge.
The input to Wolfram|Alpha can be a question in natural language. But what Wolfram|Alpha does is to convert this natural language into a precise computable internal form. And then it takes this form, and uses its computable knowledge to compute an answer to the question.
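As a very rough illustration of this two-stage idea (natural language in, precise internal form, then computation), here is a toy sketch. It is in no way how Wolfram|Alpha is implemented: the single query pattern, the tuple-based internal form, the entity “Exampleland”, and its numbers are all made up for the example.

```python
import re

# Made-up "curated" facts for a fictional entity; values are illustrative only.
FACTS = {
    ("exampleland", "population"): 1_000_000,
    ("exampleland", "area_km2"): 50_000,
}

def parse(query):
    """Convert one narrow kind of question into a structured expression."""
    pattern = r"(?:what is the )?population density of (\w+)\??"
    match = re.fullmatch(pattern, query.strip().lower())
    if match is None:
        raise ValueError("query not understood")
    entity = match.group(1)
    # Internal form: density is population divided by area.
    return ("divide", ("fact", entity, "population"), ("fact", entity, "area_km2"))

def evaluate(expr):
    """Compute a value from the structured expression using stored facts."""
    op = expr[0]
    if op == "fact":
        return FACTS[(expr[1], expr[2])]
    if op == "divide":
        return evaluate(expr[1]) / evaluate(expr[2])
    raise ValueError(f"unknown operation: {op}")

density = evaluate(parse("What is the population density of Exampleland?"))
print(density)  # people per square kilometer
```

The point of the sketch is that the density is never stored anywhere: it is computed from the facts once the question has been turned into a precise form.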
There’s a lot of technology and new ideas that are required to make this work. And I must say that when I started out developing Wolfram|Alpha I wasn’t at all sure it was going to be possible. But after years of hard work, and some breakthroughs, I’m happy to say it’s turned out really well. And Wolfram|Alpha is now successfully answering millions of questions on the Web and elsewhere about a huge variety of different topics every day.
And in a sense Wolfram|Alpha fully understands every answer it gives. It’s not somehow serving up pieces of statistical matches to documents it was fed. It’s actually computing its answers, based on knowledge that it has. And most of the answers it computes are completely new: they’ve never been computed or written down before.
In IBM’s approach, the main part of the work goes into tuning the statistical matching procedures that are used, together, in the case of “Jeopardy!”, with adding a collection of special rules to handle particular situations that come up.
In Wolfram|Alpha most of the work is just adding computable knowledge to the system. Curating data, hooking up real-time feeds, injecting domain-specific expertise, implementing computational algorithms — and building up our kind of generalized grammar that captures the natural language used for queries.
In developing Wolfram|Alpha, we’ve been steadily building out different areas of knowledge, concentrating first on ones that address fairly short questions that people ask, and that are important in practice. We’re almost exactly at the opposite end of things from what’s needed in “Jeopardy!”, and from the direct path that IBM has taken to that goal. There’s no doubt that in time Wolfram|Alpha will be able to do things like the Jeopardy! task (though in an utterly different way from the IBM system), but that’s not what it’s built for today.
(It’s an interesting metric that Wolfram|Alpha currently knows about three quarters of the entities that arise in Jeopardy! questions — which I don’t consider too shabby, given that this is pretty far from anything we’ve actually set up Wolfram|Alpha to do.)
In the last couple of weeks, though, I’ve gotten curious about what’s actually involved in doing the Jeopardy! task. Forget Wolfram|Alpha entirely for a moment. What’s the most obvious way to attempt the Jeopardy! task?
What about just using a plain old search engine? And just feeding Jeopardy! clues into it, and seeing what documents get matched. Well, just for fun, we tried that. We sampled randomly from the 200,000 or so Jeopardy! clues that have been aired. Then we took each clue and fed it as input (without quotes) to a search engine.
Then we looked at the search engine result page, and (a) saw how frequently the correct “Jeopardy!” answer appeared somewhere in the titles or text snippets on the page, and (b) saw how frequently it appeared in the top document returned by the search engine. (More details are given in this Mathematica notebook [download Player here]. Obviously we excluded sites that are specifically about “Jeopardy!”.)
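The two measurements, (a) and (b), can be sketched roughly as follows. This is not the code from the notebook mentioned above: the sample layout (`answer`, `results`, `top_document`), the simple normalized substring match, and the function names are assumptions, and the search results are taken as already fetched rather than requested from any particular search engine.

```python
import re

def normalize(text):
    """Lowercase the text and collapse it to space-separated words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def contains_answer(text, answer):
    """True if the answer occurs in the text as a whole-word phrase."""
    needle = normalize(answer)
    if not needle:
        return False
    return f" {needle} " in f" {normalize(text)} "

def evaluate(samples):
    """Return (rate on result page, rate in top document) over all samples.

    Each sample is a dict with hypothetical keys:
      'answer'       - the correct response to the clue
      'results'      - list of (title, snippet) pairs from the result page
      'top_document' - full text of the first document returned
    """
    on_page = 0
    in_top = 0
    for sample in samples:
        page_text = " ".join(f"{title} {snippet}" for title, snippet in sample["results"])
        on_page += contains_answer(page_text, sample["answer"])
        in_top += contains_answer(sample["top_document"], sample["answer"])
    count = len(samples)
    return on_page / count, in_top / count
```

A plain substring match like this will miss alternate spellings and partial names, so the rates it gives should be read as rough estimates.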
If nothing else, this gives us pretty interesting information about the modern search engine landscape. In particular, it shows us that the more mature search systems are getting to be remarkably similar in their raw performance — so that other aspects of user experience (like Wolfram|Alpha integration!) are likely to become progressively more important.
But in terms of “Jeopardy!”, what we see is that just using a plain old search engine gets surprisingly far. Of course, the approach here isn’t really solving the complete “Jeopardy!” problem: it’s only giving pages on which the answer should appear, not giving specific actual answers. One can try various simple strategies for going further, like getting the answer from the title of the first hit, which with the top search engines actually does succeed about 20% of the time.
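The “title of the first hit” strategy might look something like the sketch below. The rule for trimming a title (keep the part before the first “ - ” or “ | ” separator) and the exact-match comparison are assumptions made for illustration, not a description of how the figure above was obtained.

```python
import re

def guess_from_title(title):
    """Keep the part of a page title before the first ' - ' or ' | ' separator."""
    return re.split(r"\s+[-|]\s+", title, maxsplit=1)[0].strip()

def title_accuracy(pairs):
    """Fraction of (top_hit_title, correct_answer) pairs where the guess matches."""
    if not pairs:
        return 0.0
    hits = sum(guess_from_title(title).lower() == answer.lower() for title, answer in pairs)
    return hits / len(pairs)
```

For example, `guess_from_title("George Washington - Encyclopedia")` gives `"George Washington"`, while a title with no separator is returned unchanged.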
But ultimately it’s clear that one’s going to have to do more work to actually compete on “Jeopardy!”, which is what IBM has done.
So what’s the broader significance of the “Jeopardy!” project? It’s yet another example of how something that seems like artificial intelligence can be achieved with a system that’s in a sense “just doing computation” (and as such, it can be viewed as yet another piece of evidence for the general Principle of Computational Equivalence that’s emerged from my work in science).
But at a more practical level, it’s related to an activity that has been central to IBM’s business throughout its history: handling internal data of corporations and other organizations.
There are typically two general kinds of corporate data: structured (often numerical, and, in the future, increasingly acquired automatically) and unstructured (often textual or image-based). The IBM “Jeopardy!” approach has to do with answering questions from unstructured textual data, with such potential applications as mining medical documents or patents, or doing e-discovery in litigation.
It’s only rather recently that even search engine methods have become widely used for these kinds of tasks, and with its “Jeopardy!” project approach IBM joins a spectrum of companies trying to go further using natural-language-processing methods.
When it comes to structured corporate data, the “Jeopardy!” project approach is not what’s relevant. Instead, here there’s a large industry based on traditional business intelligence and data mining methods that in effect allow one to investigate structured data in structured ways.
And it’s in this area that there’s a particularly obvious breakthrough made possible by the technology of Wolfram|Alpha: being able for the first time to automatically investigate structured data in completely free-form unstructured ways. One asks a question in natural language, and a custom version of Wolfram|Alpha built from particular corporate data can use its computational knowledge and algorithms to compute an answer based on the data — and in fact generate a whole report about the answer.
So what kind of synergy could there be between Wolfram|Alpha and IBM’s “Jeopardy!” approach? It didn’t happen this time around, but if there’s a Watson 2.0, it should be set up to be able to call the Wolfram|Alpha API. IBM apparently already uses a certain amount of structured data and rules in, for example, scoring candidate answers.
But what we’ve found is that even just in natural language processing, there’s much more that can be done if one has access to deep broad computational knowledge at every stage. And when it comes to actually answering many kinds of questions, one needs the kind of ability that Wolfram|Alpha has to compute things.
On the other side, in terms of data in Wolfram|Alpha, we mostly concentrate on definitive structured sources. But sometimes there’s no choice but to try to extract structured data from unstructured textual sources. In our experience, this is always an unreliable process (achieving at most perhaps 80% correctness), and so far we mostly use it only to “prime the pump” for later expert curation. But perhaps with something like IBM’s “Jeopardy!” approach it’ll be possible to get a good supply of probabilistic candidate data answers that can themselves be used as fodder for the whole Wolfram|Alpha computational knowledge engine system.
It’ll be interesting to see what the future holds for all of this. But for now, I shall simply look forward to IBM’s appearance on “Jeopardy!”
IBM has had a long and distinguished history of important R&D — something a disappointingly small number of companies can say today. I have had some good friends at IBM Research (sadly, not all still alive), and IBM as a company has much to be admired. It’s great to see IBM putting on such an impressive show, in an area that’s so close to my own longstanding interests.
Good luck on “Jeopardy!” I’ll be rooting for you, Watson.
© 2011 Stephen Wolfram, LLC. Reprinted with permission from Stephen Wolfram Blog, Jan 26, 2011.