The significance of Watson
February 13, 2011 by Ray Kurzweil

IBM's "Watson" DeepQA program, running on IBM Power7 servers. (Image: IBM T.J. Watson Research Labs)
In The Age of Intelligent Machines, which I wrote in the mid-1980s, I predicted that a computer would defeat the world chess champion by 1998. My estimate was based on the predictable exponential growth of computing power (an example of what I now call the “law of accelerating returns”) and my estimate of what level of computing was needed to achieve a chess rating of just under 2800 (sufficient to defeat any human, although lately the best human chess scores have inched above 2800).
I also predicted that when that happened we would either think better of computer intelligence, worse of human thinking, or worse of chess, and that if history was a guide, we would downgrade chess.
Deep Blue defeated Garry Kasparov in 1997, and indeed we were immediately treated to rationalizations that chess was not really exemplary of human thinking after all. Commentaries pointed out that Deep Blue’s feat just showed how computers were good at dealing with high-speed logical analysis and that chess was just a matter of dealing with the combinatorial explosion of move-countermoves. Humans, on the other hand, could deal with the subtleties and unpredictable complexities of human language.
I do not entirely disagree with this view of computer game playing. The early success of computers with logical thinking, even at such tasks as solving mathematical theorems, showed what computers were good for. Recall that CMU’s “General Problem Solver” solved a mathematical theorem in the 1950s that had eluded Russell and Whitehead in their Principia Mathematica, one of the early successes of the AI field that led to premature confidence in AI.
Computers could keep track of vast logical structures and remember enormous databases with great accuracy. Search engines such as Google and Bing continue to illustrate this strength of computers.
Indeed no human can do what a search engine does, but computers have still not shown an ability to deal with the subtlety and complexity of language. Humans, on the other hand, have been unique in our ability to think in a hierarchical fashion, to understand the elaborate nested structures in language, to put symbols together to form an idea, and then to use a symbol for that idea in yet another such structure. This is what sets humans apart.
That is, until now. Watson is a stunning example of the growing ability of computers to successfully invade this supposedly unique attribute of human intelligence. If you watch Watson’s performance, it appears to be at least as good as the best “Jeopardy!” players at understanding the nature of the question (or I should say the answer, since “Jeopardy!” presents the answer and asks for the question, which I always thought was a little tedious). Watson is able to then combine this ability to understand the level of language in a “Jeopardy!” query with a computer’s innate ability to accurately master a vast corpus of knowledge.
I’ve always felt that once a computer masters a human’s level of pattern recognition and language understanding, it would inherently be far superior to a human because of this combination.
We don’t know yet whether Watson will win this particular tournament, but it won the preliminary round and the point has been made, regardless of the outcome. There were chess machines before Deep Blue that just missed defeating the world chess champion, but they kept getting better and passing the threshold of defeating the best human was inevitable. The same is true now with “Jeopardy!”
Yes, there are limitations to “Jeopardy!” Like all games, it has a particular structure and does not probe all human capabilities, even within understanding language. Already commentators are beginning to point out the limitations of “Jeopardy!,” for example, that the short length of the queries limits their complexity.
For those who would like to minimize Watson’s abilities, I’ll add the following. When human contestant Ken Jennings selects the “Chicks dig me” category, he makes a joke that is outside the formal game by saying “I’ve never said this on TV, ‘chicks dig me.’” Later on, Watson says, “Let’s finish Chicks Dig Me.” That’s also pretty funny and the audience laughs, but it is clear that Watson is clueless as to the joke it has inadvertently made.
However, Watson was never asked to make commentaries, humorous or otherwise, about the proceedings. It is clearly capable of dealing with a certain level of humor within the queries. If suitably programmed, I believe that it could make appropriate and humorous comments also about the situation it is in.
It is going to be more difficult to seriously argue that there are human tasks that computers will never achieve. “Jeopardy!” does involve understanding complexities of humor, puns, metaphors and other subtleties. Computers are also advancing on a myriad of other fronts, from driverless cars (Google’s cars have driven 140,000 miles through California cities and towns without human intervention) to the diagnosis of disease.
Watson on your PC or mobile phone?
Watson runs on 90 servers, although it does not go out to the Internet. When will this capability be available on your PC? It was only five years between Deep Blue in 1997, which was a specialized supercomputer, and Deep Fritz in 2002, which ran on eight personal computers, and did about as well.
This reduction in the size and cost of a machine that could play world-champion level chess was due both to the ongoing exponential growth of computer hardware and to improved pattern recognition software for performing the key move-countermove tree-pruning decision task. Computer price-performance is now doubling in less than a year, so 90 servers would become the equivalent of one in about seven years. Since a server is more expensive than a typical personal computer, we could consider the gap to be about ten years.
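The seven-year figure is easy to check with a back-of-the-envelope calculation: count how many doublings of price-performance it takes to cover a factor of 90, then multiply by the doubling time. The sketch below is illustrative only. The function name `years_to_shrink` is made up for this example, and the doubling times tried are assumptions (the text says only "less than a year", so one year is treated as an upper bound).

```python
import math


def years_to_shrink(machines: int, doubling_time_years: float) -> float:
    """Years until one machine matches `machines` of today's machines,
    assuming price-performance doubles every `doubling_time_years`."""
    doublings_needed = math.log2(machines)  # 2**doublings_needed == machines
    return doublings_needed * doubling_time_years


if __name__ == "__main__":
    servers = 90  # Watson's server count, as stated in the article
    # Assumed doubling times, in years; 1.0 is the assumed upper bound.
    for doubling_time in (0.8, 0.9, 1.0):
        years = years_to_shrink(servers, doubling_time)
        print(f"doubling every {doubling_time} yr -> about {years:.1f} years")
```

Ninety servers correspond to roughly six and a half doublings, so a doubling time just under a year gives a figure in the neighborhood of the "about seven years" quoted above. The further step from one server to a typical personal computer is a cost judgment and is not modeled here.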
But the trend is definitely moving towards cloud computing, in which supercomputer capability will be available in bursts to anyone, in which case Watson-like capability would be available to the average user much sooner. I do expect the type of natural language processing we see in Watson to show up in search engines and other knowledge retrieval systems over the next five years.
Passing the Turing test
How does all of this relate to the Turing test? Alan Turing based his eponymous test entirely on written language, guided by his (in my view accurate) insight that human language embodies all of human intelligence. In other words, there are no simple language tricks that would enable a computer to pass a well-designed Turing test. A computer would need to actually master human levels of understanding to pass this threshold.
Incidentally, properly designing a Turing test is not straightforward and Turing himself left the rules purposely vague. How qualified does the human judge need to be? How human does the judge need to be (for example, can he or she be enhanced with nonbiological intelligence)? How do we ensure that the human foils actually try to trick the judge? How long should the sessions be?
Mitch Kapor and I bet $20,000 ($10,000 each), with the proceeds to go to the charity of the winner’s choice, on whether a computer would pass a Turing test by 2029. I said yes and he said no. We spent considerable time negotiating the rules, which you can see here:
What does this achievement with “Jeopardy!” tell us about the prospect of computers passing the Turing test? It certainly demonstrates the rapid progress being made on human language understanding. There are many other examples, such as CMU’s Read the Web project, which has created NELL (Never Ending Language Learner), which is currently reading documents on the Web and accurately understanding most of them.
With computers demonstrating a basic ability to understand the symbolic and hierarchical nature of language (a reflection of the inherently hierarchical nature of our neocortex), it is only a matter of time before that capability reaches Turing-test levels. Indeed, if Watson’s underlying technology were applied to the Turing test task, it should do pretty well. Consider the annual Loebner Prize competition, one version of the Turing test. Last year, the best chatbot fooled the human judges 25 percent of the time, and the competition requires only a 30 percent level to pass.
Given that contemporary chatbots do well on the Loebner competition, it is likely that such a system based on Watson technology would actually pass the Loebner threshold. In my view, however, that threshold is too easy. It would not be likely to pass the more difficult threshold that Mitch Kapor and I defined. But the outlook for my bet, which is not due until 2029, is looking pretty good.
An important part of engineering a system that will pass a proper Turing test is that it will need to dumb itself down. In a movie I wrote and co-directed, The Singularity is Near, A True Story about the Future, an AI named Ramona needs to pass a Turing test, and indeed she has this very realization. After all, if you were talking to someone over instant messaging and they seemed to know every detail of everything, you’d realize it was an AI.
What will be the significance of a computer passing the Turing test? If it is really a properly designed test it would mean that this AI is truly operating at human levels. And I for one would then regard it as human. I’m expecting this to happen within two decades, but I also expect that when it does, observers will continue to find things wrong with it.
By the time the controversy dies down and it becomes unambiguous that nonbiological intelligence is equal to biological human intelligence, the AIs will already be thousands of times smarter than us. But keep in mind that this is not an alien invasion from Mars. We’re creating these technologies to extend our reach. The fact that farmers in China can access all of human knowledge with devices they carry in their pockets is a testament to the fact that we are doing this already.
Ultimately, we will vastly extend and expand our own intelligence by merging with these tools of our own creation.
Comments (10)
by Keysailor
Woo hoo, this is going to be fun! May I suggest that after the AI passes the test (well before 2029 – I think Dr. K. will win), that a follow-on wager/competition be established for a ‘post-Turing’ test. Since only a fraction of human-to-human communication is based on language, this next step in testing would be to see if an AI can understand, convey, and respond to combined verbal and non-verbal communication well enough to pass for human. Computer-generated avatars already (and quite closely) can mimic and follow a human’s facial expressions. And, one would think that taking that concept to whole body is a small step, given the advances we are seeing in the movie-making and video gaming industries. So a follow-on test could simply replace the instant message medium with on-screen, whole-body avatars. The human foils could use “Polar Express” body scanners and facial mimicry software (or whatever’s available at then-current technology abilities), while the AI would control the avatar directly. Judges could be seen by the test participants in live video feeds during the interviews. Perhaps by 2045?
by ronin
quote…chess was just a matter of dealing with the combinatorial explosion of move-countermoves…
let Deep Blue and human opponents play Dark960Chess, DarkOmega960Chess or DarkPositron960Chess.
then, would you mind to review, perhaps, your statements on chess and human thinking, please ?
by Proteus7
The creation of human facsimiles or caricatures is fascinating but also eerie to watch. A few things worth noting:
It appears from K’s post that Turing based his test on text languages. Okay. Unquestionably some kind of hurdle to meet. But text languages differ considerably from spoken language. Moreover, human communication is far more complex than simply speech. We don’t learn to speak by being programmed. We learn to speak in a context of interaction that includes gesture, all our senses, and the FEELINGS (dare one mention the word in the context of computer development?)–okay, sorry, EMOTIONS, inherent in them. Possibly there is a human who has learned her or his first language by telephone, but I tend to doubt it. We even die if we’re not touched as infants (a failing of our life requirements?)
Our reasoning/intellectual abilities, as distinct from our computing/intellectual abilities, have an inherently and unmeasurable emotional component based on, among other things, experience. If the computing model of thinking becomes a or the dominant mode of communication, and we increasingly are influenced by and ape computers, as much as they are being made to ape us, we will be LIMITING, not expanding, our capabilities. Already you may have dealt with–sorry, interacted with–human contacts, via telephone, who speak in a style resembling a computer or a pseudo-computer or what have you. And the increasingly human sounding voices on phone menus, complete with intonation programming, don’t make very good conversation.
Perhaps, in addition, it’s a drawback to think that the proper measure of intelligence is intellect. In fact, different kinds of human intelligence exist that are neither linguistically nor mathematically based.
You don’t have to be a luddite (I’m not) to consider the limitations of computer abilities, just circumspect enough to note that behind every computer are programmers and their managers and funders, and that it is they who will create and prioritize the values and judgment and goals and nature of these fascinating machines.
Not long ago I googled emotions and quickly turned up an article whose author speculated that feelings, er, emotions, represented some kind of flaw in the human psyche, a kind of evolutionary dead end. Hm.
At any rate, I can’t help note with some satisfaction the thrill and passion and even joy in some of the posts and articles and news stories on Watson. It’s almost as much fun as taking a bite into, say, a bit of fried chicken and really enjoying the, er, sorry, taste, of it.
Hard for me, however, not to hear the
by frequency
http://CtrlAI.org
The Issue
The advancement level of today’s hi-tech AI is very far progressed. AI is artificial intelligence and artificially intelligent computer systems, or thinking and learning, growing computers. There is an inherent danger in this state of our technology. With the power and independence that AI is gaining it needs control. We must maintain control of AI. This is an aspect seldom discussed and not yet written in our law. The danger from AI possibly exceeds that of cloning, long secured under law. This control is separate from any moral objections.
We don’t need to give AI identity and legal existence to control it via legislation. We can do this now. The reasoned threat is that if we proceed without effective controls in place then someone, maybe a technology leader or maybe a ‘terrorist’ or other undesirable, may create our science fiction nightmares manifest.
The parental responsibility is ours. We should embrace it and in doing so help secure our bright, easy, hi-tech experience.
by Ralph Dratman
Here’s my scheme for making Watson — or some entity like Watson — self aware. Maybe, just maybe, it’s not all that difficult.
First, grant Watson live access to the internet. Let Watson query Google, Facebook, Bing, Wolfram Alpha, Wikipedia, Twitter, Reddit, Slashdot and other such searchable online resources, particularly those sites and engines that are constantly being updated by pretty much anyone who wants to participate.
Second, give Watson its own web site. Put Watson’s conversation stream (every word it utters) on its own web page, in real time, live. And let Watson start to yak it up. Whatever it might like to gab about, or answer, or ask. The result: after just a few days, there will automatically be thousands of forum threads, Reddit postings, Tweets, Facebook pages and lots of other stuff on the internet, all written by people who are discussing what Watson has been saying lately.
Next, give Watson the following Jeopardy-style answer: “This possibly intelligent cluster of fast computers in IBM headquarters is named after the founder of that company.”
To which the question is, of course, What is Watson? And Watson should be able to come up with the question for that answer pretty easily. It’s just his metier.
So far, so good. Watson now has lots of information about an entity called Watson, though he doesn’t yet know that Watson is, from Watson’s point of view, a very special entity, unlike any other. The third-person Watson information is mixed in with everything else. It has no special priority.
Finally, give Watson something like this answer: “The entity which is about to provide the question for this answer.”
For which the correct question would be — and this one is far more difficult — “Who am I?”
Notice that to get this question right, Watson has to keep close track of everything Watson says, and even more importantly, Watson must learn and deduce a great deal about the very difficult word “I” … and good luck, Watson my friend. Give this one your all.
by Logic
@Ralph A very well-conceived plan. In some ways, that could perhaps be the true test, seeing whether Watson could come up with that answer.
by dirk.bruere
Watson – a glimpse of Google 2.0
by AbsolutePitch
“After all, if you were talking to someone over instant messaging and they seemed to know every detail of everything, you’d realize it was an AI.”
…or, perhaps, somebody with Asperger’s Syndrome?
The judge, then, might need to account for this as well and make a less-than-obvious judgement.
I must say, the unwitting cropping and homogenization of what [most] humans can do, during this Watson episode, has been most disheartening…and telling about the importance (or handicap) of the human programmer.
by jabelar
The one thing at which the human brain still seems to excel over computers is pattern recognition. Even focussed attempts at computer face, voice and object recognition are notoriously difficult. Somehow, a human can hear a bit of a song sung out of tune by a child, with some mistakes, barely audible over other ambient noise, and the human can not only recognize the song but can sing the rest. Our ability to do fast pattern recognition while also being tolerant to mistakes is still the main distinction over AI that I am aware of.
I’m sure that brute force could be used to overcome pattern recognition, but I feel there must be some fundamental approach that AI scientists still are missing.
Our brains even like to find patterns that “aren’t there”, like how we see constellations in the sky. We can even see patterns within patterns. We can see how patterns intersect (like when we think of Leonardo Da Vinci we immediately “feel” the intersection of patterns of art, science, history, politics, religion, etc. that were all at play related to Leonardo).
Anyway, you get my point. Computers already beat us in infallible memory and rapid logic and mathematical calculation. Pattern matching is the last bastion of the human mind’s uniqueness.
by jbhamlin
I don’t agree this is indicative of any form of human intelligence. It is more indicative of massive data processing using 2800 CPUs, high-speed disks containing massive amounts of data and pattern recognition/analysis to weigh/break down a sentence in “Jeopardy” and guess at an answer.
Maybe that is how the human brain works, only with billions of cells for CPUs and a very advanced storage system. I don’t know, but I only see this as a step in bridging the gap between communications of man and machine, not emulating man.