WHEN THINGS START TO THINK | Chapter 9: Bit Beliefs


Originally published by Henry Holt and Company 1999. Published on KurzweilAI.net May 15, 2003.

For as long as people have been making machines, they have been trying to make them intelligent. This generally unsuccessful effort has had more of an impact on our own ideas about intelligence and our place in the world than on the machines’ ability to reason. The few real successes have come about either by cheating, or by appearing to. In fact, the profound consequences of the most mundane approaches to making machines smart point to the most important lesson that we must learn for them to be able to learn: intelligence is connected with experience. We need all of our senses to make sense of the world, and so do computers.

A notorious precedent was set in Vienna in 1770, when Wolfgang von Kempelen constructed an automaton for the amusement of Empress Maria Theresa. This machine had a full-size, mustachioed, turbaned Turk seated before a chess board, and an ingenious assortment of gears and pulleys that enabled it to move the chess pieces. Incredibly, after von Kempelen opened the mechanism up for the inspection and admiration of his audiences and then wound it up, it could play a very strong game of chess. His marvel toured throughout Europe, amazing Benjamin Franklin and Napoleon.

After von Kempelen’s death the Turk was sold in 1805 to Johann Maelzel, court mechanic for the Habsburgs, supporter of Beethoven, and inventor of the metronome. Maelzel took the machine further afield, bringing it to the United States. It was there in 1836 that a budding newspaper reporter, Edgar Allan Poe, wrote an exposé duplicating an earlier analysis in London explaining how it worked. The base of the machine was larger than it appeared; there was room for a small (but very able) chess player to squeeze in and operate the machine.

While the Turk might have been a fake, the motivation behind it was genuine. A more credible attempt to build an intelligent machine was made by Charles Babbage, the Lucasian Professor of Mathematics at Cambridge from 1828 to 1839. This is the seat that was held by Sir Isaac Newton, and is now occupied by Stephen Hawking. Just in case there was any doubt about his credentials, his full title was “Charles Babbage, Esq., M.A., F.R.S., F.R.S.E., F.R.A.S., F. Stat. S., Hon. M.R.I.A., M.C.P.S., Commander of the Italian Order of St. Maurice and St. Lazarus, Inst. Imp. (Acad. Moral.) Paris Corr., Acad. Amer. Art. et Sc. Boston, Reg. Oecon. Boruss., Phys. Hist. Nat. Genev., Acad. Reg. Monac., Hafn., Massil., et Divion., Socius. Acad. Imp. et Reg. Petrop., Neap., Brux., Patav., Georg. Floren., Lyncei. Rom., Mut., Philomath. Paris, Soc. Corr., etc.” No hidden compartments for him.

Babbage set out to make the first digital computer, inspired by the Jacquard looms of his day. The patterns woven into fabrics by these giant machines were programmed by holes punched in cards that were fed to them. Babbage realized that the punched holes could just as well represent the sequence of instructions needed to perform a mathematical calculation. His first machine was the Difference Engine, intended to evaluate quantities such as the trigonometric functions used by mariners to interpret their sextant readings. At the time these were laboriously calculated by hand and collected into error-prone tables. His machine mechanically implemented all of the operations that we take for granted in a computer today, reading in input instructions on punched cards, storing variables in the positions of wheels, performing logical operations with gears, and delivering the results on output dials and cards. Because the mechanism was based on discrete states, some errors in its operation could be tolerated and corrected. This is why we still use digital computers today.
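The method of finite differences that gave the engine its name can be sketched in a few lines of modern code. The point is that once a table of a polynomial's successive differences is seeded, every further value follows from repeated addition alone, which is exactly what a stack of gears can do. The polynomial and function names here are illustrative, not Babbage's:

```python
def difference_table(values, order):
    """Seed the engine: build the leading column of finite differences
    from the first few hand-computed values of a degree-`order` polynomial."""
    diffs = [list(values)]
    for _ in range(order):
        prev = diffs[-1]
        diffs.append([b - a for a, b in zip(prev, prev[1:])])
    return [col[0] for col in diffs]

def difference_engine(state, steps):
    """Crank the engine: each turn adds every difference into the value
    above it, so new table entries appear using additions only."""
    state = list(state)
    out = [state[0]]
    for _ in range(steps):
        for i in range(len(state) - 1):
            state[i] += state[i + 1]
        out.append(state[0])
    return out

# Tabulate p(x) = 2x^2 + 3x + 5 for x = 0..6 from just its first three values.
p = lambda x: 2 * x * x + 3 * x + 5
seed = difference_table([p(0), p(1), p(2)], order=2)
print(difference_engine(seed, 6))  # [5, 10, 19, 32, 49, 70, 95]
```

A degree-n polynomial has a constant nth difference, so n hand-computed seeds suffice and no multiplication is ever needed afterward.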

Babbage oversaw the construction of a small version of the Difference Engine before the project collapsed due to management problems, lack of funding, and the difficulty of fabricating such complex mechanisms to the required tolerances. But these mundane details didn’t stop him from turning to an even more ambitious project, the Analytical Engine. This was to be a machine that could reason with abstract concepts and not just numbers. Babbage and his accomplice, Lady Ada Lovelace, realized that an engine could just as well manipulate the symbols of a mathematical formula. Its mechanism could embody the rules for, say, calculus and punch out the result of a derivation. As Lady Lovelace put it, “the Analytical Engine weaves algebraical patterns just as the Jacquard Loom weaves flowers and leaves.”

Although Babbage’s designs were correct, following them went well beyond the technological means of his day. But they had an enormous impact by demonstrating that a mechanical system could perform what appear to be intelligent operations. Darwin was most impressed by the complex behavior that Babbage’s engines could display, helping steer him to the recognition that biological organization might have a mechanistic explanation. In Babbage’s own memoirs, Passages from the Life of a Philosopher, he made the prescient observation that

It is impossible to construct machinery occupying unlimited space; but it is possible to construct finite machinery, and to use it through unlimited time. It is this substitution of the infinity of time for the infinity of space which I have made use of, to limit the size of the engine and yet to retain its unlimited power.

The programmability of his engines would permit them, and their later electronic brethren, to perform many different functions with the same fixed mechanism.

It is fitting that the first computer designer was also the first to underestimate the needs of the market, saying, “I propose in the Engine I am constructing to have places for only a thousand constants, because I think it will be more than sufficient.” Most every computer since has run into the limit that its users wanted to add more memory than its designers thought they would ever need. He was even the first programmer to complain about lack of standardization:

I am unwilling to terminate this chapter without reference to another difficulty now arising, which is calculated to impede the progress of Analytical Science. The extension of analysis is so rapid, its domain so unlimited, and so many inquirers are entering into its fields, that a variety of new symbols have been introduced, formed on no common principles. Many of these are merely new ways of expressing well-known functions. Unless some philosophical principles are generally admitted as the basis of all notation, there appears a great probability of introducing the confusion of Babel into the most accurate of all languages.

Babbage’s frustration was echoed by a major computer company years later in a project that set philosophers to work on coming up with a specification for the theory of knowledge representation, an ontological standard, to solve the problem once and for all. This effort was as unsuccessful, and interesting, as Babbage’s engines.

Babbage’s notion of one computer being able to compute anything was picked up by the British mathematician Alan Turing. He was working on the “Entscheidungsproblem,” the decision problem that David Hilbert posed in 1928 in the spirit of his famous list of open mathematical questions from 1900. It asked whether a mathematical procedure could exist that could decide the validity of any other mathematical statement. Few questions have greater implications. If the answer is yes, then it could be possible to automate mathematics and have a machine prove everything that could ever be known. If not, then it would always be possible that still greater truths could lie undiscovered just beyond current knowledge.

In 1936 Turing proved the latter. To do this, he had to bring some kind of order to the notion of a smart machine. Since he couldn’t anticipate all the kinds of machines people might build, he had to find a general way to describe their capabilities. He did this by introducing the concept of a Universal Turing Machine. This was a simple machine that had a tape (possibly infinitely long), and a head that could move along it, reading and writing marks based on what was already on the tape. Turing showed that this machine could perform any computation that could be done by any other machine, by preparing it first with a program giving the rules for interpreting the instructions for the other machine. With this result he could settle the Entscheidungsproblem for his one machine and have the answer apply to all of the rest. He did this by showing that it was impossible for a program to exist that could determine whether another program would eventually halt or keep running forever.
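The machine Turing described is simple enough to sketch in a few lines of modern code. The rule table and the bit-flipping example below are illustrative inventions, not Turing's; note that the simulator must cap its number of steps, which is precisely a concession to the halting problem it cannot decide:

```python
def run_turing_machine(rules, tape, state="start", steps=100):
    """Simulate a one-tape Turing machine.
    `rules` maps (state, symbol) -> (symbol_to_write, move, next_state).
    The machine stops in the state 'halt' or when no rule applies; the
    `steps` bound sidesteps the question Turing proved undecidable,
    namely whether an arbitrary machine ever halts on its own."""
    cells = dict(enumerate(tape))  # sparse tape, blank cells read as "_"
    head = 0
    for _ in range(steps):
        if state == "halt":
            break
        symbol = cells.get(head, "_")
        if (state, symbol) not in rules:
            break
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += {"R": 1, "L": -1}[move]
    return "".join(cells[i] for i in sorted(cells))

# A tiny machine that flips every bit on the tape, then halts on a blank.
flip = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
print(run_turing_machine(flip, "1011"))  # -> "0100_"
```

Universality enters when the rule table itself encodes an interpreter for some other machine's rules, so this one fixed mechanism can imitate any of them.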

Although a Turing machine was a theoretical construction, in the period after World War II a number of laboratories turned to successfully making electronic computers to replace the human “computers” who followed written instructions to carry out calculations. These machines prompted Turing to pose a more elusive question: could a computer be intelligent? Just as he had to quantify the notion of a computer to answer Hilbert’s problem, he had to quantify the concept of intelligence to even clearly pose his own question. In 1950 he connected the seemingly disparate worlds of human intelligence and digital computers through what he called the Imitation Game, and what everyone else has come to call the Turing test. This presents a person with two computer terminals. One is connected to another person, and the other to a computer. By typing questions on both terminals, the challenge is to determine which is which. This is a quantitative test that can be run without having to answer deep questions about the meaning of intelligence.

Armed with a test for intelligence, Turing wondered how to go about developing a machine that might display it. In his elegant essay “Computing Machinery and Intelligence,” he offers a suggestion for where to start:

We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision. Many people think that a very abstract activity, like the playing of chess, would be best.

Turing thought so; in 1947 he was able to describe a chess-playing computer program. Since then computer chess has been studied by a who’s who of computing pioneers who took it to be a defining challenge for what came to be known as Artificial Intelligence. It was thought that if a machine could win at chess it would have to draw on fundamental insights into how humans think. Claude Shannon, the inventor of Information Theory, which provides the foundation for modern digital communications, designed a simple chess program in 1949 and was able to get it running to play endgames. The first program that could play a full game of chess was developed at IBM in 1957, and an MIT computer won the first tournament match against a human player in 1967. The first grandmaster lost a game to a computer in 1977.

A battle raged among computer chess developers between those who thought that it should be approached from the top down, studying how humans are able to reason so effectively with such slow processors (their brains), and those who thought that a bottom-up approach was preferable, simply throwing the fastest available hardware at the problem and checking as many moves as possible into the future. The latter approach was taken in 1985 by a group of graduate students at Carnegie Mellon who were playing hooky from their thesis research to construct a computer chess machine. They used a service just being made available through the Defense Advanced Research Projects Agency (DARPA) to let researchers design their own integrated circuits; DARPA would combine these into wafers that were fabricated by silicon foundries. The Carnegie Mellon machine was called “Deep Thought” after the not-quite-omniscient supercomputer in Douglas Adams’s Hitchhiker’s Guide to the Galaxy.

In 1988 the world chess champion Garry Kasparov said there was “no way” a grandmaster would be defeated by a computer in a tournament before 2000. Deep Thought did that just ten months later. IBM later hired the Carnegie Mellon team and put a blue suit on the machine, renaming it “Deep Blue.” With a little more understanding of chess and a lot faster processors, Deep Blue was able to evaluate 200 million positions per second, letting it look fifteen to thirty moves ahead. Once it could see that far ahead its play took on occasionally spookily human characteristics. Deep Blue beat Kasparov in 1997.
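The brute-force strategy behind such machines can be sketched, at a vastly smaller scale, as a game-tree search: examine every continuation, assume the opponent replies as well as possible, and pick the move whose worst case is best. The toy game and function names below are illustrative; Deep Blue's evaluation and pruning were incomparably more elaborate:

```python
def negamax(state, moves, score, alpha=-1e9, beta=1e9):
    """Exhaustive game-tree search with alpha-beta pruning.
    Returns (value, best_move) from the point of view of the player to
    move: +1 means a forced win, -1 a forced loss in this toy game."""
    options = moves(state)
    if not options:
        return score(state), None
    best = None
    for mv, nxt in options:
        # The opponent's value is the negative of ours, hence "negamax".
        val = -negamax(nxt, moves, score, -beta, -alpha)[0]
        if val > alpha:
            alpha, best = val, mv
        if alpha >= beta:  # the opponent would never allow this line
            break
    return alpha, best

# Toy game: one-pile Nim. Take 1-3 stones; whoever takes the last wins.
def nim_moves(n):
    return [(take, n - take) for take in (1, 2, 3) if take <= n]

def nim_score(n):
    return -1  # no stones left to take: the player to move has lost

value, move = negamax(5, nim_moves, nim_score)
print(value, move)  # with 5 stones the player to move wins, taking 1
```

The pruning step is why looking deeper pays: whole subtrees are discarded as soon as one refutation is found, which is the kind of bookkeeping that special-purpose hardware does well.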

Among the very people you would expect to be most excited about this realization of Turing’s dream, the researchers in artificial intelligence, the reaction to the victory has been curiously muted. There’s a sense that Deep Blue is little better than von Kempelen’s Turk. Nothing was learned about human intelligence by putting a human inside a machine, and the argument holds that nothing has been learned by putting custom chips inside a machine. Deep Blue is seen as a kind of idiot savant, able to play a good game of chess without understanding why it does what it does.

This is a curious argument. It retroactively adds a clause to the Turing test, demanding that not only must a machine be able to match the performance of humans at quintessentially intelligent tasks such as chess or conversation, but the way that it does so must be deemed to be satisfactory. Implicit in this is a strong technological bias, favoring a theory of intelligence appropriate for a particular kind of machine. Although brains can do many things in parallel they do any one thing slowly; therefore human reasoning must use these parallel pathways to best advantage. Early computers were severely limited by speed and memory, and so useful algorithms had to be based on efficient insights into a problem. More recent computers relax these constraints so that a brute-force approach to a problem can become a viable solution. No one of these approaches is privileged—each can lead to a useful kind of intelligence in a way that is appropriate for the available means. There’s nothing fundamental about the constraints associated with any one physical mechanism for manipulating information.

The question of machine intelligence is sure to be so controversial because it is so closely linked with the central mystery of human experience, our consciousness. If a machine behaves intelligently, do we have to ascribe a kind of awareness to it? If we do, then the machine holds deep lessons about the essence of our own experience; if not, it challenges the defining characteristic of being human. Because our self-awareness is simultaneously so familiar and so elusive, most every mechanism that we know of gets pressed into service in search of an explanation. One vocal school holds that quantum mechanics is needed to explain consciousness. Quantum mechanics describes how tiny particles behave. It is a bizarre world, remote from our sensory experience, in which things can be in many places at the same time, and looking at something changes it. As best I’ve been able to reconstruct this argument, the reasoning is that (1) consciousness is mysterious, (2) quantum mechanics is mysterious, (3) nonquantum attempts to explain consciousness have failed, therefore (4) consciousness is quantum-mechanical. This is a beautiful belief that is not motivated by any experimental evidence, and does not directly lead to testable experimental predictions. Beliefs about our existence that are not falsifiable have a central place in human experience—they’re called religion.

I spent an intriguing and frustrating afternoon running over this argument with an eminent believer in quantum consciousness. He agreed that the hypothesis was not founded on either experimental evidence or testable predictions, and that beliefs that are matters of faith rather than observation are the domain of religion rather than science, but then insisted that his belief was a scientific one. This is the kind of preordained reasoning that drove Turing to develop his test in the first place; perhaps an addendum is needed after all to ask the machine how it feels about winning or losing the test.

The very power of the machines that we construct turns them into powerful metaphors for explaining the world. When computing was done by people rather than machines, the technology of reasoning was embodied in a pencil and a sheet of paper. Accordingly, the prevailing description of the world was matched to that representation. Newton and Leibniz’s theory of calculus, developed around 1670, provided a notation for manipulating symbols representing the value and changes of continuous quantities such as the orbits of the planets. Later physical theories, like quantum mechanics, are based on this notation.

At the same time Leibniz also designed a machine for multiplying and dividing numbers, extending the capabilities of Pascal’s 1645 adding machine. These machines used gears to represent numbers as discrete rather than continuous quantities because otherwise errors would inevitably creep into their calculations from mechanical imperfections. While a roller could slip a small amount, a gear slipping is a much more unlikely event. When Babbage started building machines to evaluate not just arithmetic but more complex functions he likewise used discrete values. This required approximating the continuous changes by small differences, hence the name of the Difference Engine. These approximations have been used ever since in electronic digital computers to allow them to manipulate models of the continuous world.

Starting in the 1940s with John von Neumann, people realized that this practice was needlessly circular. Most physical phenomena start out discrete at some level. A fluid is not actually continuous; it is just made up of so many molecules that it appears to be continuous. The equations of calculus for a fluid are themselves an approximation of the rules for how the molecules behave. Instead of approximating discrete molecules with continuous equations that then get approximated with discrete variables on a computer, it’s possible to go directly to a computer model that uses discrete values for time and space. Like the checkers on a checkerboard, tokens that each represent a collection of molecules get moved among sites based on how the neighboring sites are occupied.

This idea has come to be known as Cellular Automata (CAs). From the 1970s onward, the group of Ed Fredkin, Tomaso Toffoli, and Norm Margolus at MIT started to make special-purpose computers designed for CAs. Because these machines entirely dispense with approximations of continuous functions, they can be much simpler and faster. And because a Turing machine can be described this way, a CA can do anything that can be done with a conventional computer.
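An elementary one-dimensional CA shows how little machinery is involved: every cell updates simultaneously from just itself and its two neighbors, with the rule number's bits giving the new value for each of the eight possible neighborhoods. The sketch below, an illustrative example rather than the MIT group's hardware, uses rule 110, an elementary rule now known to be computationally universal:

```python
def step(cells, rule=110):
    """One synchronous update of an elementary 1-D cellular automaton
    on a ring. Each cell reads the 3-cell neighborhood (left, self,
    right) as a number 0-7, and bit `neighborhood` of `rule` gives the
    cell's next state."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Start from a single live cell and watch structure emerge.
row = [0] * 31
row[15] = 1
for _ in range(15):
    print("".join(".#"[c] for c in row))
    row = step(row)
```

Because the update touches only nearest neighbors, every site can be computed in parallel by identical simple hardware, which is exactly what made special-purpose CA machines so much faster than general-purpose ones.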

A cellular automaton model of the universe is no less fundamental than one based on calculus. It’s a much more natural description if a computer instead of a pencil is used to work with the model. And the discretization solves another problem: a continuous quantity can represent an infinite amount of information. All of human knowledge could be stored in the position of a single dot on a piece of paper, where the exact location of the dot is specified to trillions of digits and the data is stored in the values of those digits. Of course a practical dot cannot be specified that precisely, and in fact we believe that the amount of information in the universe is finite so that there must be a limit. This is built into a CA from the outset.

Because of these desirable features, CAs have grown from a computational convenience into a way of life. For the true believers they provide an answer to the organization of the universe. We’ve found that planets are made of rocks, that rocks are made of atoms, that atoms are made of electrons and a nucleus, and that the nucleus is made of protons and neutrons, which in turn are made of quarks. I spent another puzzling afternoon with a CA guru, who was explaining to me that one needn’t worry about this subdivision continuing indefinitely because it must ultimately end with a CA. He agreed that experiments might show behavior that appears continuous, but that just means that they haven’t looked finely enough. In other words, his belief was not testable. In other words, it was a matter of personal faith. The architecture of a computer becomes a kind of digital deity that brings order to the rest of the world for him.

Like any religion, these kinds of beliefs are enormously important in guiding behavior, and like any religion, dogmatic adherence to them can obscure alternatives. I spent one more happily exasperating afternoon debating with a great cognitive scientist how we will recognize when Turing’s test has been passed. Echoing Kasparov’s “no way” statement, he argued that it would be a clear epochal event, and certainly is a long way off. He was annoyed at my suggestion that the true sign of success would be that we cease to find the test interesting, and that this is already happening. There’s a practical sense in which a modern version of the Turing test is being passed on a daily basis, as a matter of some economic consequence.

A cyber guru once explained to me that the World Wide Web had no future because it was too hard to figure out what was out there. The solution to his problem has been to fight proliferation with processing. People quickly realized that machines instead of people could be programmed to browse the Web, collecting indices of everything they find to automatically construct searchable guides. These search engines multiplied because they were useful and lucrative. Their success meant that Web sites had to devote more and more time to answering the rapid-fire requests coming to them from machines instead of the target human audience. Some Web sites started adding filters to recognize the access patterns of search engines and then deny service to that address. This started an arms race. The search engines responded by programming in behavior patterns that were more like humans. To catch these, the Web sites needed to refine their tests for distinguishing between a human and machine. A small industry is springing up to emulate, and detect, human behavior.

Some sites took the opposite tack and tried to invite the search engines in to increase their visibility. The simplest way to do this was to put every conceivable search phrase on a Web page so that any query would hit that page. This led to perfectly innocent searches finding shamelessly pornographic sites that just happen to mention airline schedules or sports scores. Now it was the search engine’s turn to test for human behavior, adding routines to check whether the word use on a page reflects language or just a list. Splitting hairs, they had to further decide if the lists of words they did find reflected reasonable uses such as dictionaries, or just cynical Web bait.

Much as Gary Kasparov might feel that humans can still beat computers at chess in a more fair tournament, or my colleague thinks that the Turing test is a matter for the distant future, these kinds of reasoning tasks are entering into an endgame. Most computers can now beat most people at chess; programming human behavior has now become a job description. The original goals set for making intelligent machines have been accomplished.

Still, smart as computers may have become, they’re not yet wise. As Marvin Minsky points out, they lack the common sense of a six-year-old. That’s not too surprising, since they also lack the life experience of a six-year-old. Although Marvin has been called the father of artificial intelligence, he feels that that pursuit has gotten stuck. It’s not because the techniques for reasoning are inadequate; they’re fine. The problem is that computers have access to too little information to guide their reasoning. A blind, deaf, and dumb computer, immobilized on a desktop, following rote instructions, has no chance of understanding its world.

The importance of perception to cognition can be seen in the wiring of our brains. Our senses are connected by two-way channels: information goes in both directions, letting the brain fine-tune how we see and hear and touch in order to learn the most about our environment.

This insight takes us back to Turing. He concludes “Computing Machinery and Intelligence” with an alternative suggestion for how to develop intelligent machines:

It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. This process could follow the normal teaching of a child. Things would be pointed out and named, etc.

The money has been spent on the computer’s mind instead. Perhaps it’s now time to remember that they have bodies, too.

WHEN THINGS START TO THINK by Neil Gershenfeld. ©1998 by Neil A. Gershenfeld. Reprinted by arrangement with Henry Holt and Company, LLC.