THE AGE OF INTELLIGENT MACHINES | Chapter 4: The Formula for Intelligence

September 24, 2001
Author:
Ray Kurzweil

Everything should be made as simple as possible, but no simpler.

Albert Einstein

Politics are for the moment. An equation is for eternity.

Albert Einstein

I cannot believe . . . that God plays dice with the world.

Albert Einstein

Unifying Formulas: The Goal of Science

There is a profound satisfaction in simple explanations that can truly account for complicated phenomena. The search for unifying formulas (sometimes called “applied mathematics”) has been a goal of science since its inception with the Ionian Greeks twenty-five centuries ago.1

General relativity

Perhaps the most famous unifying formula is E = mc2 (energy equals mass times the speed of light squared). The formula, part of the general theory of relativity put forth by Albert Einstein in 1905, is simple enough: only five symbols, including the equal sign.2 Its power is manifest both in the range of phenomena it explains and in the nuclear age it spawned. The equation predicts and explains the power of a nuclear explosion. Mass when converted to energy is multiplied by an enormously large number. The result is a dramatic example of the power of an idea.

The laws of thermodynamics

Another famous example of unifying principles is the laws of thermodynamics. Developed in the mid nineteenth century and concerned with the conditions that all physical systems must obey, this was the first major refinement to the laws of classical mechanics developed a century earlier.3

The first law is,

For any process involving no effects external to the system except displacement of a mass between specified levels in a gravity field, the magnitude of that mass is fixed by the end states of the system and is independent of the details of the process.

The second law is,

Among all the allowed states of a system with given values of energy, numbers of particles, and constraints, one and only one is a stable equilibrium state. Such a state can be reached from any other allowed state of the same energy, numbers of particles, and constraints and leave no effects on the state of the environment.

There are several important implications of the laws of thermodynamics, one of which is the impossibility of perpetual-motion machines. Another implication, relevant to the emergence of intelligence in the natural world, is a corollary of the two laws called the principle of increasing entropy. “Entropy” is a measure of randomness or lack of order in a system of many components, such as molecules. Though named the law of increasing entropy, the law actually states that entropy in a system can stay the same, but only under special conditions. Thus, entropy generally increases. Another way to state the same law is, Left to their own devices, systems will become increasingly disordered, a strangely poetic principle.4

The law of entropy seems to imply that the natural emergence of intelligence is impossible. Intelligent behavior is very far from random, and any system capable of intelligent behavior would have to be highly ordered. The chemistry of life in general, and intelligent life in particular, is comprised of exceptionally elaborate designs.5 Out of the randomly swirling mass of particles in the universe, order has somehow managed to emerge.

A possible explanation is that thermodynamics is not applicable to the immensely long time frames that evolution operates in. Evolution has created systems of enormous order but has taken an enormously long period of time to do so. The order of life takes place amid great chaos and does not appreciably impact the measure of entropy in the physically large system in which life has evolved. No organism is a closed system; it is part of a larger system we call the environment, which is high in entropy. Nevertheless, from the viewpoint of the laws of thermodynamics, the emergence of intelligence must be considered a surprise.6

The theory of everything

Physicists have long had a talent for choosing entertaining anthropomorphic names for otherwise abstruse concepts with such words as “truth,” “strangeness,” and “charm” for complex mathematical properties of subatomic particles and “up,” “down,” “charm,” “strange,” “top,” and “bottom” for the names of quarks, the elementary building blocks of matter.7 The same comment might be applied to terminology in the AI field, although one could argue that AI terms such as “experts,” “expert managers,” “demons,” “knowledge sources,” and “logical inference processes” do have somewhat greater relevance to the phenomena being described than strangeness and charm do to the phenomena of particle physics.

Most recently physicists have been making progress on the ambitious quest to discover a unified set of equations to link the four basic forces of nature: gravity, electromagnetism, the strong force, and the weak force. The same quest, for what was then called the unified field theory, occupied Einstein’s last two decades.8

Recently two of the forces, the electromagnetic and weak forces, have been clearly linked as two manifestations of an electroweak force. Most recently new theoretical developments involving a concept of supersymmetry may be able to link the remaining three forces into a unified structure referred to as the theory of everything.9 In this theory, ultimate reality is composed of vibrating strings. All the phenomena we are familiar with, from subatomic particles to life forms, are resonances caused by the interactions of these vibrations. This theory gives new meaning to the saying “All the world is a song.”

The formula for intelligence

As these examples illustrate, the search for unifying formulas and principles is a primary drive that fuels all of the sciences. The same sort of question has long fascinated researchers in the field of intelligence, both natural and artificial: Is there a formula that describes, explains or underlies intelligence? At first, the answer might appear to be an obvious no. We have not been entirely successful in even defining intelligence, much less in expressing it in a formula, a set of laws, or a set of models. Intelligence would seem to be too complex for such reduction.10 On the other hand, we should not quickly dismiss the notion that a lot of what we know about intelligence can be expressed in such a manner. Some might argue that any description of intelligence in a formula just reflects how little we know about it. I would argue that the possibility of describing intelligent processes by simple equations or at least simple paradigms is a reflection of its elegance. If there is any coherency to intelligence as a unified phenomenon, we should be able to say something about its structure. I will describe several approaches to a description of intelligence and its supporting mechanisms through unified rules and formulas in the hope that they reveal insights into its nature.

The Sea of Logic and the Turing Machine

Any computer that we can describe can in theory be constructed from connecting a suitable number (generally a very large number) of a very simple device, the NOR logic gate. This device transforms truth values. The device takes two inputs, each of which can be either true or false at any point in time. The NOR gate has one output, which is true if neither input 1 nor input 2 is true. From this simple transformation, we can build all other logic functions and even memory and thus provide all of the information-processing capabilities required for computation. A more detailed derivation is provided in the article accompanying this chapter.

The Church-Turing thesis (discussed in chapter 3) postulates an essential equivalence between machine and human intelligence, not necessarily between current levels of capability and complexity, but in their underlying methods.11 Since we can construct any information-processing machine from NOR it can be considered the basis of all machine intelligence. If one accepts the Church-Turing thesis, NOR can be considered the basis of human intelligence as well.

While it is true that any algorithm, regardless of its complexity, can in theory be implemented with NOR gates, a collection of such devices will not perform any useful function unless they are connected together in an appropriate way. Part of

The sea-of-logic machine, a formula for intelligence? By connecting together in just the right way a large number of NOR gates (very simple devices), we can perform any intelligent function.

most useful information-processing methods also require memory (which can also be built from NOR gates), and we need each memory cell initialized to the right value. The connection of the NOR gates and the initial contents of the memory cells can both be described by lists of numbers, or the symbolic equivalent of numbers. They can, therefore, be considered as forms of software. Seen in this light, the NOR gate provides us with a unifying formula for the hardware of intelligence, but not for the software.

One might argue that the connection of the NOR gates should be considered hardware, and the initial contents of the memory cells should be considered software. I regard them both as software, because they are both equivalent to lists of numbers, and thus both can be represented using softwarelike languages. The evolution of electronic design is indeed proving the essential equivalence between so-called hardware design and software development. Increasingly, hardware designers are working at computer terminals, storing their work products on floppy disks (just like software engineers), and designing their systems using formula-based languages very similar to software languages. Take, for example, the emerging technology of silicon compilers. These systems allow a hardware engineer to describe an actual chip in terms of the logical and numeric transformations it is designed to perform.12 These transformations are expressed in a language very similar to a high-level software programming language. The designer interacts with a simulator that allows the user to test out the “program” before the actual chip is constructed. Once this process is complete, the silicon-compiler software converts this program into a series of instructions for a silicon fabrication machine to create an actual VLSI (very large scale integrated) circuit chip. The program created by the chip designer is controlling the connection of the logic gates on the chip. We might regard the design of a chip to be the ultimate in hardware design, yet the development process, the languages used, and the work product are very similar to those for software.13

Comparison of the Turing machine and the sea of logic

Another simple yet powerful model that can potentially be used to describe any form of computation is the Turing machine. As described in chapter 3, the Turing machine is a theoretical model of an information-processing machine. The Turing machine, which like the NOR gate is relatively simple, continues to be used by mathematicians as a satisfactory model of what computers can and cannot do. The Turing machine has proven to be a more powerful mathematical model of computation than NOR-based logic because of the body of Turing-machine theory that has been developed.

Models of the Turing machine have been built, but they can never be true Turing machines, because it is not possible to build an infinitely long tape. On its face, it would appear that the “sea of logic” machine and the Turing machine are not equivalent, again because of the infinitely long tape of the Turing machine. It can be shown, however, that a Turing machine can simulate any sea-of-logic machine. Furthermore, it can be shown that any specific problem that can be solved by a Turing machine can be solved on a sea-of-logic machine as well. The heart of the proof lies in the fact that in order for a Turing machine to solve a problem, it must do so in a finite amount of time. In a finite amount of time, it can only use a finite amount of tape. Thus, it can be reduced to a machine with a finite number of states and thus simulated by the sea of logic. A Turing machine may outperform any particular sea-of-logic machine, however, because it may be able to solve problems that will use an amount of tape that outstrips the memory capacity of any particular sea-of-logic machine.

Turing directly linked his theoretical machine to the more controversial Church-Turing thesis. If one accepts the Church-Turing thesis, then the Turing machine, which is a very simple machine in structure, can solve any cognitive problem. To be more precise, the Turing machine can execute any plan to solve a cognitive problem. Its success in doing so will be a function of the validity of the plan.14 We thus come to the same conclusion that we did with the sea of logic. The Turing machine provides us with a simple and elegant model for the hardware of cybernetic (machine-based) cognition, but not the software. To solve a practical problem, the Turing machine needs a program, and each different program constitutes a different Turing machine.

The universal Turing machine

It turns out that a universal Turing machine can be shown to have the capability of simulating every other Turing machine. The universal Turing machine has a program that can read a description on its tape of any Turing machine (even the universal Turing machine) and then simulate that machine. Thus, the input tape to the universal Turing machine contains not only the usual problem input but also a coded description of the Turing machine it is going to simulate. The universal Turing machine is thereby a single (theoretical) machine that can solve any problem.15

All this accomplishes, however, is to move the software from the built-in program of a Turing machine to a description of such a program on a tape. If we consider the machine program on the tape to be the software, we once again conclude that we have a powerful and elegant model for the hardware underlying intelligent processes, but not the software, at least not yet.

The Recursive Formula and Three Levels of Intelligence

To follow the way, one must find the way and follow it.

Zen koan

The whole point of this sentence is make clear what the whole point of this sentence is.

Douglas R. Hofstadter, Metamagical Themas

“Would you tell me please which way I ought to go from here?” asked Alice.

“That depends a good deal on where you want to get to,” said the Cat.

“I don’t much care where . . .,” said Alice.

“Then it doesn’t matter which way you go,” said the Cat.

“. . . so long as I get somewhere,” Alice added as an explanation.

“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”

Lewis Carroll, Alice in Wonderland

A professor has just finished lecturing at some august university about the origin and structure of the universe, and an old woman in tennis shoes walks up to the lectern. “Excuse me, sir, but you’ve got it all wrong,” she says. “The truth is that the universe is sitting on the back of a huge turtle.” The professor decides to humor her. “Oh, really?” he asks. “Well, tell me, what is the turtle standing on?” The lady has a ready reply: “Oh, it’s standing on another turtle.” The professor asks, “And what is that turtle standing on?” Without hesitation, she says, “Another turtle.” The professor, still game, repeats his question. A look of impatience comes across the woman’s face. She holds up her hand, stopping him in mid-sentence. “Save your breath, sonny,” she says. “It’s turtles all the way down.”

Rolf Landauer, as quoted in Did the Universe Just Happen? by Robert Wright

The Turing Machine provides us with a model of computation that has enabled theoreticians to examine many facets of computers and their capabilities. The sea of logic is an even simpler mechanism that enables us to build very complex machines and methods from a simple NOR gate. We can consider these to be simple formulas that provide insight into computational hardware, but as noted, they do not provide insight into the software, the methods or algorithms, of intelligence.

One of the first attempts to codify intelligence in an algorithm was the recursive method, implemented in both the General Problem Solver and Samuel’s Checker Playing Program.16 Let us examine the recursive formula in the context of the game of chess. We will be returning to the issue of chess several times in this volume because of its importance to AI research. Raj Reddy cites studies of chess as playing the same role in AI that studies of E. Coli play in biology: an ideal laboratory for studying fundamental questions. I will then expand the method to other problems and examine its capabilities and limitations.

The game of chess

Chess is a game of two opponents in which the players take turns adjusting the positions of their pieces on a playing board according to prescribed rules. Of course, many other games can be described in the same way. Certain activities in real life, such as war and business, have similar characteristics. Indeed, chess was originally formulated as a model of war.

To win or to provide the highest probability of winning, one selects the best possible move every time it is one’s turn to move. The question, then, is what the best move is. The recursive method provides the following rule, which, if you follow it carefully, will enable you to play an exceptionally good game of chess: Every time it is your move, select the best move on the assumption your opponent will do the same. At this point, the casual observer will complain that the rule has no content, that it provides no more insight than an impenetrable Zen koan. It appears to simply restate the problem. As I believe will become clear, however, this rule is all that is needed to play an excellent game of chess. If this is so, then we might conclude either that the recursive formula is a powerful and deceptively simple formula for the algorithm (or software) of at least some forms of intelligence, or alternatively, that chess is not an intelligent game, that the game has no content.

Before delving further into the implications of the recursive formula, let us examine how it works. We fashion a program called Move. When it is called, its job is to pick the best move. It is a recursive program in that it is capable of calling itself. Recursive programs are perfectly feasible in modern programming languages. A program can call itself, return answers to itself, and continue as if it had called any other program (or subroutine).

Recursion is a powerful method used extensively in artificial intelligence. It is one of the more valued features of the primary AI programming language LISP.17 A brief discussion of recursion will be worthwhile at this point, and I will illustrate the concept with an example. Consider the definition of the factorial function expressed recursively. Let n! be the factorial of n. The definition of the factorial function is then 1! = 1, and n! = n x (n -1)! As we can see, this definition of factorial uses the concept of factorial in its own definition. This self-referencing is called recursion. Yet the definition is not infinitely circular in that we can determine the value of the factorial of any number from the definition just by repetitively expanding references to factorial.

For example, let us compute 4!. According to the definition, 4! = 4 x 3!. Using the definition again to expand 3!, we get 4! = 4 x 3 x 2! In turn, we expand 2! to get 4! = 4 x 3 x 2 x 1!. The definition gives the 1! directly without reference to itself. We are thus able to fully expand the expression to eliminate all reference to the function in the right-hand part of the expression: 4 = 4 x 3 x 2 x 1.

The power of recursion is that complex procedures or concepts can be expressed in a simple way. A recursive definition differs from a circular definition in that it has an escape from infinite expansion of the recursion. The escape is found in the “terminal” or nonrecursive portion of the definition. In the recursive definition of factorial, the terminal portion of the definition is the factorial of 1.

Recursion, a program calling itself, is accomplished by the program saving information on its current state (including who called it and exactly where it is to return to when finished) on a push down stack. The stack is a repository of information that is organized as a last-in, first-out (LIFO) list so that information is retrieved in the opposite order in which it is put in, rather like a stack of dishes. Each time a program calls another program (whether it is calling itself or another subroutine), information about the state of the program and the place to return to is placed (“pushed”) onto the stack. Every time a program is completed and wants to return to the program that called it (which might be itself), information is retrieved (“popped”) from the stack. This information restores the state of the program

Factorial, an example of recursion.

that called it and indicates where in the calling program to return to, which might be to part of itself. This simple LIFO mechanism assures that all returns are to the right place.

Practical programs cannot call themselves indefinitely. Thus, a practical LIFO stack is of finite length. Another reason that a program cannot continue to call itself without limit is that we would like to solve problems in a finite length of time. Thus, a useful recursive formula or definition requires an escape condition, a set of circumstances that permits us to escape from infinite recursion.

Let us return to our simple recursive rule for playing chess. It starts out by saying that we should select the best move. To do this, we obviously must consider each move, and thus we need to be able to generate a list of all possible moves. We do this by simply applying the rules of chess. While more complicated than some other games, such as checkers, the rules of chess are certainly straightforward and readily programmable.18 There is nothing mysterious about generating the possible moves from any board position. We thus provide a mechanism programmed with the rules of chess to generate our options. To play a different game, we can simply replace this module with another one programmed with the rules of that particular game.

We now have a list of possible moves. We examine each in turn and ask the question, Does this move enable me to win or lose? If the answer is lose, we do not select that move. If the answer is win, we take that move. If more than one move enables us to win, it does not matter which one we take.

The problem now reduces to answering the question, Does this move enable me to win or lose? At this point we note that our winning or losing is affected by what our opponent might do. A reasonable assumption is that our opponent will also choose his best move (and if the opponent fails to do this, that will not be a problem, but an opportunity). We need to anticipate what that move might be, so we use our own ability to select the best move to determine what our opponent is likely to do. We essentially put ourselves in our opponent’s place and pick the best move for our opponent. In this we are following the part of the recursive rule that states, “Select the best move on the assumption that your opponent will do the same.

Our program is now structured as follows. We generate a list of all possible moves allowed by the rules. We examine each possible move in turn. For each move, we generate a hypothetical board representing what the placement of the pieces would be if we were in fact to make this move. We now put ourselves in our opponent’s place and try to determine what his best move would be. How are we to do this? It turns out that we have a program that is designed to do exactly that. It is called Move. Move is, of course, the program we are already in, so this is where the recursion comes in. Move calls itself to determine what our opponent will do. When called to determine the best move for our opponent, Move begins to determine all of the moves that our opponent could make at this point. For each one, it wants to know how its opponent (which is us) would respond and thus again calls Move for each possible move of our opponent to determine what our response to that move would (or should) be.

The program thus keeps calling itself, continuing to expand possible moves and countermoves in an ever expanding tree of possibilities. This process is usually called a minimax search, because we are alternately attempting to minimize our opponent’s ability to win and to maximize our own.19

The next question is, Where does this all end? Let us start with an attempt to play perfect chess. We continue to expand the tree of possible moves and countermoves until each branch results in an end of game. Each end of game provides the answer: win, tie, or lose. Thus, at the furthest point of expansion of moves and countermoves, some moves finally finish the game. If a move results in a win, then we select that move. If there are no win moves, then we settle for a tie move. If there are no win or tie moves, we keep playing anyway in the hope that our opponent will make a mistake (unless we know we are playing a perfect opponent, in which case we may as well give up). These final moves are the terminal nodes of our expansion of moves. Here, instead of continuing to call Move, we can now begin returning from Move calls. As we begin to return from all of the nested Move calls, we have determined the best move at each point (including the best move for our opponent), and so we can finally select the correct move for the current actual board situation.

The above procedure is guaranteed to play a perfect game of chess. This is because chess is a finite game. Interestingly, it is finite only because of the tie rule, which states that repetition of a move results in a tie. If it were not for the tie rule, then chess would be an infinite game (the tree of possible moves and countermoves could expand forever) and we could not be sure of determining the best move within a finite amount of time. Thus, this very simple recursive formula plays not just an excellent, but a perfect, game of chess.20 The most complicated part of actually implementing the recursive formula is generating the allowable moves at each point. Doing so, however, requires only a straightforward codification of the rules. Playing a perfect game of chess is thus no more complicated than understanding the rules.

A recursive expansion of moves in tic-tac-toe.


Courtesy of the Computer Museum, Boston

A Tinkertoy mechanical computer that plays tic-tac-toe.

When shopping for services like car repair, the smart shopper is well advised to ask, How long will it take? The same question is quite appropriate in applying the recursive formula. Unfortunately, with respect to chess, the answer is something like 40 billion years.21 And that is just to make the first move!

We can estimate the length of time by assuming that there are, on average, about eight possible moves for each board situation. If a typical game lasts about 30 moves, we need to consider 830 possible move sequences to fully expand the tree of all move-countermove possibilities. If we assume that we can analyze one billion of such sequences in a second (which is at least 1,000 times faster than is in fact possible on today’s fastest chess computers-in 1989, two leading chess machines, HiTech and Deep Thought, could analyze 175,000 and 1,000,000 board positions per second, respectively), then we require 1018 seconds, or about 40 billion years.22 If the cyclic theory of the universe is correct (the theory that the universe will eventually end its current expansion and then contract, resulting ultimately in another big bang), then the computation would not be complete before our computer is blown up in the next explosion of the universe.

This brings up another aspect of computation theory: the amount of computation required to achieve a result. One branch of computation theory examines the issue of whether a computer (generally using the Turing machine as a model of a computer) can solve a given problem within a finite amount of time. Problems that cannot be solved are called unsolvable problems. More recently a new branch of computation theory called complexity theory has been developed to deal with the issue of how long various solvable problems will take to compute.23 While our very simple recursive formula plays perfect chess, which may indeed be considered intelligent behavior, it takes an excessively long period of time to do so. Accomplishing tasks quickly or at least in a timely manner is regarded as an essential part of intelligence. As discussed earlier, despite controversies with regard to cultural bias, the timed aspect of intelligence tests has long been accepted, which reflects the importance of timeliness as an attribute of intelligence.

Thus far the above discussion has not yet demonstrated intelligence in the recursive formula. Playing perfect chess might be considered intelligent behavior, but not at 40 billion years per move. Before we throw out the recursive formula, however, let us attempt to modify it to take into account our human patience (and mortality). Clearly, we need to put limits on how deep we allow the recursion to take place. How large we allow the move-countermove tree to grow should depend on how much computation we have available. In this way we can use the recursive formula on any computer, from an inexpensive home computer to the largest supercomputer.

The next question is how do we evaluate the terminal “leaves,” the fully expanded board positions of our tree of possibilities? When we considered fully expanding each move sequence to the end of the game, evaluation was simple: winning is better than tying, and tying is better than losing. Evaluating a board position in the middle of a game is not as straightforward and, not surprisingly, is the source of controversy.24

There are two schools of thought here: the simple-minded school and the complex-minded school. The complex-minded school argues that we need sophisticated (complicated) sets of rules and procedures to evaluate the “quality” of the board at each terminal leaf position, that we need to be “smart” in making these evaluations and require algorithms that can make subtle judgements.25

The simple-minded school argues that whatever computational resources we happen to have available are best put into pursuing greater depths of search, that is, looking at longer sequences of possible moves and countermoves and using a computationally simple procedure to evaluate final board positions. One such procedure is simply to count up the points of the two sides, using the recognized value of each piece (ten for the queen, five for each rook, etc.). The argument is that any sophisticated evaluation of node positions is computationally very expensive and is not likely to be more productive or more reliable than putting the computational resource into greater depth of search.

For chess in particular I tend to agree with the simple-minded school and feel that the key to building a computer that can win the world chess championship lies in massive computational power to be achieved through massive parallel processing (which, after all, the human brain uses). All chess playing programs use the recursive formula, some with simple board evaluation procedures (often just adding up piece values) and others with more sophisticated procedures. The primary factor that has determined the power of a particular program appears to be the amount of computation that can be brought to bear. The programs running on the fastest supercomputers or parallel processing networks invariably are the best machine players, regardless of the board-evaluation algorithm. More sophisticated terminal-leaf evaluation procedures do not appear to hurt performance, but neither do they appear to help. They do not appear, therefore, to be worth the trouble.26

HiTech, developed at Carnegie Mellon University and the leading chess machine of 1988, uses massive parallel processing.27 It uses an array of specially designed VLSI chips capable of generating and evaluating board positions. Though its procedure for evaluating a board position is more complicated then just adding up the value of each piece, I would still describe its evaluation procedure as simple (although the developers of HiTech might disagree with me).28

If we agree with the simple-minded school, then the node-evaluation procedure is simple. Since the recursive formula itself is the ultimate in simplicity, we end up with an algorithm that does indeed reduce to a simple formula and is capable of playing a very good game of chess with reasonable (and in tournament play, acceptable) response time.

The next question is, How good a game? Predictions made in 1958 that a computer would be world chess champion by 1968 were considerably over optimistic.29 Nonetheless, computers have demonstrated steady progress and, as mentioned earlier, two computer programs, Belle and HiTech, now play on the National Senior Master Level, equaled or surpassed by only 768 human players.30 In 1988 HiTech scored 5 wins, 1 loss in the National Open Chess Championships, which ranked it 150th among registered American players. Also in 1988 Carnegie Mellon’s Deep Thought defeated the highest rated player ever in the U.S. Open Chess Tournament. Though not yet that of a world champion, this is certainly an impressive and eminently intelligent level of play.31

The number of human chess players that can still defeat the best machines is growing smaller each year. It is clear that to make linear progress in the number of moves and countermoves that can be analyzed, we need to make exponential gains in computational power. On the other hand, we are making exponential gains in computational power with the linear passing of time. By most estimates, computational power (per unit of cost) is doubling every 18 to 24 months. The phenomenon of parallel processing (doing more than one computation at a time) is in some ways accelerating this trend, at least with regard to how much computation we can throw at the problem of playing chess. The Connection Machine, for example, has 65,536 parallel computers, although each of these processors handles only one bit at a time.32 Machines are being built that combine thousands of powerful 32-bit computers, and machines that combine millions of 32-bit microprocessors are being seriously contemplated.

It is clear that it is only a matter of time before the world chess champion is a machine. To make a prediction as to when that might be, we need to know how many more move-countermove expansions are needed. Some argue that the difference between national senior master level, at which computers can now play, and world-champion level of play is the ability to see only two or three additional moves ahead. It certainly makes sense that the ability to see an additional 2 or 3 moves ahead could account for the difference between the excellent playing of a national senior master and the somewhat better game of the champion. Others argue that the difference is more like 10 to 12 moves.33

Some argue that the difference in play between masters at different levels is not a matter of seeing additional moves ahead, that the master is not really analyzing moves and countermoves at all. Instead, he supposedly has some sort of higher-level intuitive feel for the strength of different moves and board positions. In my opinion, master players are not consciously aware of all of the move-countermove sequences, but this computation has in fact taken place. Most of this analysis has been “precomputed” by storing and retrieving board positions from previous experience and study. When we train our minds to perform a certain type of physical or mental activity, the mental processes underlying this skill can often be accomplished without our conscious awareness, but that does not mean that the mental calculations never took place.34

To get back to our estimate, if each additional move ahead expands the amount of computation by a factor of about 8, and if it takes 18 months for the computer industry to double computational power, then our cybernetic chess players will be able to see one additional move ahead every 4.5 years. According to this estimate, we are somewhere between 9 and 54 years from having sufficient computational power to build a computer chess champion. This achievement will, in my opinion, be brought a lot closer than 54 years through the impact of new technologies, including new parallel processing architectures, exotic new computer hardware techniques that use light instead of electricity, and other developments.35 Another analysis is provided by Raj Reddy. He estimates that grand-master level play (just below world-championship level play and probably good enough to beat the world champion once in a while) requires being able reliably to look ahead about 12 to 14 moves. Though the average branching factor (the number of possible moves for each board position) is about 35, advanced search techniques allow us to effectively reduce this number to 5. Thus, we need to consider between 512 and 514 board positions, or about 3 billion positions per move. With an average of 3 minutes of thinking time per move, this requires a rate of about 16 million board positions per second, only 16 times faster than Deep Thought’s 1989 rate. Thus, we should be able to achieve very close to championship play by the early 1990s.

Yet another analysis is provided by one of HiTech’s designers, Hans Berliner, who notes that the average increase in the ratings of the best computer players has been about 45 points per year (in contrast, the average increase in the ratings of the best human players is close to 0). This analysis projects 1998 as the year of the computer world champion. My own guess is that we will see a world champion by the year 2000, but as mentioned above, earlier predictions of this type have not fared well.

Let us return for a moment to the concept of the perfect chess game. From a mathematical point of view, such a game exists. In other words, we can define the perfect chess game as a game played by two opponents each following the recursive formula and fully expanding all move sequences. From a formal point of view, we cannot say that determining the sequence of moves in this perfect game is an unsolvable problem, because it can certainly be determined in a finite amount of time. On the other hand, we can show that it is impossible to determine in any practical length of time. We thus have the interesting situation of being able to easily define a question, to show that the answer exists and is unique, and also to demonstrate conclusively that there is no reasonable way to find it.

Other questions come to mind. For example, does the player who moves first in this hypothetical perfect game win or tie (it is unlikely that the player making the first move would lose)? Again, there is an answer to this question, but it appears that no human (or machine) will ever be able to find it.

An idiot-savant game?

When I stated that I was of the simple-minded school, I was careful to state that this was my opinion specifically with regard to the game of chess. If it is shown that a simple leaf evaluation procedure can defeat any human being at chess (as I believe will happen), then we can conclude either that the recursive formula is indeed a simple formula for at least certain types of intelligent behavior, or that chess is not an intelligent game. We could conclude that computers are better than humans at chess for the same reason that computers are better at creating spreadsheets. In other words, we might conclude that the secret to chess is simply being able to keep in one’s mind countless move and countermove sequences and the resulting board patterns. This is the type of information that computers can easily and accurately manipulate, whereas the heart of human intelligence lies elsewhere. As long as there are at least a few humans that can defeat the best machines, some observers still feel comfortable in citing chess as an example of high intellectual (and creative!) activity and imagining that there is some unknown (and possibly unknowable) deep intellectual process underlying it. A computer world chess champion could radically alter that perception. It is clear that there will be some significant change in perception with regard to chess, computers, ourselves, or all three.

Other games

The recursive formula can be applied to any board game that involves moves and countermoves within prescribed rules. A reasonable question is, How well does it work in practice for games other than chess? Also, what about the simple-minded versus complex-minded controversy?

At one extreme of the spectrum of game complexity there is tic-tac-toe, for which we can quite easily expand all possibilities to end of game and play a perfect game. Many humans, at least those who have spent any time thinking about it, can play a perfect game as well.

Checkers, on the other hand, is sufficiently complex that playing a perfect game is not feasible, although the perfect checkers game can probably be computed before the next big bang occurs. Checkers-playing machines have already come closer than chess machines to defeating the world human champion.36 A computer has already taken the world championship in backgammon: Hans Berliner’s program defeated the human backgammon champion in 1980.37 There are two reasons why computers have had a somewhat easier time with checkers and backgammon than with chess. First, generating the possible moves at each point requires substantially less computation. There are only two different types of pieces in checkers and one in backgammon, and the rules are relatively simple. The problem is not that programmers have a hard time programming the rules of chess-the rules are not that complicated. The complexity of the rules is an issue with regard to how many moves ahead it is possible to look. With their relatively simple rules, the same amount of computational power enables the checkers or backgammon program to look ahead a greater number of moves than the chess program. The second possible reason is that there is a lot more interest in chess, and thus the best human players are probably better at chess than at checkers or backgammon. Nonetheless, backgammon is still considered a “deep” game, and the fact that a computer can defeat any human player should be considered an event of some note.38

At the other extreme, in terms of computer success, is the Japanese game of go, which has a place in the cultural life of Japan comparable to chess in Russia, Europe, and the United States. Though computers can play a reasonable amateur game of go, they have not achieved master level, and the lack of progress is not for lack of effort by program designers. The problem is not with the complexity of go, because the rules are in fact simpler than those for chess. In my view, the primary reason that the recursive formula is not as effective is that the number of possible moves at each point is relatively large and the length of move sequences in a game, or even a portion of a game in which meaningful progress is made, is very long. Thus, even massive computation is quickly exhausted and does not look far enough ahead to examine a meaningful strategy.39

It appears that for the game of go we do need more sophisticated evaluation strategies to examine different options. It may also be that we need to consider alternatives on a different level than individual moves, that we need a way of effectively modeling strategies on a higher level than one move at a time. Thus, for go, unlike chess, I would throw my ballot in with the complex-minded school. It appears that the Japanese have a board game that is “deeper” and more consistent with the particular strengths (at least at present) of human intelligence.40

Other intelligent tasks

The Recursive Formula can be applied to nongame situations, such as proving mathematical theorems, solving combinatorial problems, solving various types of puzzles ranging from mazes to word problems (like the problem of cannibals and missionaries).41 In any situation in which the answer can be expressed as a sequence of moves, steps, or transformations governed by a set of unambiguous rules, the recursive formula is a good candidate for solving the problem.

In 1957 Allen Newell, J. C. Shaw, and Herbert Simon devised a program entitled the General Problem Solver (GPS).42 The program’s name reflected the enthusiasm, romanticism, and immodesty of the early AI field, an optimistic orientation that eventually earned the field considerable criticism.43 GPS could solve problems that involved devising a sequence of steps to achieve a goal. For example, proving a theorem in mathematics requires devising a sequence of steps allowed by the rules of logic, in which each step employs an axiom or a previously proven theorem or lemma. One approach that works in theory but not in practice is simply to generate every possible sequence until one is found that accomplishes the task. Clearly, the number of possibilities expands too quickly, and a method of using limited computational resources to achieve a goal within a reasonable amount of time is needed. To optimize the use of scarce computational power, GPS uses a technique called means-ends analysis. At each step it measures the “distance” to the goal and generates a subgoal to attempt to minimize this distance. It then calls itself, another example of recursion, to solve the presumably simpler problem represented by the subgoal. In a similar way, each subgoal can be expanded into smaller subgoals.

GPS achieved notable success in solving logic puzzles, proving theorems, solving problems in integral calculus and plane geometry, and similar mathematical or combinatorial problems.44 Its success fueled some of the AI field’s early optimism.45 Further progress came more slowly, however, as some of the limitations of the technique became clear. One of the keys to successful use of the GPS paradigm is a measure of distance that is appropriate to the problem. For example, how should we measure the distance between two mathematical expressions? Simply counting

So close and yet so far: the distance between mathematical expressions.

the number of symbols that the two expressions do not have in common is clearly not satisfactory: two expressions may differ by only one symbol, and yet we may never be able to derive one from the other. Similarly, in solving a maze problem, we may be only a single cell away from the goal, yet we are not necessarily close to achieving a viable path. A fully satisfactory method of defining distance in such domains is still not clear.46

The initial rapid success of GPS was followed by a period of considerably slower progress in which each aspect of the GPS paradigm was examined in detail. What is the distance measure? How are subgoals generated and evaluated? Are there alternatives to the means-ends algorithm? A relatively simple approach that seemed at first to capture intelligence, at least within a broad variety of problem domains, was shown to have limitations. GPS has certainly not been the last word on this type of search problem, and more capable methods have since been devised.47

Three levels of intelligence

In practical applications of the recursive formula to a variety of tasks, intelligent problems appear to divide into three levels. At one extreme are problems that require little enough computation that they can be completely analyzed. Examples are tic-tac-toe and such word problems as cannibals and missionaries. In the middle are problems for which we cannot afford full analysis but that appear to be successfully dealt with using a simple node-evaluation procedure. By “successfully dealt with” I do not mean that it is possible to provide perfect solutions. Rather, I mean that it is possible to match the performance of most (and in some cases all) humans. Examples of this class of problems are chess, checkers, backgammon, and Euclidian geometry. It is true that some of the chess programs use complex node-evaluation procedures, but these programs do not appear to provide superior performance to programs that do use simple node-evaluation procedures. The point is that playing a good game of chess (and in my opinion ultimately a championship game of chess) does not require complicated node-evaluation procedures.48 Finally, there is a class of problems that do require nontrivial procedures at the terminal leaf position. For many

So close and yet so far: the distance between cells in a maze.

of these problems we have yet to devise such procedures. Examples are the game of go and some realms of mathematical theory. In sum, the levels of intelligence required for various activities are as follows:

Level 1

  • Tic-Tac-Toe
  • Cannibals and missionaries

Level 2

  • Checkers
  • Chess
  • Backgammon
  • Euclidian geometry

Level 3

  • Go
  • Some realms of mathematics

What is our conclusion with regard to the recursive formula? It is certainly simple and elegant. For the first two classes of problems described above-the class that can be fully analyzed within a reasonable amount of time and the class that cannot be fully analyzed but requires only simple terminal-evaluation procedures to achieve intelligent performance-the full description of the algorithm is simple and can be said to be solved by a unifying formula. The most complicated aspect of implementing such a solution is the generation of the possible moves or steps at each point. Doing so is, however, simply a matter of stating the problem (e.g., the rules of the game). Thus the recursive formula enables us to solve a problem as simply as stating it.

What about the third class, the class for which simple terminal-evaluation procedures do not appear adequate? For this class the basic recursive algorithm is only part of the solution. We end up requiring insight into the unique structure of each such problem. Many of these problems can be solved (and have been solved), but the solutions are different in each case. If there is a unifying formula linking these diverse algorithms, none has yet become apparent.

The extent to which one accepts the recursive formula as a unifying formula for even a subset of intelligent problem solving comes down again to one’s definition of intelligence. It might be argued that problems in the first two classes described above are inherently not problems requiring intelligence but rather just data- and symbol-manipulation problems for which computers are well suited and that truly intelligent problems are of the third kind. Accepting this point of view requires us to modify our opinion of certain activities that have for centuries been considered to be intelligent, like playing checkers or chess and solving problems in Euclidian geometry and other areas of mathematics.

Such a view would be equivalent to stating that intelligence is to be specifically defined as a type of behavior or activity that inherently cannot be captured in a single unifying formula. If we accept this definition, then, of course, the question of whether or not intelligence or any important aspect of it can be captured in a formula is answered negatively by definition. Yet the implications of such a definition would not be consistent with commonly held views of what constitutes intelligent behavior.49

I would say that the recursive formula is partially successful. Certain types of intelligent behavior, that required by the first two classes of problems, can be effectively simulated with this elegant and powerful approach. Other types of problems appear to resist it, including the important area of pattern recognition.

Other Approaches to Modeling the Software of Intelligence: Random Nets, Pandemonium, and Trees

I do not come to the discussion of connectionism as a neutral observer. In fact, the standard version of its history assigns me a role in a romantic story whose fairytale resonances surely contribute at least a little to connectionism’s aura of excitement.

Once upon a time two daughter sciences were born to the new science of cybernetics. One sister was natural, with features inherited from the study of the brain, from the way nature does things. The other was artificial, related from the beginning to the use of computers. Each of the sister sciences tried to build models of intelligence, but from very different materials. The natural sister built models (called neural networks) out of mathematically purified neurones. The artificial sister built her models out of computer programs.

In their first bloom of youth the two were equally successful and equally pursued by suitors from other fields of knowledge. They got on very well together. Their relationship changed in the early sixties when a new monarch appeared, one with the largest coffers ever seen in the kingdom of the sciences: Lord DARPA, the Defense Department’s Advanced Research Projects Agency. The artificial sister grew jealous and was determined to keep for herself the access to Lord DARPA’s research funds. The natural sister would have to be slain.

The bloody work was attempted by two staunch followers of the artificial sister, Marvin Minsky and Seymour Papert, cast in the role of the huntsman sent to slay Snow White and bring back her heart as proof of the deed. Their weapon was not the dagger but the mightier pen, from which came a book-Perceptrons-purporting to prove that neural nets could never fill their promise of building models of mind: only computer programs could do this. Victory seemed assured for the artificial sister. And indeed, for the next decade all the rewards of the kingdom came to her progeny, of which the family of expert systems did best in fame and fortune.

But Snow White was not dead. What Minsky and Papert had shown the world as proof was not the heart of the princess; it was the heart of a pig.

Seymour Papert, 1988

Another unified theory: The Perceptron

In the early and mid 1960s the AI field became enamored with another type of machine called the Perceptron.50 This was an attempt to capture intelligence using a large number of simple mechanisms. A perceptron consisted of three types of devices: sensory units, associative units, and effector units. The sensory units consisted of simple photocells arranged in a two dimensional array, simulating the function of the human retina (or many other forms of input). A Perceptron incorporating auditory sensory units was also proposed. The associative units were randomly wired between the sensory units and the effector units and in some cases other associative units. The associative units contained a feedback mechanism that makes them automatically adjust the amount of current that passes from each input to each output on the basis of feedback received by the unit on whether or not the Perceptron’s last response was correct. The effector units sum the outputs of the associative units to which they are connected, and by comparing different summed voltages, they make final decisions on the identities of the patterns presented to the sensory units.

The machine attracted attention because, simple as it was, it appeared to model both the structure and the functions of the human brain.51 The sensory, associative, and effector units modeled afferent, associative, and efferent neurons respectively. Moreover, the random wiring of the associative units, sometimes referred to as random neural nets, were thought to model the seemingly random interconnections of the associative neurons. The Perceptron learns by gradually

adjusting the electrical potential on each random connection. This was thought to model human learning. That human neural connections were believed to be random was consistent with a theory that most human perceptual ability was learned rather than innate.

It was discovered that such a machine could be “taught” to recognize various shapes, such as printed letters. In 1969 Marvin Minsky and Seymour Papert wrote a book called Perceptrons, which proved a set of theorems showing that Perceptrons could never solve the simple problem of determining whether or not a line drawing is “connected” (in a connected drawing all parts are connected to each other by lines), the so-called contiguity problem.52 It was also shown that their ability to recognize such shapes as printed letters was quite limited: the Perceptron would fail if the letters were tilted, changed in size, or otherwise distorted. We now know that though detailed perceptual categories, such as the shape of the letters in our alphabet, are certainly learned, the properties on which such perception is based-lines, curves, concavities, loops, etc.-are at least partially prewired, and that the neural connections are anything but random.

The book had a dramatic effect, and virtually all work on Perceptrons came to a halt. The severe limitations of the Perceptron’s capabilities came as a disappointment to many. Prior to the publication of Perceptrons, it had appeared to some practitioners that this simple technique had captured the secret of intelligence. All we had to do to make the Perceptron more intelligent was to add more associative units and more wires. Unfortunately, intelligence was not to be captured this easily.53 After the publication of Perceptrons, there was an understandable hesitation in the AI field to look for any single method or even a single paradigm underlying intelligence. The limitations of GPS and the failure of the Perceptron certainly did not prove that all unified approaches to defining and replicating the methods of intelligence would fail, but enthusiasm for discovering such unifying principles waned nonetheless.

It was probably a healthy development for artificial intelligence in the early 1970s that the search for a unified theory of intelligence was put aside at least temporarily. This focused attention instead on devising a variety of approaches for a variety of problems and allowed the development of generalized theories to wait until more application-specific techniques were perfected.54 The implication of this view is that AI is composed of a great variety of methods and approaches, that there is no single set of techniques that solves all AI problems. Complex problems are often reducible to hierarchies of simpler problems. The solutions, then, are manifest in the structure of the hierarchies and the solutions to the simplest problems found at the lowest level of each hierarchy. Such structures and solutions will differ for different classes of problems.

A related observation is that solutions to AI problems are often domain specific. In other words, the methods for solving particular AI problems need to be derived from our understanding of the particulars of that problem. Recognizing printed characters by machine, for example, needs to be concerned more with the details of letter shapes, ink and paper reflectivity, printing errors, and other peculiarities of print than with any generalized techniques of pattern recognition.55 Similarly, recognizing human speech by machine needs to be primarily concerned with the detailed nature of human speech sounds, the syntax of natural language, and the signal-processing characteristics of the transmission channel.

Occasionally claims are made that a “generalized perception algorithm” has been devised that can recognize any type of pattern, whether it be manifested in speech, printed characters, land-terrain maps, or fingerprints.56 On closer examination, it often turns out that such claims are absolutely correct: such algorithms do, in fact, recognize every type of pattern, only they do all these tasks very poorly. To perform any of them well, with satisfactory (and commercially acceptable) rates of accuracy, requires a great deal of knowledge deeply embedded in algorithms specific to the domain of inquiry, so much so that this aspect of the technology outweighs the generic AI techniques.

The new connectionists

A more sophisticated generation of random neural nets is now being developed with important differences from the earlier work. The limitations described in the Minsky and Papert theorems are not applicable to at least some of the work in this latest generation. The “neurons” themselves can be more sophisticated. Each neuron may be modeled by its own computer, which thus provides a high degree of parallel processing. Each neuron model can perform arbitrarily complex transformations on their inputs. Rather than having neural connections fixed at the outset, these neural models are also capable of growing new connections.57

In all fairness, it should be pointed out that some of the recent work differs little from that done over twenty years ago. In many of the recent connectionist systems the nets still have only one or two layers and still use simple linear neuron models. In my opinion, a far more promising avenue of research lies with nets built from more complex neurons organized in multiple layers and each capable of detecting abstract features of patterns (such as lines, curvatures, and concavities).

The earlier random neural nets generated excitement precisely because they appeared to provide a simple formula that could generate complex and purposeful, that is, intelligent, behavior. The earlier work would qualify as a unifying formula, but unfortunately it did not work very well. The more recent work-particularly that which stresses complex neuron models-appears far more promising but does not fully qualify as a unifying formula. It is more of a unifying paradigm in that it still requires diverse approaches to implement and organize the overall system.58

Still the promise is implicit that we can get more intelligence out of less. In other words, while the neuron models still require intelligent programs, the hope is that the entire neural net will exhibit a higher degree of intelligence than is explicitly programmed at the lower neuron level.

Whereas Minsky was obviously opposed to the Perceptron work and expressed concern that limited research resources were being drawn into a dead end, he has expressed cautious interest in random neural nets based on more intelligent neurons.59 It would certainly appear that this work is consistent with Minsky’s recent emphasis on the “society of mind” as a paradigm for intelligence.

Pandemonium selection

For pattern-recognition problems (such as identifying faces, printed characters, land terrains, fingerprints, blood cells, etc.), we need to break down the higher-level decisions (e.g., what letter this is) into lower level ones. If we design “demons” (algorithms) to answer the lower-level questions, we can use their answers to make a final decision. One approach is to invite the demons to a “meeting.” We then ask, Is this letter an A? and determine how loud the demons shout. We go through all of the possibilities in the “identification set” (B, C, etc.) and pick the one that causes the demons to voice the strongest approval. Some of the low-level questions are more important than others, and thus some of the demons are more influential (that is, able to shout louder) than other demons. The demons also control how loud they shout by how sure they are in answering their lower-level question.60

The decision tree

Another way to organize the demons is in a tree. We go to one demon and ask a question, such as Does this letter have a loop (an area of white completely enclosed by black)? If the demon says yes, we continue on to demons on the yes branch of the tree; if no, then we take the no branch. If the answer is yes, then we know the letter could be an A but not a C, and so on. We follow the branches from one demon to another until we get to a leaf, at which point we have an answer. Note that such decision trees differ from the recursively generated trees described for the recursive formula. In recursion, a tree is generated for each decision, and the topology (shape) and content of the tree will differ for different situations. For pattern-recognition tasks, the tree is generally fixed and designed by the program designer. What differs for different situations (inputs) is not the shape of the tree but the path taken through it.61

Both the pandemonium and decision-tree methods can work. The latter is faster (for a serial computer) in that we have only to “compute” demons that we find along our path. But it is more subject to catastrophic failure (an incorrect answer) if a demon, particularly one encountered early on, makes a mistake. With pandemonium selection, an erroneous demon can still be outvoted.

As with the recursive model, there is a question as to how sophisticated these demons should be. At one extreme we have high-level feature extraction, in which the demons answer difficult questions. Examples are Is this region concave? in the context of character recognition and Is this burst of sound the beginning of a plosive phoneme? in the context of speech recognition. At the other extreme we have template matching, in which the demons answer the simplest possible questions, such as Is this particular point black or white? Whereas I opted for the simpleminded school of thought with regard to chess (though not for all games), I feel that more complex demons are necessary for most pattern-recognition tasks. In chess we are dealing with a very orderly and artificial world that can be described with essentially perfect precision. In pattern recognition we are dealing with more complex and more variable real-world phenomena that inherently require higher levels of abstraction.

High-level demons can be broken down into hierarchies of simpler demons. It is thus possible to develop decision trees with simple demons at the terminal nodes that can detect more abstract or higher-level features at higher nodes of the tree. However, if the demons are simple, then the heart of the technology lies in the decision hierarchy itself.

Thus, as with the random neural net; pandemonium selection, decision trees, and a variety of other methods provide powerful paradigms but are not simple formulas that can be expanded into intelligent behavior through the application of computational power alone. This is in contrast to the recursive formula, in which a simple formula combined with large doses of computational brute force is successful in solving at least certain classes of intelligent problems.

Demons sitting in a decision tree.

The hierarchy of intelligence

What we often regard as intelligent behavior is in actuality many processes operating in parallel and interacting on different levels of a hierarchy.62 If we consider carefully the process of reading and understanding printed language, it becomes clear that we are dealing with a multiplicity of talents: at the character level, pattern recognition; at the word level, the syntactic analysis of word sequences; and at higher levels, the decoding of semantics, the retrieval and updating of our knowledge base, and an understanding of “scripts” about the subject matter being written about, to name only a few of the intelligent processes involved.63

This leads us to a possible recursive definition of intelligence:

An intelligent process is an association of intelligent and unintelligent processes communicating and influencing each other.

This definition is self-referencing (that is, it contains the concept being defined in the definition), but it is not infinitely circular. It again employs the special type of self-reference we call recursion.

Let us examine this “association” definition of intelligence in the context of recursion. If we consider an intelligence to be an association of other intelligences, we replace one intelligence with a larger number of intelligences. We now have an association of intelligences, each of which is in turn expanded into an organization of yet more intelligences. If we keep applying the recursive definition, we end up with a vision of a great profusion of intelligences organized in a hierarchy of many levels. If there were no escape or terminal portion of the definition, we would replace one intelligence with an infinite number of intelligences-a satisfying mystical vision perhaps, but of no value in building practical theories of intelligence.

The escape from infinite recursion is the concept of unintelligent processes, the mechanisms at the lowest level. Intelligence overall results from the coherency of the entire hierarchy of increasingly sophisticated associations of intelligence.

Minsky’s “society of mind”

A recent and perhaps the most comprehensive exposition of this type of theory has been offered by Minsky in writings from 1977 to the present.64 “How can intelligence emerge from non-intelligence? . . . One can build a mind from many little parts, each mindless by itself. I’ll call “Society of Mind” this scheme in which a mind is made of many smaller processes. We’ll call them agents. Each agent by itself can only do some simple thing which needs no mind or thought at all. Yet where we join those agents in societies-in certain very special ways-that leads to true intelligence.”65

How do the various minds in a society interact? The short answer is, In many different ways. Consider the varied ways we interact with each other in human society. Relationships include symmetric bonds, such as lovers and friends, as well as asymmetric ones, such as parent-child, boss-subordinate, and oppressor-victim. Methods of making decisions range from negotiation of equal or nearly equal parties and the influence of one person on another to the dictates of boss to subordinate, among the multifarious ways in which we interact and make decisions in a society.

On a larger level, communication and decision making is regulated by the rules of the society with political, cultural, and social institutions governing the roles of each member.

Within a single individual we also have many mental processes carrying on their operations in parallel. Incoming information is evaluated not by a single “department,” but by many of our subminds at the same time. Some deal with fears; others with hopes, desires, and needs. Yet others deal with the practical aspects of manipulating muscles or performing pattern-recognition tasks. Through a hierarchy of negotiations at different levels, our thoughts are translated into actions. Some subminds are more powerful than others, and the methods of negotiation vary according to the level and the subject matter.66

Such theories as the society of mind are essentially drawing an analogy of the intelligence of one person to that of a society such as a nation with its competing factions, differing views of events, and varying methods of resolving conflicts and pooling opinions. This type of theory is consistent with the complexity of human thought, with its predictable and unpredictable sides, and with the everyday phenomenon of mixed feelings.

We can extend the concept of society beyond the individual. A human being is a society of minds representing needs, desires, and capabilities on many different levels. An actual human society then continues the process of an intelligent hierarchy on yet more levels. Some of the methods of negotiation, influence, and problem resolution in a society are not that different from those taking place within a single individual.

As we expand each intelligence into a society of intelligences, are the intelligences on the lower level less sophisticated and less capable than the intelligence represented by the entire society? I would say that it is certainly desirable that the society be more intelligent than its members. This is not always the case in human society-the intelligence of a mob is a testament to this. Yet successful human organizations are obviously capable of accomplishing intellectual tasks that their individual members could never accomplish alone. Within a single human mind a well-integrated personality is capable of synthesizing its constituent thought processes in a manner that exhibits greater intelligence at each higher level of the mental hierarchy. As examples of poor integration, many forms of mental illness are characterized by more primitive thought processes interfering with and even taking charge of the more subtle and intelligent capabilities.

Inherent in this vision of intelligence is the inference that human intelligence, though very complex, is not infinitely complex. It can ultimately be understood in terms of simpler processes. At the lowest level there are very simple processes, although the hierarchy that leads to the simple processes is anything but simple. We do not fully understand the hierarchy-indeed, we understand very little of it-but there is no reason that we cannot progressively increase our knowledge of it.

Lettvin’s “society of neurons”

Another theory of human intelligence as a society of intelligences, one articulated by Jerome Lettvin of MIT, views individual neurons as small “minds” organized in a society more complex than any human society (there are tens or hundreds of billions of neurons in the human brain and only billions of human beings on the earth). When first articulated in the mid 1960s, this view differed from the conventional wisdom that neurons were just simple switches with very limited information-processing capability.67 As Lettvin surmised, we now know that neurons are indeed quite complex and that communication between neurons does use a “language” that, though not nearly as sophisticated as human language, is far from trivial in its structure. Lettvin imagined the human mind as a hierarchy of larger and larger societies of neurons with languages, cultures, and power struggles.68 Anyone who doubts that a single cell can be intelligent needs only consider the macrophage. These killer white blood cells are capable of conducting subtle pattern-recognition tasks to identify hostile bacteria, parasites, and other agents of disease and of carrying out elaborate tactics to destroy the enemies once identified. There is no reason to suppose that nerve cells, which are structurally more complex than macrophages, have less sophisticated capabilities.

Intelligence as an association of lesser intelligences is both a theory of natural intelligence and a paradigm for the organization of machine intelligence. As our computer systems become more complicated, one way to organize them is to develop smaller systems, each of which is relatively complete with its own structural integrity, and then define languages and processes that let these lesser “experts” interact. This type of organization is becoming increasingly popular in both applied and research AI systems.

A society of human and machine intelligence

An example of an integrated “society of intelligences” is the set of expert systems already developed or being developed by Digital Equipment Corporation (DEC). One of the first practical expert systems was XCON, which uses over ten thousand rules to design the configuration and assembly process for their VAX-series computers.69 XCON has been found to be substantially more reliable than human experts and is now routinely used on almost all orders for VAX computers. With the success of XCON, DEC is now developing expert systems for virtually all aspects of its company’s operations, including computer-assisted design, manufacturing control, material-resource planning, sales strategies, financial planning, and other areas.70 When fully operational, expert systems will be playing critical roles in every aspect of DEC’s operations, communicating with each other as well as with DEC employees through an extensive computer-based network. This is expected to be one of the first human organizations operating in a real-world environment to combine a society of multiple human and machine intelligences.71

Progress on different levels

If we consider the diverse problems that are being attacked by AI technology, and the wide range of intelligent behavior, pattern recognition, and problem solving abilities of human intelligence, we must recognize that a variety of approaches is needed.72 Progress in harnessing machine intelligence will likely result from all of the following:

  • More powerful “unifying” formulas for solving classes of intelligent problems
  • More powerful paradigms
  • Highly detailed technologies with deeply imbedded domain-specific knowledge for interacting with real world problems
  • More powerful computational (hardware) resources
  • More powerful (and generally parallel) architectures

With regard to the last item, we are increasingly finding that different problems require different hardware architectures and that architecture design is becoming as important as algorithm design.

Is there a formula for intelligence?

The answer is that there are indeed formulas for both the “hardware” and the “software” of intelligence that are elegant and far reaching in their implications. These formulas provide powerful insights into the nature of problems requiring intelligence and their solutions. We must also conclude, however, that these simple models of intelligence are only the beginning, not the end, in our search for intelligence. Real-world intelligence is a combination of elegant insight with extensive knowledge and experience that can be accumulated only through painstaking and often hard-won effort.73

The Formula of Life as a Formula of Intelligence

A great deal of the universe does not need any explanation. Elephants, for instance. Once molecules have learnt to compete and to create other molecules in their own image, elephants, and things resembling elephants, will in due course be found roaming through the countryside.

Peter Atkins

It is raining DNA outside. On the bank of the Oxford canal at the bottom of my garden is a large willow tree, and it is pumping downy seeds into the air . . . . The DNA content must be a small proportion of the total, so why did I say that it was raining DNA rather than cellulose? The answer is that it is the DNA that matters. The cellulose fluff, although more bulky, is just a parachute, to be discarded. The whole performance, cotton wool, catkins, tree and all, is in aid of one thing and one thing only, the spreading of DNA around the countryside. Not just any DNA, but DNA whose coded characters spell out specific instructions for building willow trees that will shed a new generation of downy seeds. Those fluffy specks are, literally, spreading instructions for making themselves. They are there because their ancestors succeeded in doing the same. It is raining instructions out there; it’s raining programs; it’s raining tree-growing, fluff-spreading, algorithms. That is not a metaphor, it is the plain truth. It couldn’t be any plainer if it were raining floppy discs.

Richard Dawkins

One might include DNA and its support cast of “data processing” chemicals as an example of a classical unifying formula.74 It can also be regarded as a unifying formula for intelligence, because it provides the foundation for human life, which is our primary example of intelligence. Let us briefly examine the mechanics of DNA and then compare it to some of the unifying formulas described above for machine intelligence.

The formula of life

The chemical structure of the DNA molecule was first described by J. D. Watson and F. H. C. Crick in 1953 as a double helix consisting of a pair of strands of polynucleotides with information encoded at each position by the choice of nucleotides.75 Since then biochemists have gone on to map extensive sections (but still a small fraction) of the genetic code and to understand the detailed chemistry of the communication and control process by which DNA commands reproduction through such other complex molecules and cellular structures as messenger RNA, transfer RNA, and ribosomes. At the level of information storage, the mechanism is surprisingly simple. Supported by a twisting sugar-phosphate backbone, the DNA molecule contains up to several million rungs, each of which is coded with one letter drawn from a four-letter alphabet. The alphabet consists of the four base pairs: adenine-thymine, thymine-adenine, cytosine-guanine, and guanine-cytosine. The DNA strings in a single cell would measure up to six feet in length if stretched out, but an elaborate packing method coils it to fit into a cell only 1/2500 of an inch across. With four letters in the alphabet, each rung is coding two bits of data in a digital code. Special enzymes can copy this information by splitting each base pair and assembling two identical DNA molecules by rematching the broken base pairs. Other enzymes actually check the validity of the copy by checking the integrity of the base-pair matching. With these copying and validation steps, this chemical data-processing system makes only about one error in a billion base-pair replications. Further redundancy and error-correction codes are built into the digital data itself, and so meaningful mutations resulting from base-pair replication errors are rare. Most of the errors resulting from the one-in-a-billion error rate will result in the equivalent of a “parity” error, an error that can be detected and corrected by other levels of the system, which will prevent the incorrect bit from causing any significant damage.76

In a process technically called translation, another series of chemicals translate this elaborate digitalprogram into action by building proteins. It is the protein chains that give each cell its structure, behavior, and intelligence. Special enzymes unwind a region of DNA for building a particular protein. A strand of messenger RNA (mRNA) is created by copying the exposed sequence of bases. The mRNA essentially has a copy of a portion of the DNA letter sequence. The mRNA travels out of the nucleus and into the cell body. The mRNA codes are then read by a structure in the cells called a ribosome, which acts like a tape-recorder head “reading” the sequence of data encoded in the mRNA base sequence. The “letters” (bases) are grouped into words of three letters each called codons, with a codon for each of 20 possible amino acids, the basic building blocks of protein. A ribosome reads the codons from the mRNA and then, using another set of molecules called transfer RNA (tRNA), assembles a protein chain one amino acid at a time. It is now the job of the assembled proteins to carry out the functions of the cell (and by extension the organism). A molecule of hemoglobin, for example, which has the job of carrying oxygen from the lungs to body tissues, is created 500 trillion times each second. With over 500 amino acids in each molecule of hemoglobin, that comes to 15 x 1018 “read” operations every minute by the ribosomes just for the creation of hemoglobin.

In some ways the biochemical mechanism of life is remarkably complex and intricate. In other ways it is remarkably simple. Only four base pairs provide the digital storage for all of the complexity of all human life and all other life as we know it. The ribosomes build protein chains by grouping together triplets of base pairs to select sequences from only twenty amino acids. These protein chains then control everything else: the structure of bone cells, the ability of muscle cells to flex and act in concert with other muscle cells, all of the complex biochemical interactions that take place in the blood stream, and, of course, the structure and functioning of the brain.77