THE AGE OF INTELLIGENT MACHINES | Chapter 6: Electronic Roots

September 24, 2001
Author: Ray Kurzweil

The First Computer

The German aircraft is the best in the world. I cannot see what we could possibly calculate to improve on.

A German officer explaining to Konrad Zuse why the Third Reich would provide no further support for development of Zuse’s Z series of computers

I won’t say that what Turing did made us win the war, but I dare say we might have lost it without him.

I. J. Good, assistant to Turing

I remember when we first got pulses circulating rather precariously. Every time we switched on the light in the room, another pulse went in and that was the only input that in fact we had. Then later we made a half-adder work and saw our binary pulses counting up on an oscilloscope screen. Then we got the accumulator working and then the whole arithmetic unit, and so on.

We were a small, informal group in those days and whenever we had anything to celebrate, we adjourned to the local pub. It was a pub well known to generations of workers in the Cambridge University Laboratories. It was known as the Bun Shop, though I think they’d be very surprised indeed if you tried to buy a bun there.

We had a special celebration on the 6th of May, 1949, when the EDSAC did its first calculation, which was a table of squares, and a few days later it did a table of primes.

Sam Alexander has reminded me about the first substantial program I wrote which was for computing Airy functions by solving the Airy differential equation, and in connection with that, I made a discovery. I discovered debugging. Now you probably won’t believe me and I don’t know whether my colleagues around this table had the same experience, but it just had not occurred to me that there was going to be any difficulty about getting programs working. And it was with somewhat of a shock that I realized that for the rest of my life I was going to spend a good deal of my time finding mistakes that I had made myself in programs.

Maurice V. Wilkes

Unlike such other epoch-making inventions as the telephone or the electric lightbulb, most people cannot recite who invented the digital computer. One reason is that who invented the computer is a matter of how one defines “computer.” Another is that by most accepted definitions the first functioning computers were developed independently and virtually at the same time in three different countries, one of which was at war with the other two.

It is clear, however, that during World War II the computer was an idea whose time had come. The theoretical foundations had been laid decades earlier by Babbage, Boole, Russell, and others, and many design principles had been worked out by Babbage, Burroughs, Hollerith, and their peers.1 With the advent of reliable electromechanical relays for telecommunications and electronic (vacuum) tubes for radio communication, the building blocks for a practical computer had finally become available.

The German Z machines

It appears that the world’s first fully programmable digital computer was invented in Germany during the Third Reich by a civil engineer named Konrad Zuse.2 Zuse’s original motivation was to automate what he later called those “awful calculations” required of civil engineers.3 He saw the computational bottleneck as bringing his field of applied technology to a virtual standstill. With little theoretical background (Zuse was unaware of the work of Babbage and other developments that were influential in precomputer history) and great skill as a tinkerer, Zuse set out to ease the unenviable task of calculating the huge tables of numbers required for civil engineering projects.

His first device, the Z-1 (the names of Zuse’s inventions originally used the letter V, but this was later changed so as not to cause confusion between his machines and the German rocket bombs), was an entirely mechanical calculator built from an erector set in his parents’ living room. The Z-2 was a more sophisticated machine using electromechanical relays, and though not programmable, it was capable of solving complex simultaneous equations. The Z-2 attracted the attention of military aircraft developers, and Zuse received some level of support for further development, although this was informal and not at a high level.

Zuse’s most significant invention is the Z-3, the world’s first programmable digital computer, completed in late 1941. It had a memory of 1,408 bits organized as 64 words of 22 bits each. In addition to 1,408 relays to support the random access memory, it used another 1,200 relays for the central processing unit. As with all of the relay-based computers, it was quite slow: a multiplication took more than 3 seconds.

Besides its importance as a fully functioning program-controlled computer, the Z-3 employed a number of other innovations. Zuse reinvented reverse Polish notation, a method of expressing any formula, no matter how complex, by successive pairs of numbers and operators, originally developed two decades earlier by the Polish mathematician Jan Lukasiewicz.4 The Z-3 also introduced floating-point numbers, a form of notation that can represent a broad range of magnitudes. In 1945 Zuse went on to develop the first high-level language, called Plankalkul, which, in its structured methodology and lack of a GOTO statement, foreshadowed modern programming languages such as C. The first programmer of the Z-3, and thereby the world’s first programmer of an operational programmable computer, was August Fast, a man chosen by Zuse for both his talent as a mathematician and his blindness. Zuse apparently reasoned that, being blind, Fast would not be called for military service!
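To make the idea of reverse Polish notation concrete, here is a minimal sketch in modern Python. It is not Zuse's notation or the Z-3's instruction set; the function name `eval_rpn` and the space-separated token format are assumptions made for illustration. Operands are pushed onto a stack and each operator consumes the two most recent values, so no parentheses are needed however complex the formula.

```python
import operator

# Hypothetical helper for illustration only; not taken from Zuse's work.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def eval_rpn(expression):
    """Evaluate a space-separated reverse Polish expression."""
    stack = []
    for token in expression.split():
        if token in OPS:
            # An operator applies to the two most recently pushed values.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

# (3 + 4) * 2 written without parentheses
result = eval_rpn("3 4 + 2 *")
```

Here `result` is 14.0. The same function handles longer formulas, such as `"5 1 2 + 4 * + 3 -"` for 5 + (1 + 2) * 4 - 3, without any change.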

A variation of Zuse’s Z-3 was apparently used by the Germans in the design and manufacture of one of the Nazis’ flying bombs, the HS-293, although Zuse has denied this.5 After the war British Intelligence classified Zuse as an “ardent Nazi,” which he has not convincingly denied.6 What is perhaps most interesting about the Zuse computers is the lack of importance given to them by the Nazi war machine. The German military gave immensely high priority to several advanced technologies, such as rocketry and atomic weapons, yet they seem to have put no priority at all on computers. While Zuse received some incidental support and his machines played a minor military role, there was little if any awareness among the German leadership of computation and its military significance. The potential for using computers for a broad range of military calculations, from plotting missile trajectories and designing weapons to decrypting intelligence messages, seems to have escaped Nazi attention, despite the fact that the technology was first developed under their aegis.7 The motivation for developing the world’s first programmable computer came primarily from Zuse’s own intense focus as an inventor.8

Nor did anyone in other countries pay much attention to Zuse’s work. Credit for the world’s first programmable computer is often given to Howard Aiken, despite the fact that his Mark I was not operational until nearly three years after the Z-3. Since the computer industry developed largely in the United States after the war, one might expect hesitation to recognize a German, an accused Nazi, as the inventor of the programmable computer. Yet Allied pride did not stop the United States from harnessing German know-how in the development of the rocket, nuclear weapons, or atomic energy. Zuse, by contrast, was largely ignored by the Americans, just as he had been by the Germans: both IBM and Remington Rand turned down offers of assistance and technology rights from Zuse after the war. He started his own company in Germany, which continued to build relay-based Z-series machines, although never in large quantities. The last, the Z-11, is still being used.9

Ultra

Unlike the Germans, the British did recognize the military value of automatic computation, at least for the decryption of intelligence messages. As recounted earlier, the ability of the machines built by Alan Turing and his associates to provide a constant stream of decoded German military messages was instrumental in turning the tide of war. The size of the effort, code-named Ultra, consisting of almost 10,000 men and women, is a testament to the strategic priority given to computers by the British during the war.10 Ultra created two series of machines. The first, Robinson, completed in early 1940 and based on electromechanical relay technology, was powerful enough to decode messages from the Germans’ first-generation Enigma enciphering machine. Was Robinson the world’s first operational computer?

According to our definition, Robinson was a computer, although not a programmable one: it had a single hard-wired program. We can consider it, therefore, to be the world’s first operational computer. When the Germans increased the complexity of their Enigma machine by adding coding rotors, Robinson was no longer fast enough, and the Ultra team set out to build a computer using electronic tubes, which were a hundred to a thousand times faster than the relays used in Robinson. The new machine, called Colossus, required 1,500 tubes, a significant technical challenge in view of the short life span and lack of reliability of vacuum tubes.11 Colossus worked reliably nonetheless and was able to keep up with the increasing complexity of the German messages. The Ultra group considered Colossus to be the world’s first electronic computer, although they were unaware of the earlier efforts of John Atanasoff, an obscure American inventor.

The Americans

Working for five years, a team of Harvard and IBM scientists led by Howard Aiken, a Navy commander, completed in 1944 what they thought was the world’s first programmable computer.12 Many still consider their creation, the Mark I, the first general-purpose computer, even though Zuse had built such a machine, albeit an awkward one, three years earlier. The Americans were obviously unaware of Zuse (as we have seen, the Germans were hardly aware of him), and they were only dimly aware of the British efforts. The American high command did know that the British were using electromechanical and electronic equipment to decode German messages, but did not have details of the process. Turing visited the United States in 1942 and apparently talked with John von Neumann (1903-1957), a mathematician, but had little impact on the American development of the computer until after the war, at which time there was extensive Anglo-American discussion on the subject of computation.13 The formal name for the Mark I, the IBM Automatic Sequence Controlled Calculator, is reminiscent of the original names of other pivotal inventions, such as “wireless telephone” for the radio and “horseless carriage” for the automobile.14

The United States Navy made Aiken available to the Mark I project, as they realized the potential value of an automatic calculator to a wide range of military problems. Realizing the potential commercial value of the machine, IBM provided virtually all of the funding, $500,000. Harvard’s motivation was to transform the tabulating machine into a device that could perform scientific calculations. The resulting machine was enormous: fifty feet long and eight feet high. It was affectionately called “the monster” by one of its first programmers, Navy Captain Grace Murray Hopper.15

The machine used decimal notation and represented numbers with 23 digits. Input data was read from punched cards, and the output was either punched onto cards or typed. The program was read from punched paper tape and was not stored but rather executed as read.

Grace Murray Hopper has sometimes been called the Ada Lovelace of the Mark I. Just as Ada Lovelace pioneered the programming of Charles Babbage’s Analytical Engine (despite the fact that the Analytical Engine never ran), Captain Hopper was the moving force in harnessing the power of the Mark I. On assignment from the Navy and having lost her husband in the war, she devoted most of her waking hours to programming the Mark I and its successors, the Mark II and Mark III. She was one of the first to recognize the value of libraries of subroutines, is credited with having written the first high-level language compiler, and led the effort to develop the Common Business-Oriented Language (COBOL), the first language not identified with a particular manufacturer.16 She is also associated with the origin of the term “debug.” One problem with the Mark I was fixed by removing a moth that had died inside one of its relays. “From then on,” she recalls, “whenever anything went wrong with the computer, we said it had bugs in it. If anyone asked if we were accomplishing anything, we replied that we were ‘debugging’.”17

The Mark I generated considerable interest in many quarters, but a severe limitation soon became apparent. With its electromechanical relay technology, it was capable of only a few calculations each second. The only real solution was to build a programmable computer without moving parts, which was accomplished when two professors at the University of Pennsylvania’s Moore School of Electrical Engineering, J. Presper Eckert, Jr. (b. 1919) and John W. Mauchly (1907-1980), completed work on their famous ENIAC (Electronic Numerical Integrator and Computer) in 1946. Using 18,000 vacuum tubes (12 times as many as Colossus, which had been considered ambitious only three years earlier!), it was capable of 5,000 calculations per second, making it almost a thousand times faster than Aiken’s Mark I. Early press coverage included reports of the ENIAC accomplishing in a matter of hours what would have required a year for a hundred scientists. It was from these reports that terms such as “super brain” and “electronic genius” first became associated with computers. Conceived in 1943, the ENIAC was originally intended for military applications; however, it was not completed in time for service in World War II. It ultimately served a wide variety of both military and civilian purposes, including physics studies, designing atomic weapons, numerical analysis, weather prediction, and product design.18

ENIAC was undoubtedly the world’s first fully electronic general-purpose (programmable) digital computer, although this claim, as well as the validity of the patents for ENIAC, was challenged in 1973 by John V. Atanasoff (b. 1903), a professor at Iowa State College. Judge Earl Larson found in favor of Atanasoff, stating that “Eckert and Mauchly did not invent the automatic electronic digital computer, but rather derived the subject matter from Atanasoff.”19 Working with a part-time graduate student, Clifford Berry, Atanasoff did indeed build an electronic computer, called ABC (for Atanasoff-Berry Computer), in 1940.20 Although a relatively unsophisticated device with only 800 tubes, it could solve simultaneous linear equations using binary representations of the variables. It was revealed during the trial that Mauchly had visited Atanasoff while the ENIAC was being designed. The judge, however, misunderstood the distinguishing feature and primary claim of the ENIAC, which was its ability to be programmed; Atanasoff’s ABC was not programmable. Fortunately, despite the court’s finding, credit for building the first computer that was both programmable and electronic continues properly to be given to Eckert and Mauchly. This is a prime example of the challenges faced by the courts in adjudicating issues with substantial technical content, particularly at times when the issues have yet to be fully understood by the technical community.

The ENIAC represented the first time that the public became fascinated with the potential for machines to perform mental functions at “astonishing” speeds. Although many of today’s computers are a thousand times faster than ENIAC (and our fastest supercomputers are almost a million times faster), the speed problem exhibited by the Mark I seemed to have been solved by ENIAC. This, however, exposed another bottleneck. Loading a program into the ENIAC involved setting 6,000 switches and connecting hundreds of cables. The incredible tedium required to change the program gave birth to a new concept called the stored program and to an architecture that computers still use today.21 The stored program concept is generally associated with the great Hungarian-American mathematician John von Neumann, and we still refer to a computer with a single central processing unit (the primary computational engine of a computer) accessing its program from the same memory (or same type of memory) that holds the data to be manipulated as a von Neumann machine. Although the first modern paper on the subject, published in 1946, carried only von Neumann’s name, the idea was not his alone. It clearly resulted from discussions that von Neumann had with both Eckert and Mauchly. In fact, the first reference to the stored program concept can be found in Babbage’s writings a century earlier, although his Analytical Engine did not employ the idea.

The idea of a stored program is deceptively simple: the program is stored in the machine’s memory in the same way that data is stored.22 In this way the program can be changed as easily as reading in new data. Indeed, the first stored-program computers read in their programs from the same punched-card readers used to read in the data. A stored program is more than a convenience: it supports certain types of algorithmic methods that would otherwise be impossible, including the use of subroutines, self-modifying code, and recursion, all of which are essential capabilities for programming most AI applications. It is hard to imagine writing a chess program, for example, that does not make use of recursion.
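The stored-program idea can be sketched with a toy machine in Python. This is an illustration under stated assumptions, not a model of EDVAC, EDSAC, or any historical instruction set: the opcodes `LOAD`, `ADD`, `STORE`, and `HALT`, the single accumulator, and the function name `run` are all invented for the example. The point it shows is that instructions and data sit in the same memory, so replacing an instruction is just another write to memory.

```python
def run(memory):
    """Run a toy stored-program machine; opcodes are invented for this sketch."""
    acc = 0   # accumulator register
    pc = 0    # program counter: index in memory of the next instruction
    while True:
        op, arg = memory[pc]   # instructions are fetched from memory like data
        pc += 1
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {op}")

# Cells 0-3 hold the program; cells 4-6 hold the data.
memory = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 6, 7, 0]
run(memory)
first_result = memory[6]     # 6 + 7

# Changing the program is just another write to memory.
memory[1] = ("ADD", 4)
run(memory)
second_result = memory[6]    # 6 + 6
```

After the first run `first_result` is 13; after one instruction is overwritten, the same machine computes 12. No switches or cables are involved, which is the contrast with the ENIAC that the text draws.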

The Von Neumann machine and the stored program.

Eckert and Mauchly set out to implement the stored-program concept in a computer to be called EDVAC (Electronic Discrete Variable Automatic Computer) while they were still at the Moore School. They subsequently left Moore to start their own company, the Eckert-Mauchly Computer Corporation, and the completion of EDVAC was left to others. This delayed the project, and it was not until 1951 that EDVAC was completed. As a result, though EDVAC was the first computer designed to incorporate stored-program capability, it was not the first to be completed. Eckert and Mauchly’s new company brought out BINAC (Binary Automatic Computer) in 1949. BINAC was not the world’s first stored-program computer, either. Maurice Wilkes, a professor at Cambridge University, had taken a course on computers with Eckert and Mauchly while they were still teaching at the Moore School. Returning to England after completion of the course, he set out to build his own computer incorporating the stored-program concept that he had learned from Eckert and Mauchly’s course. Wilkes’s machine, called EDSAC (Electronic Delay Storage Automatic Computer), was completed a few months before BINAC, making it the world’s first operational stored-program computer.23

The commercialization of computation

After building only two BINACs and without adequate financial backing to commercially develop a complex new technology, Eckert and Mauchly sold their firm to Remington Rand in 1950. By 1951 a computer called UNIVAC (Universal Automatic Computer), which Eckert and Mauchly had started several years earlier, was introduced as the world’s first commercially marketed electronic computer. The first customer, appropriately enough, was the U.S. Census Bureau; the 1950 census was the first to be handled by a programmable computer.

Legend has it that Remington Rand conducted a marketing study for their new product, which indicated a potential worldwide market for only 50 computers. This story is difficult to confirm, but it is true that UNIVAC was not a commercial success. IBM, however, took the introduction of UNIVAC very seriously, viewing it as a threat to their dominant position in tabulating equipment, and shortly thereafter introduced the first of their 700 series of computers (the 701, 702, 704, and 705). The 700 series became so popular that IBM was quickly positioned to dominate the still nascent industry. The 700 series led to IBM’s 1400 series (1401 and 1410) and 7000 series (7030, 7040, 7070, 7074, 709T for transistors, 7090, and 7094), which became standards for large corporate customers and scientific installations. With the introduction in 1964 of IBM’s 360 series, a computer architecture based on 32-bit words that continues to form the basis for IBM mainframes, IBM solidified its leadership of the computer industry and became one of the largest and most profitable corporations in the world.24

Firsts in Computers
| Name | Inventor | Completed | Sponsor | Hardware | Programmable | Stored program | The first… |
|---|---|---|---|---|---|---|---|
| Analytical Engine | Babbage | Conceived in 1835, never completed | Originally the British gov’t, although support was withdrawn | All mechanical | Yes (punched cards) | No | First programmable computer ever designed |
| Robinson | Turing et al. | Early 1940 | British gov’t | Relays | No | No | First operational computer (special purpose) |
| ABC (Atanasoff-Berry) | Atanasoff | 1940 | Iowa State Research Council | Tubes & capacitors | No | No | First electronic computer (special purpose) |
| Z-3 (Zuse-3) | Zuse | Late 1941 | Partially sponsored by the German Aircraft Research Institute | Relays | Yes | No | First programmable computer (actually built) |
| Colossus | Turing et al. | 1943 | British gov’t | Tubes | No | No | First English electronic computer |
| Mark I (IBM Automatic Sequence Controlled Calculator) | Aiken | 1944 | IBM | Mostly relays | Yes (punched tape) | No | First American programmable computer |
| ENIAC (Electronic Numerical Integrator and Computer) | Eckert & Mauchly | 1946 | U.S. Army | Tubes | Yes (patch cords & switches) | No | First electronic programmable computer |
| EDSAC (Electronic Delay Storage Automatic Computer) | Wilkes (based on information from a course given by Eckert & Mauchly) | Early 1949 | Cambridge University | Tubes for logic, mercury delay lines for memory | Yes | Yes | First stored-program computer |
| BINAC (Binary Automatic Computer) | Eckert & Mauchly | 1949 | Eckert-Mauchly Computer Corp. | Tubes | Yes | Yes | First American stored-program computer |
| EDVAC (Electronic Discrete Variable Automatic Computer) | Begun by Eckert, Mauchly, and von Neumann, completed by others at Moore School | 1951 | Moore School | Tubes | Yes | Yes | First stored-program computer conceived |
| UNIVAC I (Universal Automatic Computer) | Eckert & Mauchly | 1951 | Remington Rand | Tubes | Yes | Yes | First commercially produced electronic computer |
| IBM 701 | Nathaniel Rochester | 1952 | IBM | Tubes | Yes | Yes | First commercially successful computer |

Welcoming a New Form of Intelligence on Earth: The AI Movement

Since Leibniz there has perhaps been no man who has had a full command of all the intellectual activity of his day…. There are fields of scientific work…. which have been explored from the different sides of pure mathematics, statistics, electrical engineering and neurophysiology; in which every single notion receives a separate name from each group, and in which important work has been triplicated or quadruplicated, while still other important work is delayed by the unavailability in one field of results that may have already become classical in the next field.

Norbert Wiener, Cybernetics

Humans are okay. I’m glad to be one. I like them in general, but they’re only human…. Humans aren’t the best ditch diggers in the world, machines are. And humans can’t lift as much as a crane…. It doesn’t make me feel bad. There were people whose thing in life was completely physical: John Henry and the steam hammer. Now we’re up against the intellectual steam hammer…. So the intellectuals are threatened, but they needn’t be…. The mere idea that we have to be the best in the universe is kind of far-fetched. We certainly aren’t physically.

There are three events of equal importance…. Event one is the creation of the universe. It’s a fairly important event. Event two is the appearance of life. Life is a kind of organizing principle which one might argue against if one didn’t understand enough: it shouldn’t or couldn’t happen on thermodynamic grounds…. And third, there’s the appearance of artificial intelligence.

Edward Fredkin

There are three great philosophical questions. What is life? What is consciousness and thinking and memory and all that? And how does the universe work? The informational viewpoint encompasses all three…. What I’m saying is that at the most basic level of complexity an information process runs what we think of as physics. At the much higher level of complexity life, DNA (you know, the biochemical functions) are controlled by a digital information process. Then, at another level, our thought processes are basically information processing….

I find the supporting evidence for my beliefs in ten thousand different places, and to me it’s just totally overwhelming. It’s like there’s an animal I want to find. I’ve found his footprints. I’ve found his droppings. I’ve found the half-chewed food. I find pieces of his fur, and so on. In every case it fits one kind of animal, and it’s not like any animal anyone’s ever seen. People say, Where is this animal? I say, Well, he was here, he’s about this big, this that and the other. And I know a thousand things about him. I don’t have him in hand, but I know he’s there…. What I see is so compelling that it can’t be a creature of my imagination.

Edward Fredkin, as quoted in Did the Universe Just Happen? by Robert Wright

Fredkin…. is talking about an interesting characteristic of some computer programs, including many cellular automata: there is no shortcut to finding out what they will lead to. This, indeed, is a basic difference between the “analytical” approach associated with traditional mathematics, including differential equations, and the “computational” approach associated with algorithms. You can predict a future state of a system susceptible to the analytic approach without figuring out what states it will occupy between now and then, but in the case of many cellular automata, you must go through all the intermediate states to find out what the end will be like: there is no way to know the future except to watch it unfold…. There is no way to know the answer to some question any faster than what’s going on…. Fredkin believes that the universe is very literally a computer and that it is being used by someone, or something, to solve a problem. It sounds like a good-news/bad-news joke: the good news is that our lives have purpose; the bad news is that their purpose is to help some remote hacker estimate pi to nine jillion decimal places.

Robert Wright, commenting on Fredkin’s theory of digital physics
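Wright's point that some computations offer no shortcut can be illustrated with a one-dimensional cellular automaton. The sketch below is an assumption-laden example, not something Wright or Fredkin specify: the choice of rule 110, the wrap-around edges, and the function names `step` and `evolve` are all illustrative. To learn the row of cells after n generations, the code has no option but to compute every generation in between.

```python
def step(cells, rule=110):
    """Advance a row of 0/1 cells by one generation (edges wrap around)."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, mid, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        # The three neighbours form a number 0..7 that selects one bit of the rule.
        index = left * 4 + mid * 2 + right
        new_cells.append((rule >> index) & 1)
    return new_cells

def evolve(cells, generations, rule=110):
    """Reach a future state the only way available: one generation at a time."""
    for _ in range(generations):
        cells = step(cells, rule)
    return cells

start = [0, 0, 0, 1, 0]
after_one = step(start)
after_ten = evolve(start, 10)
```

With this starting row, `after_one` is `[0, 0, 1, 1, 0]`. Contrast this with a formula such as compound interest, where the value at year n can be computed directly without visiting years 1 through n - 1.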

With at least a dozen inventors having some credible claim to having been “first” in the field of computation, we can say that the computer emerged not from a lone innovator’s basement, but rather from a rich period of intellectual ferment on several continents drawing upon a diversity of intellectual traditions and fueled by the exigencies of war. And if there were a dozen fathers of the computer, there were at least a couple dozen fathers of AI.

The notion of creating a new form of intelligence on earth emerged with an intense and often uncritical passion simultaneously with the electronic hardware on which it was to be based. The similarity of computer logic to at least some aspects of our thinking process was not lost on any of the designers of the early machines. Zuse, for example, applied his Z series of computers and Plankalkul language to the problem of chess. Turing’s contributions to the foundations of AI are extensive and well known. Turing’s 1950 classic paper “Computing Machinery and Intelligence” lays out an agenda that would in fact occupy the next quarter century of research: game playing, natural language understanding and translation, theorem proving, and of course, the cracking of codes.25 Even Nathaniel Rochester, the designer of IBM’s first successful computer, the 701, spent several years developing early AI technology; in fact, he was one of the principals in the 1956 Dartmouth Conference, which gave artificial intelligence its name.26

Perhaps the odd man out was John von Neumann, who found himself unable to imagine that the cumbersome and unreliable vacuum tubes used to build the first electronic computers could ever compete with the human brain.27 Even though the transistor had been invented when von Neumann expressed his skepticism, its applicability to computing had not yet been realized. More significant than the hardware limitations, however, was the hopelessness, according to von Neumann, of ever describing natural human actions and decision making in the precise language of mathematics. To some extent, von Neumann may have been reacting to the unrealistic expectations that had been set in the popular media of the time: magazine covers were predicting that superhuman electronic brains were just around the corner. Von Neumann did, however, show considerable interest in the idea of expressing human knowledge using the formalism of a computer language. The depth of his resistance to the idea that at least some behaviors we associate with natural intelligence might ultimately be automated is difficult to gauge; von Neumann died in 1957, shortly before he was to give a series of lectures at Yale on the likelihood of machine intelligence.28

Cybernetics: A new weltanschauung

The emergence of the computer, its early application to cognitive problems, the development of the theoretical foundations of computation, and related speculation on the implications of this new technology had a profound impact on fundamental tenets of what we might call the scientific worldview. In Cybernetics, Norbert Wiener’s seminal book on information theory, Wiener describes three ways in which the world’s (and his own) outlook had changed forever.29

Wiener, a prodigy who had studied with both Bertrand Russell and David Hilbert and received his Ph.D. from Harvard at the age of 18, had mastered an unusually broad range of intellectual fields from psychology and neurophysiology to mathematics and physics.30 His book was intended to establish as a new science the field of cybernetics, which he defines in his subtitle as control and communication in the animal and machine. In sections that are alternately addressed to the lay reader and to his fellow mathematicians, Wiener expounds his three premises.

First is the change from energy to information. Precybernetic reality consists of particles and the energy fields that control them. Accordingly, the old model of life was concerned primarily with the conversion of energy in its various biochemical and physical forms.31 If the living cell was a heat engine in the early twentieth century, it had now become a small computer. The new cybernetic model treats information as the fundamental reality in living things as well as in intelligent things, living or otherwise. In this new view, the most important transactions taking place in a living cell are the information-processing transactions inherent in the creation, copying, and manipulation of the amino acid strings we call proteins. Energy is required for the transmission and manipulation of information in both animal and machine, but this is regarded as incidental.

In recent times, with information-handling circuits becoming smaller and using ever smaller amounts of energy, energy has indeed become incidental to the process. The primary issue today in measuring information-handling systems is the amount and speed of their data processing capabilities. Edward Fredkin has recently shown that though energy is needed for information storage and retrieval, we can arbitrarily reduce the energy required to perform any particular example of information processing, and there is no lower limit to the amount of energy required.32 In theory, at least, the energy required to perform computation can become arbitrarily close to zero. Fredkin’s argument is significant in establishing that information processing is fundamentally different from energy processing. Even without Fredkin’s theoretical argument, the energy required for computation is not a significant issue today, although it was a bit of an issue in Wiener’s day.33 The lights in Philadelphia were reported to have dimmed when ENIAC, with its 18,000 vacuum tubes, was turned on, although this story is probably exaggerated.

The decoupling of information and energy is also important from an economic point of view. The value of many products today is becoming increasingly dominated by computation. As computation itself becomes less dependent on both raw materials and energy, we are moving from an economy based on material and energy resources to one based on information and knowledge.34

A second aspect of what Wiener saw as an epochal change in scientific outlook is characterized by the trend away from analog toward digital.35 Wiener argues that computation ultimately needed to be “numerical . . . rather than on the basis of measurement, as in the Bush Differential Analyzer.” When computing first emerged, there was a controversy between analog and digital computing, with the Differential Analyzer of Vannevar Bush (1890-1974), President Roosevelt’s science advisor, as a popular example of the former.36 In an analog computer, mathematical operations are performed by adding, subtracting, or even multiplying electrical quantities and then measuring the amount of voltage or current after the operations are performed. A very simple example is the common thermostat. Through the use of feedback loops, fairly complex formulas can be represented, although accuracy is limited by the resolution of the analog components.37 In a digital computer or process, numbers are represented not by amounts of voltage or current but rather by assembling numbers from multiple bits, where each bit is either on (generally representing 1) or off (generally representing 0). To many observers at the time, it was not clear which type of computing device would become dominant. Today we generally do not find the need to use the word “digital” before the word “computer,” since analog computers are no longer common, although in Wiener’s time, this was a necessary modifier. Wiener saw the limitations of analog computing in terms of both accuracy and the complexity of the algorithms that could be implemented. With a digital computer there is no theoretical limit on either accuracy or computational complexity.38

The trend from analog to digital, which was just getting started in Wiener’s day, continues to revolutionize a growing number of technologies and industries. The compact digital disk and digital audio tape are revolutionizing the recording of music. Digital technology is replacing the piano and other acoustic and analog musical instruments. A new generation of aircraft is being controlled by highly accurate digital control mechanisms replacing analog and mechanical methods. Phones and copiers are becoming digital. There are many other examples.39

However, the question of whether the ultimate nature of reality is analog or digital continues to be an important philosophical issue. As we delve deeper and deeper into both natural and artificial processes, we find the nature of the process often alternates between analog and digital representations of information. Consider, for example, three processes related to listening to a compact-disk recording: the reproduction of the sound from the compact disk recording, the musical understanding of the audio signal by the listener, and the nature of the data structures in music itself.40

First consider the reproduction process. The music is communicated to the listener by vibrations of the air, which are clearly an analog phenomenon. The electrical signals sent to the loudspeaker are also analog. The circuits interpreting the contents of the compact disk are, however, digital circuits that are computing the analog values to be played as sound. The digital values computed by the digital circuits are converted into analog electrical signals by a device called a digital-to-analog converter. Thus, the sound exists in a digital representation prior to its being converted to analog electrical signals to be amplified, which is why we consider compact disk technology to be a digital technology.41

Let us now look at these digital circuits. As in many modern electronic products, these circuits are packaged into tiny integrated circuits containing thousands or tens of thousands of transistors each. It is interesting to note that while the circuits are digital, the transistors that comprise them are inherently analog devices. The designers of the transistors themselves do not consider the transistor to be a digital device but rather are very aware of its inherently analog nature. The transistor is analog because it acts as a small electrical amplifier and deals with variable (continuous-valued) amounts of current.42 Transistors are “tricked” into acting digital (with values of on or off) by thresholding (comparing values to a constant) their otherwise analog (continuous-valued) characteristics. But a transistor cannot model a digital device perfectly; it will compute the wrong digital value some small fraction of the time. This is not a problem in practice. Using the very information theory created by Wiener and an associate of his, Claude Elwood Shannon, digital circuit designers can make the likelihood that the analog transistor will malfunction in its digital role so low that they can essentially ignore this possibility and thus consider the transistor to be a reliable digital element.43
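The thresholding idea can be illustrated with a small numerical sketch. The voltage levels, the Gaussian noise model, and the function names below are illustrative assumptions, not taken from any real circuit design; the sketch only shows that widening the margin between a signal level and the threshold, relative to the noise, drives the error rate down very quickly.

```python
import math

def read_bit(voltage: float, threshold: float = 0.5) -> int:
    """Interpret a continuous voltage as a bit by comparing it to a constant."""
    return 1 if voltage > threshold else 0

def error_probability(margin: float, noise_sigma: float) -> float:
    """Chance that zero-mean Gaussian noise pushes a level across the threshold.

    margin is the distance between the nominal signal level and the threshold;
    noise_sigma is the standard deviation of the (assumed) Gaussian noise.
    """
    return 0.5 * math.erfc(margin / (noise_sigma * math.sqrt(2)))

if __name__ == "__main__":
    # Same margin, progressively quieter circuit: the error rate collapses.
    for sigma in (0.25, 0.1, 0.05):
        print(f"sigma={sigma}: P(error)={error_probability(0.5, sigma):.3e}")
```

With these assumed numbers, halving the noise a couple of times takes the error rate from a few percent to a value far too small to observe in practice, which is the sense in which the analog element can be treated as a reliable digital one.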

On a deeper level we can understand the continuous levels of current within one transistor to be ultimately discrete: they are made up of a very large number of individual electrons with individual quantum states. So now we are back to a digital view of things. But when we consider the Heisenberg uncertainty principle, which tells us that the electrons have no precise locations but only probability clouds of possible locations, we are back to a fuzzy (analog) model of the world.

Analog and digital representation of information.

Listening to a compact disk recording: digital and analog representations of information.

As we approach finer and finer models of the world, physics flips several times between the digital and analog conceptions of reality. Consider now the path of the music sounds going toward the listener. The analog sound waves vibrate the ear drum and ultimately enter the cochlea, a natural electronic circuit that acts as a spectrum (pitch) analyzer. The cochlea breaks up the sound waves into distinct frequency bands and emits a digitally encoded representation of the time-varying spectrum of the sound. This digital representation enters the acoustic cortex of the brain, where processing takes place by techniques that are both analog and digital.44

Finally, consider the nature of music itself with its elements of melody, rhythm, harmony, expression, and timbre. The first three elements, which are the elements represented by musical notation, are clearly digital in nature. The availability of music notation processors, which are to music notation what word processors are to written language, is a clear testament to the inherently digital nature of these three musical elements. Melodic, rhythmic, and harmonic structures are modeled in music theory by the mathematics of digital logic rather than the mathematics of analog calculus. Expression and timbre, on the other hand, though they certainly can be represented using digital means, are nonetheless analog in nature.45

It is clear that we use both digital and analog approaches in understanding the world around us. As we view phenomena at different levels of specificity and detail, we find their nature changes repeatedly. The ultimate nature of reality is still being debated. Edward Fredkin has recently proposed what he calls a new theory of physics stating that the ultimate reality of the world is software (that is, information). According to Fredkin, we should not think of ultimate reality as particles and forces but rather as bits of data modified according to computational rules. If, in fact, particles are quantized in terms of their locations and other properties, then the views of the world as made up of particles and as made up of data are essentially equivalent. Though Fredkin’s view has startled contemporary theoreticians, it is just another way of postulating a fully quantized or digital world. The alternative analog view would state that the position or at least one of the properties of a particle is not quantized but is either continuous or uncertain.

Regardless of the ultimate nature of the world, Wiener’s prediction of a digital conception of nature was both profound and prophetic. From a technological and economic point of view, the digital approach continues to replace the analog, transforming whole industries in the process. The digital outlook has also permeated our philosophical and scientific views of the world.

An additional comment on the issue of digital versus analog concerns another contemporary controversy. Some observers have criticized digital computing as a means of replicating human cognition because of its all or nothing nature, which refers to the fact that digital bits are either on or off with no states in between. This, according to these observers, contrasts with the analog processes in the human brain, which can deal with uncertain information and are able to balance decisions based on “soft” inputs. Thus, they say, analog methods that can deal with gradations on a continuous scale will be needed to successfully emulate the brain.

My reaction to this is that the argument is based on a misconception, but I nonetheless agree with the conclusion that we will see a return to analog computing, alongside digital techniques, particularly for pattern-recognition tasks. The argument that digital techniques are all or nothing is clearly misleading. By building numbers (including fractional parts) from multiple bits, digital techniques can also represent gradations, and with a greater degree of continuity and precision than analog techniques; indeed, with any degree of precision desired. New knowledge engineering techniques that use a method called fuzzy logic can apply digital computing to decision making in a way that utilizes imprecise and relative knowledge in a methodologically sound manner. Often the criticism of digital computing is really aimed at the “hard” antecedent-consequent type of logic employed in the first generation of expert systems. Overcoming the limitations of this type of logic does not require resorting to analog techniques.
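The claim that multiple bits can represent gradations to any desired precision can be made concrete with a minimal sketch. The function name and the sample value are illustrative assumptions; the code simply stores a fraction between 0 and 1 as an n-bit integer code and shows the error shrinking as bits are added.

```python
def quantize(value: float, bits: int) -> float:
    """Round a value in [0, 1) down to the nearest multiple of 2**-bits."""
    levels = 1 << bits            # number of distinct codes available
    code = int(value * levels)    # the integer actually stored, 0 .. levels - 1
    return code / levels

if __name__ == "__main__":
    target = 0.7071
    for bits in (1, 4, 8, 16, 24):
        approx = quantize(target, bits)
        print(f"{bits:2d} bits -> {approx:.8f} (error {target - approx:.2e})")
```

Each added bit halves the worst-case error, so the gradation can be made as fine as the application requires simply by spending more bits.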

There is, however, a good reason that we are likely to see a return to hybrid analog-digital computing designs. Analog computing is substantially less expensive in terms of the number of components required for certain arithmetic operations. Adding or even multiplying two quantities in an analog computer requires only a few transistors, whereas a digital approach can require hundreds of components. With integrated-circuit technology providing hundreds of thousands or even millions of components on a single chip, this difference is usually not significant and the greater precision of digital computing generally wins out. But consider the trend toward massive parallel processing, in which a system performs many computations simultaneously rather than just one at a time, as is the case in the classical von Neumann computer architecture. For certain types of problems, such as visual image analysis, it would be desirable to be able to perform millions of computations at the same time (computing the same transformations on every pixel (point) of a high resolution image, for example). Often these calculations do not require a high degree of accuracy, so performing them with analog techniques would be very cost effective. Doing a million computations simultaneously with analog methods would be practical with today’s semiconductor technology but still prohibitively expensive with digital techniques. Evolution apparently found the same engineering trade-off when it designed the human brain.

The third major theme of Wiener’s treatise concerns the nature of time. He argues that we have gone from a reversible or Newtonian concept of time to an irreversible or Bergsonian notion of time. Wiener regards Newtonian time as reversible because if we run a Newtonian world backward in time, it will continue to follow Newton’s laws. The directionality of time has no significance in Newtonian physics. Wiener used Bergson, a philosopher who analyzed the cause and effect relationships in biology and evolution, as a symbol for the irreversibility of time in any world in which information processing is important.

Computing is generally not time reversible, and the reason for this is somewhat surprising. There are two types of computing transformations, one in which information is preserved and one in which information is destroyed. The former type is reversible. If information is preserved, we can reverse the transformation and restore the information to the format it had prior to the transformation. If information is destroyed, however, then that process is not reversible, since the information needed to restore a state no longer exists. Consider, for example, a program that writes zeros throughout memory and ultimately destroys (most of) itself. We cannot conceive of a way to reverse this process, since the original contents of memory are no longer anywhere to be retrieved. This is surprising because we ordinarily think of computation as a process that creates new information from old. After all, we run a computer process to obtain answers to problems, that is, to create new information, not to destroy old information. The irreversibility of computation is often cited as a reason that computation is useful: it transforms information in a unidirectional “purposeful” manner. Yet the derivation of the proof that computation is irreversible is based on the ability of computation to destroy information, not to create it. Another view, however, is that the value of computation is precisely its ability to destroy information selectively. For example, in a pattern recognition task such as recognizing images or speech sounds, preserving the invariant information-bearing features of a pattern while “destroying” the enormous amount of data in the original image or sounds is essential to the process. In fact, intelligence is often a process of selecting relevant information carefully from a much larger amount of unprocessed data. This is essentially a process of skillful and purposeful destruction of information.

Two types of computing transformations: the preserver and the destroyer.
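The distinction between the two kinds of transformation can be sketched in a few lines. The specific operations chosen here (XOR with a key as the preserver, zero-filling as the destroyer) and the function names are assumptions made for illustration only.

```python
def xor_with_key(memory: list[int], key: int) -> list[int]:
    """A preserving transformation: applying it twice restores the input."""
    return [cell ^ key for cell in memory]

def write_zeros(memory: list[int]) -> list[int]:
    """A destroying transformation: every input of a given size maps to the same output."""
    return [0 for _ in memory]

if __name__ == "__main__":
    original = [7, 42, 255, 0]
    key = 0b10101010

    scrambled = xor_with_key(original, key)
    restored = xor_with_key(scrambled, key)
    print("preserver is reversible:", restored == original)

    # Two different memories become indistinguishable after zeroing,
    # so no procedure can tell which one we started from.
    print("destroyer merges inputs:", write_zeros(original) == write_zeros([1, 2, 3, 4]))
```

The preserver can be undone because distinct inputs always yield distinct outputs; the destroyer cannot, because many inputs collapse onto one output.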

Wiener points out that a model of time that is irreversible is essential to the concepts of computation, communication, and intelligence. One entertaining example cited by Wiener is the impossibility of two intelligent entities attempting to communicate when they are “traveling” in different directions in time. Neither one will perceive the other as either responsive or intelligent. Energy to information, analog to digital, and reversible time to irreversible time are all facets of the same revolution in worldview known as computation. It was a revolution both for the world at large and for Wiener himself. It is interesting to note that virtually all of the mathematics in Wiener’s book is not that of the logician and computationalist but rather the calculus of the Newtonian physicist, which comprises Wiener’s original scientific tradition.

There are other significant insights in Wiener’s classic book. He gives five principles that should, in his opinion, govern the design of computing machinery; a computer should

  • be digital rather than analog,
  • be electronic rather than electromechanical,
  • use binary rather than decimal notation,
  • be programmable,
  • contain a dynamic (erasable) random access memory.

Wiener says in his Introduction that he had written a letter to Vannevar Bush in 1940 suggesting these principles and commenting on their importance to military requirements. Bush, being firmly in the analog camp, apparently did not respond.

Wiener proposed using computing machinery to provide effective prostheses for the sensory impaired. By converting visual information into an appropriate auditory modality for the blind, or alternatively converting auditory information into a visual display for the deaf, electronic technology could help overcome the primary handicap associated with these disabilities. It would be several decades before computers would begin to master these tasks. Wiener ends the 1948 version of Cybernetics with a description of a recursive strategy for building a computer chess machine. He predicts that such a machine would be capable of playing at a level that would be “not so manifestly bad as to be ridiculous.”

A theme that is echoed throughout this book, as well as in other writings of Wiener’s, is the need for collaboration among multiple disciplines if the difficult challenge of a machine’s mastering cognitive problems is to be addressed. He articulates a need to reverse the trend toward increasingly narrow specialization in the sciences and predicts that solving the major technology problems of the second half of the twentieth century would require knowledge and expertise to be drawn and integrated from diverse disciplines. The greatest difficulty in achieving this type of cooperation, Wiener feels, is the fact that experts from different fields all use different terminologies often to describe the same phenomena. Though Wiener’s prediction was largely overlooked for decades, AI researchers just now seem to be coming to the realization that further progress in creating machine intelligence requires exactly this sort of interdisciplinary collaboration.

The movement takes shape

The 1940s saw the invention of the electronic programmable digital computer, major developments in computational theory, and the emergence of the idea that with the right software these new machines could simulate the human brain.46 After all, in some ways the new machines were substantially superior to human cognition, solving in minutes what mere human mathematicians labored on for months. With several important treatises by Turing (including his 1950 paper “Computing Machinery and Intelligence,” in which Turing proposes his imitation game, which became known as the Turing test), Wiener’s Cybernetics, a 1943 paper on neural nets by Warren McCulloch and Walter Pitts, a proposal by Shannon in 1950 for a chess program, and several other influential papers, the AI agenda had been set.47

In the 1950s concrete progress began to be made. Initial progress came so rapidly that some of the early pioneers felt that mastering the functionality of the human brain might not be so difficult after all. After developing IPL-II (Information Processing Language II), the first AI language, in 1955, Allen Newell, J. C. Shaw, and Herbert Simon created a program in 1956 called the Logic Theorist (LT), which used recursive search techniques to solve problems in mathematics.48 It was able to find proofs for many of the theorems in Whitehead and Russell’s Principia Mathematica, including at least one completely original proof for an important theorem that had never been previously published.49 In 1957 Newell, Shaw, and Simon created a more sophisticated problem solver called the General Problem Solver (GPS).50 As described earlier, it used means-ends analysis, a variation of LT’s recursive technique, to solve problems in a variety of domains. These early successes led Simon and Newell to say in a 1958 paper entitled “Heuristic Problem Solving: The Next Advance in Operations Research,” “There are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until-in a visible future-the range of problems they can handle will be coextensive with the range to which the human mind has been applied.”51 The paper goes on to predict that within ten years (that is, by 1968) a digital computer will be the world chess champion. Such unbridled enthusiasm was to earn the field considerable criticism and critics.52 While the chess prediction turned out to be premature, a program completed in 1959 by Arthur Samuel was able to defeat some of the best players of the time in the somewhat simpler game of checkers.53

In 1956 the first conference on AI was held. Organized by John McCarthy, it included a number of the field’s future academic leaders, including Marvin Minsky and Arthur Samuel, as well as Newell and Simon. Also participating were several computer pioneers, including Oliver Selfridge, Claude Shannon, and Nathaniel Rochester.54 The conference gained its notoriety from its one identifiable accomplishment, which was to give the field its name, artificial intelligence. McCarthy, who is credited with the name, is not sure if he really made it up or just overheard it in a conversation, but he does take credit for having promoted it. Some participants argued against the name: Shannon felt it was too unscientific; Samuel criticized the word “artificial” with the comment “it sounds like there is nothing real about this work.” Perhaps just because the phrase is so startling and unsettling, it has outlasted many others, including Wiener’s “cybernetics.”55



Photo by Lou Jones www.fotojones.com

Patrick Winston, director of MIT’s Artificial Intelligence Lab. His research involves work on learning by analogy and common sense problem solving.

Other concrete accomplishments of the conference are hard to discern, although Minsky did write a first draft of his influential “Steps toward Artificial Intelligence,” which was rewritten many times (as is Minsky’s style) and finally published in 1963.56 McCarthy refined his ideas for a language that would combine list (treelike) structures with recursion, which was subsequently introduced in 1959 as LISP (List-processing language).57 It quickly became the standard for AI work and has remained the principal AI language through several major revisions. Only recently with AI having entered the commercial arena have other languages such as C begun to compete with LISP, primarily on the basis of efficiency. Probably the major contribution of the conference was to put a number of the leading thinkers in the field in touch with one another. Progress, ideas, and a great deal of enthusiasm were shared, although McCarthy left the conference disappointed that most of its specific goals had not been met. He went on to found the two leading university AI laboratories: one at MIT with Marvin Minsky in 1958 and one at Stanford in 1963.

By the end of the 1960s the full AI agenda was represented by specific projects. A number of the more significant efforts of the decade are described in Semantic Information Processing, edited by Marvin Minsky and published in 1968. Included was Daniel G. Bobrow’s Ph.D. project entitled Student, which could set up and solve algebra problems from natural English language stories.58 Student reportedly rivaled the ability of typical high school students in solving story problems. The same performance level was claimed for a program created by Thomas G. Evans that could solve IQ test geometric-analogy problems.59

Computer chess programs continued to improve in the 1960s, although not nearly up to Simon’s expectations. A program that could achieve respectable tournament ratings was created in 1966 by Richard D. Greenblatt at MIT.60

A new area for AI, expert systems, also got its start in the 1960s. With leadership from Edward A. Feigenbaum, now the director of the Stanford AI Laboratory, a group started work on DENDRAL, a program based on a knowledge base describing chemical compounds. It is considered the world’s first expert system.61

The 1960s also saw the creation of ELIZA, a natural language program written by MIT professor Joseph Weizenbaum in 1966, which simulates a nondirective (i.e., Rogerian) therapist. ELIZA has continued for over two decades to receive a high level of attention, including extensive criticism for its inability to react intelligently in a variety of situations, from some AI critics, most notably Hubert Dreyfus.62 Actually, ELIZA was never representative of the state of the art in natural-language understanding, even at the time it was created. It was a demonstration of how successful one could be in creating an apparently intelligent interactive system with relatively simple rules. ELIZA is a good example of the principle of achieving complex behavior from a simple system in a complex environment. People using ELIZA provide it with very diverse inputs, which often lead to unexpected and seemingly insightful responses despite its relative simplicity. A more substantial effort in the area of natural language was Susumu Kuno’s pioneering work on English language parsers: his 1963 program could understand the syntax of complex English sentences.
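How far a handful of simple rules can go is easy to see in a toy sketch. The rules, the pronoun table, and the function names below are invented for illustration and are not Weizenbaum’s actual script; the sketch only shows the general flavor of keyword matching followed by echoing the user’s own words back with the pronouns swapped.

```python
import re

# Swap first- and second-person words so an echoed phrase reads naturally.
REFLECTIONS = {"my": "your", "me": "you", "i": "you", "am": "are"}

# (pattern, response template) pairs, tried in order. Purely illustrative.
RULES = [
    (re.compile(r"\bmy (mother|father|family)\b", re.I), "TELL ME MORE ABOUT YOUR FAMILY"),
    (re.compile(r"\bi'?m (.+)", re.I), "I AM SORRY TO HEAR THAT YOU ARE {0}"),
    (re.compile(r"\b(my .+)", re.I), "{0}"),
]

def reflect(phrase: str) -> str:
    """Lowercase a phrase, drop trailing punctuation, and swap pronouns."""
    words = phrase.lower().rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)

def respond(sentence: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(sentence)
        if match:
            pieces = [reflect(group) for group in match.groups()]
            return template.format(*pieces).upper()
    return "PLEASE GO ON"   # fallback when no keyword is recognized

if __name__ == "__main__":
    for line in ("Well, my boyfriend made me come here.",
                 "Perhaps I could learn to get along with my mother.",
                 "Bullies"):
        print(line)
        print(respond(line))
```

Nothing in these rules models meaning; the apparent responsiveness comes entirely from the variety of what the user types.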

Another attribute characteristic of AI work in the 1960s was the use of toy worlds. A prime example was Terry Winograd’s Ph.D. thesis at MIT called SHRDLU, which combined natural language understanding with a planning capability in a simulated world (displayed on a terminal screen) that consisted only of different colored and shaped blocks.63 Users could ask SHRDLU to perform tasks with questions and commands phrased in natural language, such as “Take the red block in front of the big cube and place it on the blue rectangular solid.” The system would understand the English statements and could plan a strategy for performing the task. Although several critics of AI jumped on the unrealistic nature of these toy worlds, it was appropriate at the time to concentrate on problems of language understanding and decision making without the vagaries of real-world complexity.64 The next step, going from toy worlds to real worlds, continues to occupy the attention of AI researchers.

Daniel Bobrow’s Student solves algebra story problems. Note the stages of analysis from the original statement to the equations to be solved. (From Daniel G. Bobrow, “Natural Language Input for a Computer Problem-Solving System,” in Minsky, ed., Semantic Information Processing, 1968, pp. 212-213)

(THE PROBLEM TO BE SOLVED IS)

(IF THE NUMBER OF CUSTOMERS TOM GETS IS TWICE THE SQUARE OF 20 PERCENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE NUMBER OF ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF CUSTOMERS TOM GETS Q.)

(WITH MANDATORY SUBSTITUTIONS THE PROBLEM IS)

(IF THE NUMBER OF CUSTOMERS TOM GETS IS 2 TIMES THE SQUARE 20 PERCENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE NUMBER OF ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF CUSTOMERS TOM GETS Q.)

(WITH WORDS TAGGED BY FUNCTION THE PROBLEM IS)

(IF THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1) THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS, AND THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS IS 45, (WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) (QMARK / DLM))

(THE SIMPLE SENTENCES ARE)

(THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1) THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS (PERIOD / DLM))

(THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS IS 45 (PERIOD / DLM)) ((WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) (QMARK / DLM))

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL G02515 (NUMBER OF CUSTOMERS TOM (GETS / VERB)))

(EQUAL (NUMBER OF ADVERTISEMENTS (HE / PRO) RUNS) 45)

(EQUAL (NUMBER OF CUSTOMERS TOM (GETS / VERB)) (TIMES 2 (EXPT (TIMES .2000 (NUMBER OF ADVERTISEMENTS (HE / PRO) RUNS)) 2)))

(THE NUMBER OF CUSTOMERS TOM GETS IS 162)

(THE PROBLEM TO BE SOLVED IS)

(THE SUM OF LOIS’S SHARE OF SOME MONEY AND BOB’S SHARE IS $4.500. LOIS’S SHARE IS TWICE BOB’S. FIND BOB’S AND LOIS’S SHARE.)

(WITH MANDATORY SUBSTITUTIONS THE PROBLEM IS)

(SUM LOIS’S SHARE OF SOME MONEY AND BOB’S SHARE IS 4.500 DOLLARS. LOIS’S SHARE IS 2 TIMES BOB’S. FIND BOB’S AND LOIS’S SHARE)

(WITH WORDS TAGGED BY FUNCTION THE PROBLEM IS)

((SUM / OP) LOIS’S SHARE (OF / OP) SOME MONEY AND BOB’S SHARE IS 4.500 DOLLARS (PERIOD / DLM) LOIS’S SHARE IS 2 (TIMES / OP 1) BOB’S (PERIOD / DLM) (FIND / QWORD) BOB’S AND LOIS’S SHARE (PERIOD / DLM))

(THE SIMPLE SENTENCES ARE)

((SUM / OP) LOIS’S SHARE (OF / OP) SOME MONEY AND BOB’S SHARE IS 4.500 DOLLARS (PERIOD / DLM)) (LOIS’S SHARE IS 2 (TIMES / OP 1) BOB’S (PERIOD / DLM)) ((FIND / QWORD) BOB’S AND LOIS’S SHARE (PERIOD / DLM))

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL G02519 (LOIS’S SHARE))

(EQUAL G02518 (BOB’S))

(EQUAL (LOIS’S SHARE) (TIMES 2 (BOB’S)))

(EQUAL (PLUS (LOIS’S SHARE OF SOME MONEY) (BOB’S SHARE)) (TIMES 4.500 (DOLLARS)))

THE EQUATIONS WERE INSUFFICIENT TO FIND A SOLUTION

(ASSUMING THAT)

((BOB’S) IS EQUAL TO (BOB’S SHARE))

(ASSUMING THAT)

((LOIS’S SHARE) IS EQUAL TO (LOIS’S SHARE OF SOME MONEY))

(BOB’S IS 1.500 DOLLARS)

(LOIS’S SHARE IS 3 DOLLARS)

(THE PROBLEM TO BE SOLVED IS)

(MARY IS TWICE AS OLD AS ANN WAS WHEN MARY WAS AS OLD AS ANN IS NOW. IF MARY IS 24 YEARS OLD, HOW OLD IS ANN Q.)

(WITH MANDATORY SUBSTITUTIONS THE PROBLEM IS)

(MARY IS 2 TIMES AS OLD AS ANN WAS WHEN MARY WAS AS OLD AS ANN IS NOW. IF MARY IS 24 YEARS OLD, WHAT IS ANN Q.)

(WITH WORDS TAGGED BY FUNCTION THE PROBLEM IS)

((MARY / PERSON) IS 2 (TIMES / OP 1) AS OLD AS (ANN / PERSON) WAS WHEN (MARY / PERSON) WAS AS OLD AS (ANN / PERSON) IS NOW (PERIOD / DLM) IF (MARY / PERSON) IS 24 YEARS OLD, (WHAT / QWORD) IS (ANN / PERSON) (QMARK / DLM))

(THE SIMPLE SENTENCES ARE)

((MARY / PERSON)’S AGE IS 2 (TIMES / OP 1) (ANN / PERSON)’S AGE G02521 YEARS AGO (PERIOD / DLM))

(G02521 YEARS AGO (MARY / PERSON)’S AGE IS (ANN / PERSON)’S AGE NOW (PERIOD / DLM))

((MARY / PERSON)’S AGE IS 24 (PERIOD / DLM))

((WHAT / QWORD) IS (ANN / PERSON)’S AGE (QMARK / DLM))

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL G02522 ((ANN / PERSON)’S AGE))

(EQUAL ((MARY / PERSON)’S AGE) 24)

(EQUAL (PLUS ((MARY / PERSON)’S AGE) (MINUS (G02521))) ((ANN / PERSON)’S AGE))

(EQUAL ((MARY / PERSON)’S AGE) (TIMES 2 (PLUS ((ANN / PERSON)’S AGE) (MINUS (G02521)))))

(ANN’S AGE IS 18)

Photo by Lou Jones www.fotojones.com

Joseph Weizenbaum of MIT, AI pioneer and critic.

A conversation with ELIZA. (ELIZA’s responses are in all caps.) (Courtesy of Avron Barr and Edward Feigenbaum)

Men are all alike.

IN WHAT WAYS

They’re always bugging us about something or other.

CAN YOU THINK OF A SPECIFIC EXAMPLE

Well, my boyfriend made me come here.

YOUR BOYFRIEND MADE YOU COME HERE

He says I’m depressed much of the time.

I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

It’s true. I’m unhappy.

DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY

I need some help, that much seems certain.

WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP

Perhaps I could learn to get along with my mother.

TELL ME MORE ABOUT YOUR FAMILY

WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER

Bullies

DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE

The age of recursion

Through the end of the 1960s two schools of thought competed for the attention of AI researchers and funding. The neural school, based on self-organizing neural nets, started in the 1940s with a paper by McCulloch and Pitts and was heavily promoted in the 1960s by Frank Rosenblatt of Cornell University.65 The lure of neural nets is the idea that a system of many simple elements, if set up in just the right way, could actually teach (organize) itself to perform intelligent behavior. With the publication of Minsky and Papert’s Perceptrons in 1969, this school of thought was eliminated almost overnight. As mentioned earlier, the book proved several theorems that demonstrated that certain types of simple problems could never be solved with this approach. Now in the late 1980s there is renewed interest in a new type of neural net to which the Minsky and Papert theorems do not apply.66

The school that survived, which one might call the recursive school, was based on the idea that a relatively small set of powerful ideas would be sufficient to capture at least some forms of intelligence. If one let these methods attack complicated problems, one could obtain intelligent behavior. The ideas included methods of exploring (searching) the “space” of possible solutions to a problem as well as techniques for defining the rules that govern certain domains (e.g. the syntax rules of natural language).

Foremost among these “powerful” ideas is recursion, the idea that the statement of a problem can be used as a part, often the most important part, of the problem’s solution. It is a seductive concept; it implies that the essence of solving a problem is often a matter of being able to state the problem carefully and precisely. This approach works surprisingly well in a variety of domains, and many AI methodologies make extensive use of it. It is one reason that the von Neumann computer architecture that was first incorporated in the famous EDSAC and BINAC computers was so important. Without the von Neumann capability for a stored program, recursion is not possible.

We examined one illustration of the power of recursion earlier in the context of game playing. To select our best move in a rule-based game such as chess we invoke a program called Move. Move generates all of the legal moves for the current board position and then calls itself to determine the opponent’s best response to each possible move (which in turn calls itself again to determine our best response to our opponent’s best response, and so on). In this way as many move and countermove sequences as we have time to compute are automatically generated. The most complicated part of implementing this technique is the generation of the possible moves. Generating possible moves is a matter of programming the rules of the game. Thus, the heart of the solution is indeed implementing a precise statement of the problem.
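The self-calling structure of Move can be sketched in a few lines of C. Chess is far too large for a short listing, so this sketch assumes a hypothetical toy game that does not appear in the text: two players alternately take one, two, or three stones from a pile, and whoever takes the last stone wins. The function name move, the MAX_TAKE constant, and the scoring convention (+1 for a forced win, -1 otherwise) are all assumptions made for illustration. Because the game is tiny, the sketch searches every sequence to the end rather than stopping when time runs out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy game (not from the text): players alternately take
   1, 2, or 3 stones from a pile; whoever takes the last stone wins. */
#define MAX_TAKE 3

/* Returns +1 if the player about to move can force a win from a pile of
   `stones`, and -1 otherwise. If best_take is not NULL and at least one
   legal move exists, the chosen number of stones is stored there. */
int move(int stones, int *best_take)
{
    int take;
    int best_score = -2;   /* lower than any real score */
    int choice = 0;

    assert(stones >= 0);

    /* Escape from recursion: the previous player took the last stone,
       so the player about to move has already lost. */
    if (stones == 0)
        return -1;

    /* Generating the legal moves is just a statement of the rules. */
    for (take = 1; take <= MAX_TAKE && take <= stones; take++) {
        /* The recursive call scores the resulting position from the
           opponent's point of view; negating it gives our own score. */
        int score = -move(stones - take, NULL);
        if (score > best_score) {
            best_score = score;
            choice = take;
        }
    }

    if (best_take != NULL)
        *best_take = choice;
    return best_score;
}
```

Calling move(7, &take), for example, explores every line of play from a pile of seven stones. Note how little of the function concerns strategy: apart from the bookkeeping of the best score, it contains only the rules of the game and the call to itself.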

A primary distinguishing feature of McCarthy’s LISP, historically the AI field’s primary programming language, is its ability to easily represent recursive procedures. Another example should illustrate how easily a recursive language can be used to solve seemingly complex problems. Consider the problem called the Tower of Hanoi. This famous children’s puzzle presents issues similar to many important combinatorial problems found in mathematics and other fields.

We have three towers on which we can place round disks of various sizes. We can move disks from tower to tower, but we can only move one disk at a time and we cannot place a disk onto a disk of smaller size. Thus, a legal stack of disks will have the largest disk on the bottom and the smallest disk on top. The problem is to move a stack of disks from one tower to another. Consider first a stack of just one disk. Here the answer is obvious: simply move the disk. It is also fairly easy to determine the procedure for two disks. Let’s consider a larger stack. Try actually solving the problem for a stack of seven or eight disks; it is rather challenging.

Now consider the general problem of describing a method that can quickly determine the optimal procedure for a stack of any arbitrary height. This problem at first appears very difficult, but a simple insight enables us to use the self-referencing (recursive) paradigm to automatically generate the optimal procedure for any height stack in three simple steps. Let us number the disks from 1 to n, n being the largest (bottom) disk. The insight is this: if we are to move the entire stack from the original tower to the destination tower, at some point we are going to have to move the bottom disk. That’s it! We have just solved the problem. The power of recursion is that this simple observation is enough to solve the entire problem.

To wit, if the stack consists of only a single disk, then move that disk from the original tower to the destination tower, and we’re done. Otherwise,

  • Step 1 Since we know that we will need at some point to move the bottom disk, we clearly need to move the disks on top of it out of the way. We therefore have to move them away from both the original tower and the destination tower. Thus, we have to move the stack consisting of all the disks except for the bottom disk (we can call this the (n – 1) stack since it consists of disk 1 through disk n – 1) from the original tower to the free tower. This is where the recursion comes in. Moving the stack of (n – 1) disks is the same Tower of Hanoi problem that we started with, only for a smaller stack. The program solving the Tower of Hanoi problem simply calls itself at this point to solve the problem of moving the (n – 1) stack to the free tower. This does not lead to an infinite loop because of our special rule for a stack of only one disk.
  • Step 2 Now with all of the other disks out of the way, we can move the bottom disk from the original tower to the destination tower.
  • Step 3 Finally, we move the stack of (n – 1) disks from the free tower to the destination tower. This again requires a recursive call. Now we’re done.

The power of recursive programming languages is that a program can call itself with the language keeping track of all the tightly nested self-referencing. The above three steps will not only solve the problem, they will also produce the optimal solution to the problem for any height stack. They generate exactly (2^n – 1) moves for n disks. If one attempts to solve the problem directly by moving disks, one quickly gets lost in the apparent complexity of the puzzle. To solve the problem using recursion, all that is required is enough insight to be able to state how the solution for n disks can be built from the solution for (n – 1) disks. Recursion then optimally expands the solution for any value of n.
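The move count follows directly from the three steps: moving n disks costs the moves for the (n – 1) stack, plus one move for the bottom disk, plus the moves for the (n – 1) stack again, and a single disk costs one move. Unrolling that relation gives 2^n – 1. The short C sketch below checks this under stated assumptions; the function names count_moves and closed_form are hypothetical and introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical helper: number of single-disk moves that the three-step
   procedure makes for a stack of n disks. */
unsigned long count_moves(int n)
{
    assert(n >= 1 && n < 32);
    if (n == 1)
        return 1UL;   /* one disk: one move */
    /* Step 1 and Step 3 each move the (n - 1) stack once;
       Step 2 moves the bottom disk once. */
    return 2UL * count_moves(n - 1) + 1UL;
}

/* Hypothetical helper: the closed form 2^n - 1, computed with a shift. */
unsigned long closed_form(int n)
{
    assert(n >= 1 && n < 32);
    return (1UL << n) - 1UL;
}
```

For the seven- and eight-disk stacks suggested earlier, the two functions agree on 127 and 255 moves respectively, which gives a sense of why solving the puzzle by hand feels challenging.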

As the above example should make clear, recursion is a tool that can unlock solutions to problems that otherwise would be enormously difficult to understand. A variety of techniques based on this self-referencing paradigm found widespread applicability by the 1960s. The initial success of recursive AI techniques (finding, for example, an original proof for an important theorem in Principia Mathematica) fueled much of the field’s early optimism.

A recursive program for the Tower of Hanoi problem

The following program, written in the C programming language, generates an optimal solution to the Tower of Hanoi problem. Try following the logic of this program from the comments even if you are not familiar with programming in general or the C language in particular. The program is written as a function that when called will print out the solution. It is called with parameters specifying the original tower, the destination tower, the free tower, and the number of disks. Note that in this function, anything that appears between /* and */ is considered to be a comment and is not part of the program.

#include <stdio.h>

/* This function will print out an optimal solution to the Tower of Hanoi problem. */
void tower_of_hanoi(
    int original,         /* This parameter specifies the originating tower. */
    int destination,      /* This parameter specifies the destination tower. */
    int free,             /* This parameter specifies the free tower. */
    int number_of_disks)  /* This parameter specifies the number of disks. The disks are numbered from 1 to n with n being the largest (bottom) disk. */
{
    if (number_of_disks == 1)
    {
        /* If the number of disks is 1, then this is the escape from recursion. We simply print that disk 1 is to be moved from the originating tower to the destination tower and then return. */
        printf("Move disk 1 from tower %d to tower %d\n", original, destination);
        return;
    }

    /* Here we have the first recursive call where the tower_of_hanoi function calls itself. This call moves the (n - 1) stack (the stack consisting of all the disks except for the bottom one) from the originating tower to the free tower. */
    tower_of_hanoi(original, free, destination, number_of_disks - 1);

    /* Now print that the bottom disk (disk n) is to be moved from the originating tower to the destination tower. */
    printf("Move disk %d from tower %d to tower %d\n", number_of_disks, original, destination);

    /* Move the (n - 1) stack from the free tower to the destination tower. */
    tower_of_hanoi(free, destination, original, number_of_disks - 1);

    return; /* We're done! */
} /* End of tower_of_hanoi function */

The age of knowledge

Somewhere around the early 1970s a major conceptual change took place in AI methodology (it is difficult to pick a hard date as there is no single paper one can cite that ushered in the change).67 While work in the 1950s and 1960s concentrated primarily on the mechanics of the reasoning process (search, recursion, problem representation, etc.), it became apparent by the early 1970s that such techniques alone were not sufficiently powerful to emulate the human decision-making process, even within narrowly defined areas of expertise.68 Something was missing, and the something turned out to be knowledge.

Some work on knowledge representation (knowledge about how to represent knowledge) took place earlier.69 In fact, two projects conducted during the 1960s and described in Minsky’s Semantic Information Processing were Bertram Raphael’s SIR program (Semantic Information Retrieval) and a theory of semantic memory by M. Ross Quillian, both of which dealt with methods for representing human knowledge. What changed in the early 1970s was the recognition of the relative importance of knowledge versus method. In the 1950s and 1960s there was an emphasis on the power of techniques, particularly recursive ones, for emulating the logical processes associated with thinking. By the 1970s it was recognized that programming reasoning techniques was relatively simple compared to the task of creating a knowledge base with the depth required to solve real-world problems.

Roger Schank points out the extensive knowledge required to understand even simple stories. If we read a story about a restaurant, there is a vast body of factual information about restaurants that we take for granted.70 Understanding the statement “He paid the bill” requires understanding that in a restaurant we are expected to pay for the food we order, we order food on credit, we are brought a document (generally at the end of the meal) called the “tab,” “check,” or “bill,” which itemizes the charges, we generally settle this debt prior to leaving the restaurant, and so on. When we read a story, almost every sentence evokes vast networks of similarly implied knowledge. The difficulty in mastering all of this commonsense knowledge is that no one has ever bothered to write it all down, and the quantity of it is vast.

Mastering knowledge has indeed turned out to be a far more difficult process than mastering the logical processes inherent in deductive or even inductive reasoning. First, we need to have a means of structuring knowledge to make it useful. A simple list of all facts in the world, if such a list could be constructed, would not help us solve problems, because we would have a hard time finding the right facts to fit the right situations. Douglas Hofstadter provides an amusing example of the problem in Metamagical Themas. “How do I know,” he asks, “when telling you I’ll meet you at 7 at the train station, that it makes no sense to tack on the proviso, ‘as long as no volcano erupts along the way, burying me and my car on the way to the station,’ but that it does make reasonable sense to tack on the proviso, ‘as long as no traffic jam holds me up’?” The objective of appropriate knowledge structures is to quickly access the information truly relevant to any particular situation. Once we have constructed suitable structures for knowledge representation, we then need to actually collect the vast amount of information required to solve practical problems. Finally, we have to integrate this knowledge base with the appropriate decision-making algorithms.

The early 1970s saw a number of pioneering efforts to address the knowledge-representation problem. Perhaps the most famous was Minsky’s theory of frames, which he described in his 1975 paper “A Framework for Representing Knowledge.”71 A frame is a data structure that can include information of arbitrary complexity about an object or type of object, and that allows for multiple hierarchies for understanding relationships between classes of objects. For example, we can have a frame of information about the concept of a dog, another for the concept of a cat, and another for the concept of a mammal. The mammal frame is a higher-level frame than those representing examples of mammals, and the relationship between the levels (e.g., a dog is a type of mammal) is built into the frame structures. Each frame allows for default information to be filled in or “inherited” from a higher-level frame. For example, if the mammal frame says that mammals have four legs, this information would not have to be repeated in the frames for dogs and cats. However, a human frame would have to indicate an exception to this default information (one specifying only two legs). The frame methodology avoids redundancy, describes hierarchical relationships, and allows for arbitrarily complex classifications. It also helps us to make useful hypotheses. If we learn, for example, about another mammal, we can assume that it has four legs until informed otherwise.
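The inheritance of default information can be illustrated with a small C sketch. It is a deliberately reduced model, not Minsky’s formulation: each frame here has a single parent and a single slot (the number of legs), whereas frames as described above allow multiple hierarchies and information of arbitrary complexity. The names struct frame, frame_legs, and LEGS_UNSPECIFIED are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Marker for an empty slot: the value is inherited from a higher-level frame. */
#define LEGS_UNSPECIFIED (-1)

/* Hypothetical, much-simplified frame with one parent and one slot. */
struct frame {
    const char *name;
    const struct frame *parent;   /* higher-level frame, or NULL at the top */
    int legs;                     /* LEGS_UNSPECIFIED means "use the default" */
};

/* Look up the legs slot, climbing to higher-level frames until one of them
   fills it in. Returns LEGS_UNSPECIFIED if no frame in the chain does. */
int frame_legs(const struct frame *f)
{
    assert(f != NULL);
    while (f != NULL) {
        if (f->legs != LEGS_UNSPECIFIED)
            return f->legs;
        f = f->parent;
    }
    return LEGS_UNSPECIFIED;
}

/* The frames from the example in the text. */
const struct frame mammal = { "mammal", NULL,    4 };
const struct frame dog    = { "dog",    &mammal, LEGS_UNSPECIFIED };
const struct frame cat    = { "cat",    &mammal, LEGS_UNSPECIFIED };
const struct frame human  = { "human",  &mammal, 2 };   /* exception to the default */
```

In this sketch frame_legs(&dog) and frame_legs(&cat) fall back to the mammal default of four legs, while frame_legs(&human) returns the exception stored in the human frame itself. A newly added mammal frame with an empty slot would likewise be assumed to have four legs until informed otherwise.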

Another important approach to representing knowledge and the interdependency relationships between concepts was first described in another 1975 paper, this one describing a project called SAM (Script Applier Mechanism) by Roger Schank, Robert Abelson, et al. at Yale University. Schank’s methodology allowed for the development of “scripts” that provide the information implicit in everyday situations such as restaurants.72

The second part of the knowledge issue, actually collecting the knowledge, has proved to be the greatest challenge. In developing modern expert systems (computer-based systems that emulate the decision-making ability of human experts), collecting the necessary knowledge is generally a painstaking process in which a “knowledge engineer” interviews the appropriate human experts and literally writes down (in an appropriate computer language) all of the relevant knowledge and decision-making rules used by those experts. The sheer volume of information involved is one problem, but a bigger one is that while human experts are capable of solving problems within their domains of expertise, they generally do not know how they accomplish these tasks. The skill required of the knowledge engineer is to be able to extract the decision-making process from the domain experts despite their not being consciously aware of many elements of this process.73

With a first-generation methodology for building expert systems already established, a number of ambitious projects were started in the 1970s. Internist (now called CADUCEUS), an expert system that diagnoses a wide range of internal diseases, was developed throughout the 1970s (and continued in the 1980s). In one study, Internist was able to diagnose illnesses within at least one specialty with an accuracy equal to or better than human physicians. MYCIN, a system that can make diagnoses and recommend treatments for a wide range of bacterial infections, was developed by Edward H. Shortliffe in the mid-1970s. Prospector, an expert system that is capable of pinpointing energy and geology deposits, was initiated by R. O. Duda and his associates at Stanford Research Institute in 1978. In at least one case Prospector identified a number of important energy deposits overlooked by human experts. Finally, XCON, probably the most successful expert system in commercial use today, started operation in 1980 configuring complex computer systems for Digital Equipment Corporation. This system, running on a single VAX computer, is able to perform tasks that would otherwise require several hundred human experts, and at substantially higher rates of accuracy. These systems and others, as well as the problems of knowledge representation, will be discussed in greater detail in chapter 8.74

The U.S. Department of Defense, through its agency DARPA (Defense Advanced Research Projects Agency), funded two major initiatives in the pattern recognition area during the 1970s. The SUR (Speech Understanding Research) project funded the development of several experimental continuous-speech-understanding programs aimed at machine recognition of normal human speech with a large vocabulary. Though the most successful system from the SUR project did not operate in real time and was limited to an artificial syntax, SUR did increase confidence that practical, high-performance speech recognition was feasible. A similar program called IUP (Image Understanding Program) attempted machine comprehension of visual images.

The intensity of effort as well as the practical value of AI technology grew enormously during the 1980s. Here I shall mention briefly two salient trends: the commercialization and the internationalization of AI. The AI industry grew from just a few million dollars at the beginning of the 1980s to $2 billion by 1988, according to DM Data, a leading market-research firm. Many market analysts predict that the bulk of the several-hundred-billion-dollar computer and information processing market by 1999 will be intelligent, at least by today’s standards.

The 1980s began with a stunning challenge from Japan’s powerful MITI (the Ministry of International Trade and Industry) when it announced a plan to design and build an intelligent fifth-generation computer. This was seen by many as an attempt by Japan to leapfrog its foreign competitors and establish dominance over the international computer industry.75

As I said earlier, the idea that human intelligence could be simulated seems to have occurred to all of the pioneers who played a role in what I consider to be the twentieth century’s greatest invention, the computer. Though artificial intelligence was not named until 1956, the concept was by no means an afterthought. Despite the fact that the early computers were used primarily for numerical calculation (as most computers still are today), these classic machines were not thought of by their creators as mere number crunchers. They have been viewed since their conception as amplifiers of human thought, what Ed Feigenbaum calls “power tools for the mind.”76

Early success in the 1950s and 1960s with what were thought to be difficult problems, such as proving theorems and playing chess, fueled a romantic optimism that proved short-lived. It was an example of the “90-10” rule: solving the first 90 percent of a problem often requires only 10 percent of the effort, and though the remaining 10 percent then requires 90 percent of the effort, it generally represents 90 percent of the importance. With the realization in the 1970s that extensive knowledge was required to solve practical problems, and with no easy way of capturing that knowledge, the field gained a needed maturity.