Another Formula for Intelligence: The Neural Net Paradigm
August 6, 2001 by Ray Kurzweil
The neural net approach to artificial intelligence explained, written for “The Futurecast,” a monthly column in the Library Journal.
Originally published October 1992. Published on KurzweilAI.net August 6, 2001.
Consider the task of the lowly neuron. Signals stream in from the thousands of branches of its dendrites, each representing a message from another neuron. Some of the messages are subtle, others are urgent. The mission of the neuron is to render a judgment. It does this by summarizing the thousands of chaotic messages it receives into a coherent answer that reflects its particular view of the world. Its conclusion is expressed in rather simple terms: it either sends its characteristic threshold message or it doesn’t.
The human brain has been called the most complex object in the world. It is certainly the most complex that we are aware of (although the brains of certain other mammals such as dolphins and elephants are arguably more extensive). It is the organ of our bodies that gives rise to mind, to consciousness, to our awareness of our existence. Within the brain, there is astonishing diversity of function and form. Yet despite the broad variety of neuronal structure and purpose, the computational paradigm used by the brain is remarkably uniform.
The typical neuron has a soma or cell body, thread-like extensions called dendrites, and a single projection called an axon. The dendrites, which act as the inputs to the neuron, generally look like trees branching into finer and finer structures with the trunks extending from the neuron cell body. The “leafs” of these input trees terminate at synapses (gaps) that receive messages from other neurons. The output of a neuron, called the axon, also looks like a tree (with a long trunk), which may also branch into finer and finer structures. The “leafs” of the output axon tree also terminate at synapses that communicate with the inputs (terminal dendrites) of other neurons. The synapse is a structure that allows a message to pass from a terminal leaf of the (output) axon tree of one neuron to a terminal leaf of an (input) dendrite tree of some other neuron.
Each nerve fiber (dendrite or axon) is capable of transmitting signals or packets of information called action potentials. These signals are manifest in electrical potentials of about 100 millivolts that last for a millisecond and result from the actual movement of charged sodium ions across the fiber’s membrane. The fiber does not make a good electrical wire as its resistance is too high, so the signal generally dies within a millimeter or two. Thus the potential must be recreated frequently along the way by a series of electro-chemical processes. The speed of the signal is, therefore, quite slow – no more than 100 meters per second, which is a million times slower than the electrical signals used in a computer.
The communication of information across the synapse between an axon (output) branch of one neuron and a dendrite (input) branch of another neuron has been the subject of considerable research, in part because it is here that most psychotropic drugs have their influence. The transfer of an action potential across a synapse is mediated by a variety of chemical messengers called neurotransmitters, of which close to 50 have been identified. Cocaine, for example, facilitates (at least at first) the transmission of dopamine. Interestingly, many antipsychotic drugs have the opposite effect. Chronic use of cocaine, however, results in the total disruption of dopamine absorption, which influences the perception of pleasure as well as other vital processes. Prozac, a popular antidepressant, intensifies the effect of serotonin, another key neurotransmitter.
The soma acts as a computer, but not a digital one. It is essentially an analog computer, integrating (adding and subtracting over time) the signals from its dendrite branches. The signals vary in strength and are given different weights (i.e., levels of importance) by the neuron’s analog computer. To this sum, the neuron applies a “nonlinearity,” a thresholding function that is a key to the ability of this system to produce intelligence.
The purpose of a neuron is to destroy information. Indeed the selective destruction of information is what intelligence is all about. Artificial intelligence (AI) pioneer Ed Feigenbaum says that “Knowledge is not the same as information; knowledge is information that has been pared and shaped.” Our eyes alone provide the equivalent of 50 billion bits of data every second. Clearly, we could not begin to make sense of such a vast river of information unless we select very carefully certain abstractions that warrant our attention.
A typical neuron receives information from hundreds or thousands of dendritic branches. Each of these inputs reflects either a datum of sensory input or the judgment of another neuron. Essentially, the neuron waits until the sum of its weighted inputs exceeds its characteristic threshold level, at which point its electrical potential explodes by sending an action potential down its axon. It thus summarizes these hundreds or thousands of analog (continuously variable) values into its own judgment, usually expressed as a single bit of information (yes or no). Each layer of neural processing exhibits the ability to extract increasingly abstract levels of information. In visual processing, for example, David Hubel and Torsten Wiesel of the Harvard Medical School have discovered specialized cells that are adjacent to the retinal photoreceptor “cone” cells that are “wired” to detect “edges” in visual images. These in turn are summarized by the next layer of neurons that detect lines and curves in certain orientations. After several dozen more layers of abstraction, there are neurons that respond to the percept of particular faces.
Key to this process is the concept of feedback, without which a net of neurons would be unable to learn. The relative strength of each synapse may be increased or decreased to better reflect the desired response. There appear to be specific modulatory neurons that facilitate this process. There are a variety of neural learning techniques that have been hypothesized, and the exact mechanisms involved are an area of particularly intense contemporary research. Regardless of the means, however, the performance of each neuron is constantly being monitored, and the synaptic strengths are continually readjusted to improve the overall performance of each layer of the network. Neurons can also grow new dendritic and axon branches and create new synapses with other neurons. You may be doing this right now as you read this month’s Futurecast.
While the electrochemical implementation of the neural process is complex, the basic computational paradigm is simple and surprisingly powerful. In essence, a neuron takes the outputs of other neurons, weights each of these values according to a set of changing synaptic weights, takes the sum of these weighted values, and then compares the sum to a threshold value. It then communicates its fundamentally yes or no judgment to other neurons. Periodically, the synaptic weights are adjusted to improve the ability of a net of neurons to perform their intended function. The output of each neuron is, therefore, a very simple summary of its inputs. Through elaborate timed cascades and hierarchies of such simplification functions, percepts and concepts are realized and decisions made.
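For readers who think in code, the weigh-sum-compare step described above can be sketched in a few lines of Python. This is only an illustration of the paradigm, not a model of a biological neuron; the function name, the signal values, the weights, and the threshold are all made up for the example.

```python
from typing import Sequence


def neuron_fires(inputs: Sequence[float],
                 weights: Sequence[float],
                 threshold: float) -> bool:
    """Return True if the weighted sum of the inputs exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weight each incoming signal and add the results together.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The "nonlinearity": many analog values collapse into one yes or no.
    return weighted_sum > threshold


# Three hypothetical incoming signals: two excitatory, one inhibitory.
signals = [1.0, 0.5, 1.0]
weights = [0.6, 0.4, -0.3]
print(neuron_fires(signals, weights, threshold=0.4))  # prints True
```

The weighted sum here is 0.6 + 0.2 - 0.3 = 0.5, which exceeds the threshold of 0.4, so the sketch answers yes. Everything else about the three inputs is discarded, which is the sense in which a neuron destroys information.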
As humans form societies, we see the same process of simplification and summarization in almost every activity. Consider an election: As we consider each candidate, each of us weighs many complex factors but finally renders a very simple judgment – yes or no on each candidate. We may not be consciously aware of how we weighted each factor, or even what all of the factors are, yet we nonetheless are able to come to a decision. Each of these individual judgments flows into thousands or millions of other similar judgments. All of this enormous complexity is finally summarized by a few bits of information: each candidate either wins or loses. This decision then flows into yet other decisions.
There are three aspects of this decision process that are counterintuitive. First, this type of decision-making process is fundamental to all thought. Our ability to make sense of our senses, to navigate our lives at every level depends on a massive process of decision-making, where every decision results in the destruction of almost all of the information that went into the decision. Only a very simple summary of the information that went into each judgment is passed on to the next level. It is not just our conscious decisions that use this process, but all levels of our thinking. Second, very little of our decision-making is what we might regard as “rational,” but rather the result of a panoply of inherently irrational judgments. Third, the vast majority of our decisions are not conscious. Even our nominally conscious decisions are heavily influenced by millions of preconscious judgments.
Re-creating human intelligence
In addition to helping us understand human intelligence, insights into human neurobiology have provided an approach to recreating intelligence in a computer that is radically different from the logic-based approaches that have been dominant in the early history of computation. Such “neural nets” use mathematical models of neurons with simulated dendrites, axons, and synapses.
A particularly effective contemporary application of neural nets is the automatic recognition of hand-printed characters. Letters and numbers written by hand are often sloppy and unpredictable, and thus it is difficult for the more traditional rule-based approaches to deal with the highly variable forms created by human handwriting. Computer-based neural nets do not need to be provided with rules at all; they simply need to be exposed to tens of thousands of examples of the patterns to be recognized. Then through extensive trial and error, they are capable of developing their own rules.
Pattern recognition systems based on neural nets learn the same way that people do – slowly. Previously, designers of pattern recognition systems had to provide handcrafted algorithms that defined the patterns to be recognized. For example, the designer of a character-recognition system might program in the fact that a capital D consists of a straight vertical line connected to the right half of a circle. The human designer of a neural-net-based character-recognition system, on the other hand, does not need to be concerned with such rules nor does the designer even need to define the types of patterns that are expected. The neural net system is able to discover these rules for itself. Instead, the task of the neural net designer is similar to that of a kindergarten teacher patiently providing thousands of examples from which the system can learn and providing it feedback as to when it is right and wrong.
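To make the teaching process concrete, here is a minimal Python sketch of learning from examples and feedback. It uses the classic perceptron learning rule on a single simulated neuron, which is one simple technique chosen for illustration and not necessarily the method used in any particular character-recognition product. The toy task (answer yes only when both inputs are on), the function names, and the parameter values are all assumptions made for the example.

```python
def predict(weights, bias, inputs):
    """Threshold unit: answer 1 (yes) if the weighted sum is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train(examples, epochs=50, learning_rate=1):
    """Adjust weights from feedback using the perceptron learning rule."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in examples:
            # Feedback: +1 if the unit should have fired, -1 if it should not.
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                # Nudge each weight in the direction that reduces the error.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every example is now answered correctly
    return weights, bias


# Hypothetical training set: (inputs, desired answer).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(examples)
for inputs, target in examples:
    print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

Nobody tells the unit the rule. It starts with all weights at zero, is corrected each time it answers wrongly, and after a handful of passes through the examples it settles on weights that answer all four cases correctly. A single unit trained this way can only learn patterns that a straight line can separate; recognizing handwriting takes many such units arranged in layers and far more examples.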
The designer also needs to set up the initial neural connections with the right topology. The wiring of the human brain is not random, but is exquisitely set up to facilitate the types of patterns that humans deal with – objects, faces, language, music, and the myriad other skills that make up human intelligence.
Neural net technology has been employed in an increasing variety of tasks ranging from recognizing human speech to making financial credit decisions. In the latter case, designers recently allowed a computer-based neural net to learn by “watching” the decisions of its human counterparts. Apparently, it learned its lessons too well, discovering not only the proper methods of credit authorization, but also mastering its human teachers’ unconscious (or at least undocumented) prejudices. It apparently would refuse credit to anyone from a disadvantaged neighborhood regardless of the applicant’s credit history.
In the past two months we have examined two radically different approaches to capturing intelligence in a machine. The recursive paradigm, examined in last month’s Futurecast, has been a popular and successful technique using conventional computer architectures. The human brain is not very effective at using the recursive paradigm, but relies instead on the neural net paradigm described above. We are now discovering that computers can make effective use of both approaches.
In next month’s column, we will examine what it would take to create a neural net with the capacity of the human brain, what such a system might be capable of achieving, and when that will be feasible.
Reprinted with permission from Library Journal, October 1992. Copyright © 1992, Reed Elsevier, USA