WHEN THINGS START TO THINK | Chapter 8: Bad Words

May 15, 2003

Originally published by Henry Holt and Company 1999. Published on KurzweilAI.net May 15, 2003.

A group from a major computer company once came to the Media Lab to learn about our current research. The day started with their explaining that they wanted to see agent technology. Agents are a fashionable kind of computer program designed to learn users’ preferences and help anticipate and act on their needs. Throughout the day the visitors saw many relevant things, including software to assist with collaboration among groups of people, environments for describing how programs can adapt, and mathematical techniques for finding patterns in user data. I almost fell out of my chair at the wrap-up at the end of the day when they asked when they were going to see agent technology. They had clearly been tasked to acquire some agents, but wouldn’t recognize one if it bit them.

I spent a day consulting with a firm that spends billions of dollars a year on information technology. After working with them for most of the day on methods for analyzing and visualizing large data sets to find hidden structure, a senior executive asked if we could change subjects and talk about “data mining.” He had heard about this great new technique that separates the nuggets of insight from the chaff of data that most companies have. Unfortunately, he had no idea that data mining and data analysis had anything to do with each other, and that in fact that was what we had been doing all day long.

At an industry lunch I found myself seated next to an editor for a major computer magazine. She was writing an article about exciting recent breakthroughs in artificial intelligence, and asked me if I used “neural net technology.” When I told her that I did use nonlinear layered models to approximate functions with many variables, she looked concerned but gave me a second chance and asked if I used “fuzzy logic technology.” After I answered that I did include probabilities in my models to let me account for advance beliefs and subsequent observations, she rolled her eyes and made clear what a burden it was to talk to someone who was so technologically unhip. She was particularly disappointed to see such old-fashioned thinking in the Media Lab.

Any software catalog is full of examples of perfectly unremarkable programs with remarkable descriptions. You can get ordinary games, or games that take advantage of virtual reality technology; regular image compression programs, or special ones that use fractal technology; plain word processors, or new and improved ones with artificial intelligence technology. The mysterious thing is that, other than the descriptions, the capabilities of these new programs are surprisingly similar to those of the old-fashioned ones.

These unfortunate examples share a kind of digitally enhanced semiotic confusion. I’m always suspicious when the word “technology” is injudiciously attached to perfectly good concepts that don’t need it. Adding “technology” confers the authority of, say, a great Victorian steam engine on what might otherwise be a perfectly unremarkable computer program.

If you’re offered a better mousetrap, you can inspect it and evaluate its design improvements. But if you’re offered a better computer program, it is difficult to peer inside and judge how it works. Consequently, the words that are used to describe software can have more influence and less meaning than those used to describe a mousetrap. Because so many people are so unclear about where hardware leaves off and software begins, reasonable people accept unreasonable claims for computer programs. A market persists for software that purports to speed up the processor in a computer, or add memory to a system, even though those attributes are clearly questions of physical resources. It doesn’t require an advanced testing laboratory to recognize that such things can’t (and don’t) work.

These problems are not just a consequence of malicious marketing—even the developers can be misled by their own labels. When I was a graduate student at Cornell doing experiments in the basement of the Physics building, there was a clear sign when someone’s thesis research was in trouble: they made overly nice cases for their instruments. Creating and understanding new capabilities was difficult; creating fancy packages for old capabilities was labor-intensive but guaranteed to succeed. Making an elaborate housing provided the illusion of progress without the risk of failure because nothing new was actually being done. Something similar is happening with bad programs wrapped in elaborate interfaces and descriptions.

There are a number of words that I’ve found to be good indicators of such suspect efforts because they are used so routinely in ways that are innocently or intentionally misleading. Lurking behind terms such as “multimedia,” “virtual reality,” “chaos theory,” “agents,” “neural networks,” and “fuzzy logic” are powerful ideas with fascinating histories and remarkable implications, but their typical use is something less than remarkable. Realization of their promise requires much more than the casual application of a label.

When Yo-Yo was playing our cello you could correctly call what he was doing “interactive multimedia” because a computer was making sounds in response to his inputs, and you might even call it a kind of “virtual reality” because he was using physical sensors to manipulate a virtual instrument. But you probably wouldn’t do either because there was a much more compelling reality: Yo-Yo Ma playing the cello. This is another clue that one should worry about the use of these kinds of technological buzz phrases. A marketing description of the feature list of a Stradivarius could not distinguish it from a lesser instrument, or even from a multimedia PC running a cello simulator. In fact, the PC would come out much better in a list of standard features and available options and upgrades. The Strad, after all, can only make music. The words we usually use to describe computers have a hard time capturing the profound difference between things that work, and things that work very well. A Strad differs from a training violin in lots of apparently inconsequential details that when taken together make all of the difference in the world.

People are often surprised to find that there’s no one in the Media Lab who would say that they’re working on multimedia. When the lab started fifteen years ago, suggesting that computers should be able to speak and listen and see was a radical thing to do, so much so that it was almost grounds for being thrown off campus for not being serious. Now not only are there conferences and journals on multimedia, the best work being done has graduated from academic labs and can be found at your local software store. It was a battle for the last decade to argue that a digital representation lets computers be equally adept at handling text, audio, or video. That’s been won; what matters now is what is done with these capabilities.

Studying multimedia in a place like the Media Lab makes as much sense as studying typewriters in a writers’ colony, or electricity in a Computer Science department. Identifying with the tools deflects attention away from the applications that should motivate and justify them, and from the underlying skills that are needed to create them. The casual ease of invoking “multimedia” to justify any effort that involves sounds or images and computers is one of the reasons that there has been so much bad done in the name of multimedia.

When I fly I’ve found a new reason to look for airsickness bags: the in-flight videos touting new technology. These hyperactive programs manage to simulate turbulence without even needing to leave the ground. They are the video descendants of early desktop publishing. When it became possible to put ten fonts on a page, people did, making so much visual noise. Good designers spend years thinking about how to convey visual information in ways that guide and satisfy the eye, that make it easy to find desired information and progress from a big picture to details. It requires no skill to fill a page with typography; it requires great discipline to put just enough of the right kind of information in the right place to communicate the desired message rather than a printer demo. In the same way, adding audio and video together is easy when you have the right software and hardware, but to be done well requires the wisdom of a film director, editor, cinematographer, and composer combined. Jump cuts look just as bad done with bits or atoms.

Overuse of the possibilities afforded by improving technology has come full circle in a style of visual design that miraculously manages to capture in print the essence of bad multimedia. It hides the content behind a layout of such mind-boggling complexity that, after I admire the designer’s knowledge of the features of their graphics programs, I put the magazine down and pick up a nice book. Such cluttered design is certainly not conducive to distinguishing between what exists and what does not, between what is possible and what is not. And so here again in this kind of writing the labels attached to things take on more importance than they should.

After media became multi, reality became virtual. Displays were put into goggles and sensor gloves were strapped onto hands, allowing the wearer to move through computer-generated worlds. The problem with virtual reality is that it is usually taken to mean a place entirely disconnected from our world. Either you are in a physical reality, or a virtual one. When most people first actually encounter virtual reality, they’re disappointed because it’s not that unlike familiar video games. Although in part this is a consequence of describing simple systems with fancy terms, the analogy with video games goes deeper. In a virtual world there is an environment that responds to your actions, and the same is true of a video game. Virtual reality is no more or less than computers with good sensors and displays running models that can react in real time. That kind of capability is becoming so routine that it doesn’t require a special name.

One of the defining features of early virtual reality was that it was completely immersive; the goal was to make video and audio displays that completely plugged your senses. That’s become less important, because bad virtual reality has reminded people of all of the good features of the world of atoms. Instead of a sharp distinction between the virtual and the physical world, researchers are beginning to merge the best attributes of both by embedding the displays into glasses or walls. Discussions about virtual reality lead to awkward constructions like “real reality” to describe that which is not virtual; it’s much more natural to simply think about reality as something that is presented to you by information in your environment, both logical and physical.

“Chaos theory” is a leading contender for a new paradigm to describe the complexity of that physical environment and bring it into the precise world of computers. I read a recent newspaper article that described the significance of “. . . chaos theory, which studies the disorder of formless matter and infinite space.” Wow. One can’t ask for much more than that. What makes “chaos theory” particularly stand out is that it is generally considered to be a theory only by those who don’t work on it.

The modern study of chaos arguably grew out of Ed Lorenz’s striking discovery at MIT in the 1960s of equations that have solutions that appear to be random. He was using the newly available computers with graphical displays to study the weather. The equations that govern it are much too complex to be solved exactly, so he had the computer find an approximate solution to a simplified model of the motion of the atmosphere. When he plotted the results he thought that he had made a mistake, because the graphs looked like random scribbling. He didn’t believe that his equations could be responsible for such disorder. But, hard as he tried, he couldn’t make the results go away. He eventually concluded that the solution was correct; the problem was with his expectations. He had found that apparently innocuous equations can contain solutions of unimaginable complexity. This raised the striking possibility that weather forecasts are so bad because it’s fundamentally not possible to predict the weather, rather than because the forecasters are not clever enough.

Like all good scientific discoveries, the seeds of Lorenz’s observation can be found much earlier. Around 1600 Johannes Kepler (a devout Lutheran) was trying to explain the observations of the orbits of the planets. His first attempt, inspired by Copernicus, matched them to the diameters of nested regular polyhedra (a pyramid inside a cube . . . ). He published this analysis in the Mysterium Cosmographicum, an elegant and entirely incorrect little book. While he got the explanation wrong, this book did have all of the elements of modern scientific practice, including a serious comparison between theoretical predictions and experimental observations, and a discussion of the measurement errors. Armed with this experience plus better data he got it right a few years later, publishing a set of three laws that could correctly predict the orbits of the planets. The great triumph of Newton’s theory of gravitation around 1680 was to derive these laws as consequences of a gravitational force acting between the planets and the sun. Newton tried to extend the solution to three bodies (such as the combined earth-moon-sun system), but failed and concluded that the problem might be too hard to solve. This was a matter of more than passing curiosity because it was not known if the solution of the three-body problem was stable, and hence if there was a chance of the earth spiraling into the sun.

Little progress was made on the problem for many years, until around 1890 the French mathematician Henri Poincaré was able to prove that Newton’s hunch was right. Poincaré showed that it was not possible to write down a simple solution to the three-body problem. Lacking computers, he could only suspect what Lorenz was later able to see: the solutions could not be written down because their behavior over time was so complex. He also realized that this complexity would cause the solutions to depend sensitively on any small changes. A tiny nudge to one planet would cause all of the trajectories to completely change.

This behavior is familiar in an unstable system, such as a pencil that is balanced on its point and could fall in any direction. The converse is a stable system, such as the position of the pendulum in a grandfather clock that hangs vertically if the clock is not wound. Poincaré encountered, and Lorenz developed, the insight that both behaviors could effectively occur at the same time. The atmosphere is not in danger of falling over like a pencil, but the flutter of the smallest butterfly can get magnified to eventually change the weather pattern. This balance between divergence and convergence we now call chaos.
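
The sensitivity described above can be sketched numerically. The example below is only an illustration under stated assumptions: it uses the three-variable system usually called the Lorenz equations with the commonly quoted parameter values (sigma = 10, rho = 28, beta = 8/3), a simple fixed-step Runge-Kutta integrator, and function names invented for this sketch. None of these details come from the text, and this is not Lorenz's original weather model. Two runs start a billionth of a unit apart, and the code records how far apart they drift.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (assumed standard parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fixed step of fourth-order Runge-Kutta."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation(nudge=1e-9, dt=0.01, steps=4000):
    """Distance over time between two runs that differ only by a tiny nudge."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + nudge, 1.0, 1.0)  # the "butterfly": a one-part-in-a-billion change
    distances = []
    for _ in range(steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        distances.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
    return distances


if __name__ == "__main__":
    d = separation()
    print("gap after first step:", d[0])
    print("largest gap seen:   ", max(d))
```

With these assumed settings the gap starts out about as small as the nudge and ends up comparable to the size of the whole trajectory, while neither run ever blows up. That combination of local divergence and global confinement is the balance the paragraph above calls chaos.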

Ordinary mathematical techniques fail on chaotic systems, which appear to be random but are governed by simple rules. The discovery of chaos held out the promise that simple explanations might be lurking behind nature’s apparent complexity. Otherwise eminent scientists relaxed their usual critical faculties to embrace the idea. The lure of this possibility led to the development of new methods to recognize and analyze chaos. The most remarkable of these could take almost any measured signal, such as the sound of a faucet dripping, and reveal the behavior of the unseen parts of the system producing the signal. When applied to a dripping faucet this technique did in fact provide a beautiful explanation for the patterns in what had seemed to be just an annoying consequence of leaky washers. After this result, the hunt was on to search for chaos.

Lo and behold, people found it everywhere. It was in the stars, in the weather, in the oceans, in the body, in the markets. Computer programs were used to test for chaos by counting how many variables were needed to describe the observations; a complex signal that could be explained by a small number of variables was the hallmark of chaos. Unfortunately, this method had one annoying feature. When applied to a data set that was too small or too noisy it could erroneously conclude that anything was chaotic. This led to excesses such as the Economist magazine reporting that “Mr. Peters thinks that the S&P 500 index has 2.33 fractal dimensions.” This means that future values of the stock market could be predicted given just three previous values, a recipe for instant wealth if it wasn’t so obviously impossible. Such nonsensical conclusions were accepted on the basis of the apparent authority of these computer programs.

Chaos has come to be associated with the study of anything complex, but in fact the mathematical techniques are directly applicable only to simple systems that appear to be complex. There has proved to be a thin layer between systems that appear to be simple and really are, and those that appear to be complex and really are. The people who work on chaos are separating into two groups, one that studies the exquisite structure in the narrow class of systems where it does apply, and another that looks to use the methods developed from the study of chaos to help understand everything else. This leaves behind the frequently noisy believers in “chaos theory,” inspired but misled by the exciting labels.

They’re matched in enthusiasm by the “agents” camp, proponents of computer programs that learn user preferences and with some autonomy act on their behalf. The comforting images associated with an agent are a traditional English butler, or a favorite pet dog. Both learn to respond to their master’s wishes, even if the master is not aware of expressing them. Nothing would be more satisfying than a digital butler that could fetch. Unfortunately, good help is as hard to find in the digital world as it is in the physical one.

Agents must have a good agent. The widespread coverage of them has done a great job of articulating the vision of what an agent should be able to do, but it’s been less good at covering the reality of what agents can do. Whatever you call it, an agent is still a computer program. To write a good agent program you need to have reasonable solutions to the interpretation of written or spoken language and perhaps video recognition so that it can understand its instructions, routines for searching through large amounts of data to find the relevant pieces of information, cryptographic schemes to manage access to personal information, protocols that allow commerce among agents and the traditional economy, and computer graphics techniques to help make the results intelligible. These are hard problems. Wanting to solve them is admirable, but naming the solution is not the same as obtaining it.

Unlike what my confused industrial visitors thought, an agent is very much part of this world of algorithms and programming rather than a superhuman visitor from a new world. The most successful agents to date have bypassed many of these issues by leveraging human intelligence to mediate interactions that would not happen otherwise. Relatively simple programs can look for people who express similar preferences in some areas such as books or recordings, and then make recommendations based on their previous choices. That’s a great thing to do, helping realize the promise of the Internet to build communities and connections among people instead of isolating them, but then it becomes as much a question of sociology as programming to understand how, where, and why people respond to each other’s choices.
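
To show how little machinery such a recommender needs, here is a minimal sketch. It assumes that each person's preferences are just a set of liked items and that similarity is measured by the overlap between sets (the Jaccard index). The names, the data, and the scoring rule are all hypothetical choices made for this illustration, not a description of any particular deployed system.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes):
    """Rank items the target has not chosen, weighted by how similar their fans are."""
    mine = likes[target]
    scores = {}
    for other, theirs in likes.items():
        if other == target:
            continue
        similarity = jaccard(mine, theirs)
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Keep only items backed by at least one similar person; break ties by name.
    return sorted((i for i in scores if scores[i] > 0), key=lambda i: (-scores[i], i))


# Hypothetical preference data, invented for this example.
likes = {
    "ann": {"Bach", "Coltrane", "Glass"},
    "bob": {"Bach", "Coltrane", "Monk"},
    "cat": {"Abba", "Bee Gees"},
}

if __name__ == "__main__":
    print(recommend("ann", likes))
```

All of the intelligence here is borrowed from the people whose choices are being compared; the program only counts overlaps.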

If an agent is ever to reason with the wisdom of a good butler, it’s natural to look to the butler’s brain for insight into how to do this. Compared to a computer, the brain is made out of slow, imperfect components, yet it is remarkably powerful, reliable, and efficient. Unlike a conventional digital computer, it can use continuous analog values, and it takes advantage of an enormous number of simple processing elements working in parallel, the neurons. These are “programmed” by varying the strength of the synaptic connections among them. “Neural networks” are mathematical models inspired by the success of this architecture.

In the 1940s mathematical descriptions were developed for neurons and their connections, suggesting that it might be possible to go further to understand how networks of neurons function. This agenda encountered an apparently insurmountable obstacle in 1969 when Marvin Minsky and Seymour Papert proved that a layer of neurons can implement only the simplest of functions between their inputs and outputs. The strength of their result effectively halted progress until the 1980s when a loophole was found by introducing neurons that are connected only to other neurons, not inputs or outputs. With such a hidden layer it was shown that a network of neurons could represent any function.
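
The limitation and the loophole can be illustrated with a small sketch. It assumes the textbook example of exclusive-or (XOR), which is not named in the text, and simple threshold units whose weights are picked by hand. The grid search over a single unit is only a demonstration on a finite set of weights, not a proof; the function names are invented for this example.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = [0, 1, 1, 0]  # output is 1 when exactly one input is on


def step(x):
    """Threshold unit: fires (1) when its summed input is positive."""
    return 1 if x > 0 else 0


def single_unit(w1, w2, bias, x1, x2):
    """One neuron connected directly to the two inputs."""
    return step(w1 * x1 + w2 * x2 + bias)


def single_unit_can_do_xor(grid):
    """Try every weight combination on the grid for a single unit."""
    for w1, w2, bias in product(grid, repeat=3):
        if [single_unit(w1, w2, bias, *x) for x in INPUTS] == XOR:
            return True
    return False


def hidden_layer_xor(x1, x2):
    """Two hidden units feeding one output unit (hand-chosen weights)."""
    h_or = step(x1 + x2 - 0.5)        # on if at least one input is on
    h_and = step(x1 + x2 - 1.5)       # on only if both inputs are on
    return step(h_or - h_and - 0.5)   # "or, but not and"


if __name__ == "__main__":
    grid = [i / 2 for i in range(-8, 9)]  # weights from -4 to 4 in steps of 0.5
    print("single unit reproduces XOR:", single_unit_can_do_xor(grid))
    print("hidden layer outputs:", [hidden_layer_xor(*x) for x in INPUTS])
```

No setting of the single unit matches the target, because one threshold unit can only split its inputs with a straight line, while two hidden units are enough to do the job.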

Mathematical models traditionally have been based on finding the values of adjustable parameters to most closely match a set of observations. If a hidden layer is used in a neural network, this is no longer possible. It can be shown that there’s no feasible way to choose the best values for the connection strengths. This is analogous to the study of chaos, where the very complexity that makes the equations impossible to solve makes them valuable to use. The behavior of chaotic equations can be understood and applied even if it can’t be exactly predicted. Similarly, it turned out that hidden layers can be used by searching for reasonable weights without trying to find the best ones. Because the networks are so flexible, even a less-than-ideal solution can be far more useful than the exact solution of a less-capable model. As a consequence of this property, neural networks using surprisingly simple search strategies were surprisingly capable.

The combination of some early successes and language that suggests that neural networks work the same way the brain does led to the misleading impression that the problem of making machines think had been solved. People still have to think to use a neural network. The power, and problems, of neural networks were amply demonstrated in a study that I ran at the Santa Fe Institute. It started with a workshop that I attended there, exploring new mathematical techniques for modeling complex systems. The meeting was distressingly anecdotal, full of sweeping claims for new methods but containing little in the way of insight into how they fail or how they are related to what is already known. In exasperation I made a joke and suggested that we should have a data analysis contest. No one laughed, and in short order the Santa Fe Institute and NATO had agreed to support it.

My colleague Andreas Weigend and I selected interesting data sets from many disciplines, giving the changes over time in a currency exchange rate, the brightness of a star, the rhythm of a heartbeat, and so forth. For extra credit we threw in the end of The Art of the Fugue, the incomplete piece that Bach was writing when he died. These data were distributed around the world through the Internet. Researchers were given quantitative questions appropriate to each domain, such as forecasting future values of the series. These provided a way to make comparisons across disciplines independent of the language used to describe any particular technique.

From the responses I learned as much about the sociology of science as I did about the content. Some people told us that our study was a mistake because science is already too competitive and it’s a bad influence to try to make these comparisons; others told us that our study was doomed because it’s impossible to make these kinds of comparisons. Both were saying that their work was not falsifiable. Still others said that our study was the best thing that had happened to the field, because they wanted to end the ambiguity of data analysis and find out which technique was the winner.

One of the problems presented data from a laser that was erratically fluctuating on and off; the task was to predict how this pattern would continue after the end of the data set. Because the laser was chaotic, traditional forecasting methods would not do much better than guessing randomly. Some of the entries that we received were astoundingly good. One of the best used a neural network to forecast the series, and it was convincingly able to predict all of the new behavior that it had not seen in the training data. Here was a compelling demonstration of the power of a neural net.

For comparison, one entry was done by eye, simply guessing what would come next. Not surprisingly, this one did much worse than the neural network. What was a surprise was that it beat some of the other entries by just as large a margin. One team spent hours of supercomputer time developing another neural network model; it performed significantly worse than the visual inspection that took just a few moments. The best and the worst neural networks had similar architectures. Nothing about their descriptions would indicate the enormous difference in their performance; that was a consequence of the insight with which the networks were applied.

It should not be too unexpected that apparently similar neural networks can behave so differently—the same is true of real brains. As remarkable as human cognition can be, some people are more insightful than others. Starting with the same hardware, they differ in how they use it. Using a neural network gives machines the same opportunity to make mistakes that people have always enjoyed.

Many experts in neural networks don’t even study neural networks anymore. Neural nets provided the early lessons that started them down the path of using flexible models that learn, but overcoming the liabilities of neural nets has led them to leave behind any presumption of modeling the brain and focus directly on understanding the mathematics of reasoning. The essence of this lies in finding better ways to manage the tension between experience and beliefs.

One new technique on offer to do that is “fuzzy logic.” It is sold as an entirely new kind of reasoning that replaces the sharp binary decisions forced by our Western style of thinking with a more Eastern sense of shades of meaning that can better handle the ambiguity of the real world. In defense of the West, we’ve known for a few centuries how to use probability theory to represent uncertainty. If you force a fuzzy logician to write down the expressions they use, instead of telling you the words they attach to them, it turns out that the expressions are familiar ones with new names for the terms. That itself is not so bad, but what is bad is that such naming deflects attention away from the much better developed study of probability theory.
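
As a reminder of what the probabilistic machinery looks like, here is a minimal sketch of Bayes' rule combining an advance belief with a subsequent observation. The scenario (deciding whether a room is "warm" from a noisy reading) and all of the numbers are hypothetical, chosen only for illustration, and the sketch makes no claim about how any particular fuzzy logic system is written.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after one observation (Bayes' rule)."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


if __name__ == "__main__":
    belief = 0.5  # advance belief that the room is warm (assumed)
    # Assumed sensor model: a "high" reading appears 80% of the time when the
    # room is warm and 20% of the time when it is not.
    for reading in range(2):
        belief = bayes_update(belief, 0.8, 0.2)
        print(f"belief after reading {reading + 1}: {belief:.3f}")
```

The result is a graded degree of belief between 0 and 1 rather than a sharp yes or no, and each new observation moves it by an amount that the sensor model dictates.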

This danger was on display at a conference I attended on the mathematics of inference. A battle was raging there between the fuzzy logic camp and the old-fashioned probabilists. After repeatedly pushing the fuzzy logicians to show anything at all that they could do that could not be done with regular probability theory, the fuzzy side pulled out their trump card and told the story of a Japanese helicopter controller that didn’t work until fuzzy logic was used. Everyone went home feeling that they had won the argument. Of course a demonstration that something does work is far from proof that it works better than, or even differently than, anything else. What’s unfortunate about this example is that perfectly intelligent people were swayed by the label attached to the program used in the helicopter rather than doing the homework needed to understand how it works and how it relates to what is already known. If they had done so, they might have discovered that the nonfuzzy world has gone on learning ever more interesting and useful things about uncertainty.

Connecting all of these examples is a belief in magic software bullets, bits of code that can solve the hard problems that had stumped the experts who didn’t know about neural networks, or chaos, or agents. It’s all too easy to defer thinking to a seductive computer program. This happens on the biggest scales. At still one more conference on mathematical modeling I sat through a long presentation by someone from the Defense Department on how they are spending billions of dollars a year on developing mathematical models to help them fight wars. He described an elaborate taxonomy of models of models of models. Puzzled, at the end of it I hazarded to put up my hand and ask a question that I thought would show everyone in the room that I had slept through part of the talk (which I had). I wondered whether he had any idea whether his billion-dollar models worked, since it’s not convenient to fight world wars to test them. His answer, roughly translated, was to shrug and say that that’s such a hard question they don’t worry about it. Meanwhile, the mathematicians in the former Soviet Union labored with limited access to computers and had no recourse but to think. As a result a surprising fraction of modern mathematical theory came from the Soviet Union, far out of proportion to its other technical contributions.

Where once we saw farther by standing on the shoulders of our predecessors, in far too many cases we now see less by standing on our predecessors’ toes. You can’t be interdisciplinary without the disciplines, and without discipline. Each of the problematical terms I’ve discussed is associated with a very good idea. In the life cycle of an idea, there is a time to tenderly nurse the underlying spark of new insight, and a time for it to grow up and face hard questions about how it relates to what is already known, how it generalizes, and how it can be used. Even if there aren’t good answers, these ideas can still be valuable for how they influence people to work on problems that do lead to answers. The Information Age is now of an age that deserves the same kind of healthy skepticism applied to the world of bits that we routinely expect in the world of atoms.

WHEN THINGS START TO THINK by Neil Gershenfeld. ©1998 by Neil A. Gershenfeld. Reprinted by arrangement with Henry Holt and Company, LLC.