Seeing Through the Window

July 27, 2001 by Neil Gershenfeld

What form will new human/computer interfaces take? Neil Gershenfeld discusses the past, present and future of how we interact with computers.

Originally published in 1999 in the book When Things Start To Think, by Neil Gershenfeld. Published on KurzweilAI.net July 27, 2001.

I vividly recall a particular car trip from my childhood because it was when I invented the laptop computer. I had seen early teletype terminals; on this trip I accidentally opened a book turned on its side and realized that there was room on the lower page for a small typewriter keyboard, and on the upper page for a small display screen. I didn’t have a clue how to make such a thing, or what I would do with it, but I knew that I had to have one. I had earlier invented a new technique for untying shoes, by pulling on the ends of the laces; I was puzzled and suspicious when my parents claimed prior knowledge of my idea. It would take me many more years to discover that Alan Kay had anticipated my design for the laptop and at that time was really inventing the portable personal computer at Xerox’s Palo Alto Research Center (PARC).

Despite the current practice of putting the best laptops in the hands of business executives rather than children, my early desire to use a laptop is much closer to Alan’s reasons for creating one. Alan’s project was indirectly inspired by the work of the Swiss psychologist Jean Piaget, who from the 1920s onward spent years and years studying children. He came to the conclusion that what adults see as undirected play is actually a very structured activity. Children work in a very real sense as little scientists, continually positing and testing theories for how the world works. Through their endless interactive experiments with the things around them, they learn first how the physical world works, and then how the world of ideas works. The crucial implication of Piaget’s insight is that learning cannot be restricted to classroom hours, and cannot be encoded in lesson plans; it is a process that is enabled by children’s interaction with their environment.

Seymour Papert, after working with Piaget, brought these ideas to MIT in the 1960s. He realized that the minicomputers just becoming available to researchers might provide the ultimate sandbox for children. While the rest of the world was developing programming languages for accountants and engineers, Seymour and his collaborators created LOGO for children. This was a language that let kids express abstract programming constructs in simple intuitive terms, and best of all it was interfaced to physical objects so that programs could move things outside of the computer as well as inside it. The first one was a robot “turtle” that could roll around under control of the computer, moving a pen to make drawings.

Infected by the meme of interactive technology for children, Alan Kay carried the idea to the West Coast, to Xerox’s Palo Alto Research Center. In the 1970s, he sought to create what he called a Dynabook, a portable personal knowledge navigator shaped like a notebook, a fantasy amplifier. The result was most of the familiar elements of personal computing.

Unlike early programming languages that required a specification of a precise sequence of steps to be executed, modern object-oriented languages can express more complex relationships among abstract objects. The first object-oriented programming language was Smalltalk, invented by Alan to let children play as easily with symbolic worlds as they do with physical ones. He then added interface components that were being developed by Doug Engelbart up the road at the Stanford Research Institute.

Doug was a radar engineer in World War II. He realized that a computer could be more like a radar console than a typewriter, interactively drawing graphics, controlled by an assortment of knobs and levers. Picking up a theme that had been articulated by Vannevar Bush (the person most responsible for the government’s support of scientific research during and after the war) in 1945 with his proposal for a mechanical extender of human memory called a Memex, Doug understood that such a machine could help people navigate through the increasingly overwhelming world of information. His colleagues thought that he was nuts.

Computers were specialized machines used for batch processing, not interactive personal appliances. Fortunately, Engelbart was able to attract enough funding to set up a laboratory around the heretical notion of studying how people and computers might better interact. These ideas had their coming-out in a rather theatrical demo he staged in San Francisco in 1968, showing what we would now recognize as an interactive computer with a mouse and multiple windows on a screen.

In 1974 these elements came together in the Xerox Alto prototype, and reached the market in Xerox’s Star. The enormous influence of this computer was matched by its enormous price tag, about $50,000. This was a personal computer that only big corporations could afford. Windows and mice finally became widely available and affordable in Apple’s Macintosh, inspired by Steve Jobs’ visit to PARC in 1979, and the rest of personal computing caught up in 1990 when Microsoft released Windows 3.0.

The prevailing paradigm for how people use computers hasn’t really changed since Engelbart’s show in 1968. Computers have proliferated, their performance has improved, but we still organize information in windows and manipulate it with a mouse. For years the next big interface has been debated. There is a community that studies such things, called human-computer interaction (HCI). To give you an idea of the low level of that discussion, one of the most thoughtful HCI researchers, Bill Buxton (chief scientist at Silicon Graphics), is known for the insight that people have two hands. A mouse forces you to manipulate things with one hand alone; Bill develops interfaces that can use both hands.

A perennial contender on the short list for the next big interface is speech recognition, promising to let us talk to our computers as naturally as we talk to each other. Appealing as that is, it has a few serious problems. It would be tiring if we had to spend the day speaking continuously to get anything done, and it would be intrusive if our conversations with other people had to be punctuated by our conversations with our machines. Most seriously, even if speech recognition systems worked perfectly (and they don’t), the result is no better than if the commands had been typed. So much of the frustration in using a computer is not the effort to enter the commands, it’s figuring out how to tell it to do what you want, or trying to interpret just what it has done. Speech is a piece of the puzzle, but it doesn’t address the fundamental mysteries confronting most computer users.

The ultimate dream interface has always been thought itself: using mind control to direct a computer. There is now serious work being done on making machines that can read minds. One technique used is magnetoencephalography (MEG), which places sensitive detectors of magnetic fields around a head and measures the tiny neural currents flowing in the brain. Another technique, functional magnetic resonance imaging (fMRI), uses MRI to make a 3D map of chemical distributions in the brain to locate where metabolic activity is happening. Both of these can, under ideal conditions, deduce something about what is being thought, such as distinguishing between listening to music and looking at art, or moving one hand versus the other. The problem that both struggle with is that the brain’s internal representation is not designed for external consumption.

Early programmers did a crude form of MEG by placing a radio near a computer; the pattern of static could reveal when a program got stuck in a loop. But as soon as video displays came along it became much easier for the computer to present the information in a meaningful form, showing just what it was doing. In theory the same information could be deduced by measuring all of the voltages on all of the leads of the chips; in practice this is done only by hardware manufacturers in testing new systems, and it takes weeks of effort.

Similarly, things that are hard to measure inside a person are simple to recognize on the outside. For example, hold your finger up and wiggle it back and forth. You’ve just performed a brain control task that the Air Force has spent a great deal of time and money trying to replicate. They’ve built a cockpit that lets a pilot control the roll angle by thinking; trained operators on a good day can slowly tilt it from side to side. They’re a long way from flying a plane that way.

In fact, a great deal of the work in developing thought interfaces is actually closer to wiggling your finger. It’s much easier to accidentally measure artifacts that come from muscle tension in your forehead or scalp than it is to record signals from deep in the brain. Instead of trying to teach people to do the equivalent of wiggling their ears, it’s easier to use the parts of our bodies that already come wired for us to interact with the world.

Another leading contender for the next big interface is 3D graphics. Our world is three-dimensional; why limit the screen to two dimensions? With advances in the speed of graphical processors it is becoming possible to render 3D scenes as quickly as 2D windows are now drawn. A 3D desktop could present the files in a computer as the drawers of a file cabinet or as a shelf of books, making browsing more intuitive. If you’re willing to wear special glasses, the 3D illusion can be quite convincing.

A 3D display can even be more than an illusion. My colleague Steve Benton invented the reflection holograms on your credit cards; his group is now developing real-time holographic video. A computer calculates the light that would be reflected from a three-dimensional object, and modulates a laser beam to produce exactly that. Instead of tricking the eyes by using separate displays to produce an illusion of depth, his display actually creates the exact light pattern that the synthetic object would reflect.

Steve’s system is a technological tour de force, the realization of a long-standing dream in the display community. It’s also slightly disappointing to many people who see it, because a holographic car doesn’t look as good as a real car. The problem is that reality is just too good. The eye has the equivalent of many thousands of lines of resolution, and a refresh rate of milliseconds. In the physical world there’s no delay between moving an object and seeing a new perspective. Steve may someday be able to match those specifications with holographic video, but it’s a daunting challenge.

Instead of struggling to create a computer world that can replace our physical world, there’s an alternative: augment it. Embrace the means of interaction that we’ve spent eons perfecting as a species, and enhance them with digital content.

Consider Doug Engelbart’s influential mouse. It is a two-dimensional controller that can be moved left and right, forward and backward, and intent is signaled by pressing it. It was preceded by a few centuries by another two-dimensional controller, a violin bow. That, too, is moved left and right, forward and backward, and intent is communicated by pressing it. In this sense the bow and mouse are very similar. On the other hand, while a good mouse might cost $10, a good bow can cost $10,000. It takes a few moments to learn to use a mouse, and a lifetime to learn to use a bow. Why would anyone prefer the bow?

Because it lets them do so much more. Consider the differences between the bow technique and the mouse technique:

Bow Technique                                              Mouse Technique
Sul ponticello (bowing close to the bridge)                Click
Spiccato (dropping the bow)                                Double click
Martelé (forcefully releasing the stroke)                  Drag
Jeté (bouncing the bow)
Tremolo (moving back and forth repeatedly)
Sul tasto (bowing over the fingerboard)
Arpeggio (bouncing on broken chords)
Col legno (striking with the stick)
Viotti (unaccented then accented note)
Staccato (many martelé notes in one stroke)
Staccato volante (slight spring during rapid staccato)
Détaché (vigorous articulated stroke)
Legato (smooth stroke up or down)
Sautillé (rapid stroke in the middle of the bow)
Louré (separated slurred notes)
Ondulé (tremolo between two strings)

There’s much more to the bow than a casual marketing list of features might convey. Its exquisite physical construction lets the player perform a much richer control task, relying on the intimate connection between the dynamics of the bow and the tactile interface to the hand manipulating and sensing its motion. Compare that nuance to a mouse, which can be used perfectly well while wearing mittens.

When we did the cello project I didn’t want to ask Yo-Yo to give up this marvelous interface; I retained the bow and instead asked the computer to respond to it. Afterward, we found that the sensor I developed to track the bow could respond to a hand without the bow. This kind of artifact is apparent any time a radio makes static when you walk by it, and was used back in the 1930s by the Russian inventor Lev Termen in his Theremin, the musical staple of science-fiction movies that makes eerie sounds in response to a player waving their arms in front of it.

My student Josh Smith and I found that lurking behind this behavior was a beautiful mathematical problem: given the charges measured on two-dimensional electrodes, what is the three-dimensional distribution of material that produced them? As we made headway with the problem we found that we could make what looks like an ordinary table, but with electrodes in it that create a weak electric field able to find the location of a hand above it. It’s completely unobtrusive, and responds to the smallest motions a person can make (millimeters) as quickly as they can make them (milliseconds). Now we don’t need to clutter the desk with a rodent; the interface can disappear into the furniture. There’s no need to look for a mouse since you always know where to find your hand.
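
To give a feel for the inverse problem, here is a minimal sketch in Python. It assumes a toy forward model: four made-up electrode positions and a coupling that simply falls off with the square of the distance to the hand. None of this is the actual Fish hardware or its inversion algorithm, only an illustration of turning a handful of electrode readings back into a position.

```python
# A toy illustration, not the Fish algorithm: pretend each electrode's signal
# falls off as 1/distance^2 to the hand, then invert that model by least squares.
import numpy as np
from scipy.optimize import least_squares

# Hypothetical electrode positions in the plane of the table (meters, z = 0).
ELECTRODES = np.array([[0.0, 0.0, 0.0],
                       [0.3, 0.0, 0.0],
                       [0.0, 0.3, 0.0],
                       [0.3, 0.3, 0.0]])

def forward_model(hand_xyz):
    """Predicted signal at each electrode for a hand at hand_xyz (toy model)."""
    distances = np.linalg.norm(ELECTRODES - hand_xyz, axis=1)
    return 1.0 / distances**2

def locate_hand(measurements, initial_guess=(0.15, 0.15, 0.10)):
    """Find the hand position whose predicted signals best match the measurements."""
    residuals = lambda xyz: forward_model(xyz) - measurements
    fit = least_squares(residuals, x0=np.array(initial_guess),
                        bounds=([-1.0, -1.0, 0.0], [1.0, 1.0, 1.0]))  # hand above the table
    return fit.x

if __name__ == "__main__":
    true_position = np.array([0.10, 0.20, 0.08])   # hand 8 cm above the table
    readings = forward_model(true_position)        # simulated measurements
    print("estimated hand position:", locate_hand(readings))
```

With four readings and only three unknowns the fit is overdetermined, which is part of why a small number of electrodes hidden in a table can be enough to track a hand.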

The circuit board that we developed to make these measurements ended up being called a “Fish,” because fish swim in 3D while mice crawl in 2D, and because some fish that live in murky waters use electric fields to detect objects in their vicinity, just as we were rediscovering how to do it. In retrospect, it’s surprising that it has taken so long for such an exquisite biological sense to be used for computer interfaces. There’s been an anthropomorphic tendency to assume that a computer’s senses should match our own.

We had trouble keeping the Fish boards on hand because they would be carried off around the Media Lab by students who wanted to build physical interfaces. More recently, the students have been acquiring as many radio-frequency identification (RFID) chips as they can get their hands on. These are tiny processors, small enough even to be swallowed, that are powered by an external field that can also exchange data with them. They’re currently used in niche applications, such as tracking laboratory animals, or in the key-chain tags that enable us to pump gas without using a credit card. The students use them everywhere else. They make coffee cups that can tell the coffeemaker how you like your coffee, shoes that can tell a doorknob who you are, and mouse pads that can read a Web URL from an object placed on them.

You can think of this as a kind of digital shadow. Right now objects live either in the physical world or as icons on a computer screen. User interface designers still debate whether icons that appear to be three-dimensional are better than ones that look two-dimensional. Instead, the icons can really become three-dimensional; physical objects can have logical behavior associated with them. A business card should contain an address, but also summon a Web page if placed near a Web browser. A pen should write in normal ink, but also remember what it writes so that the information can be recalled later in a computer, and it should serve as a stylus to control that computer. A house key can also serve as a cryptographic key. Each of these things has a useful physical function as well as a digital one.
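
As a rough sketch of how thin that logical layer can be, the fragment below maps the ID read from a tagged object to the behavior registered for it. The tag IDs, the reader callback, and the actions are all hypothetical placeholders rather than any particular tag system; the point is only that the physical object itself becomes the handle for its digital shadow.

```python
# A toy "digital shadow" registry (illustrative only): map the ID read from a
# tagged object to the digital behavior associated with it.
import webbrowser

# Hypothetical tag IDs and their associated objects and actions.
SHADOWS = {
    "04:A2:9F:11": ("business card",
                    lambda: webbrowser.open("https://example.com/alice")),  # placeholder URL
    "04:7B:33:C8": ("coffee cup",
                    lambda: print("brew: double espresso, no sugar")),
}

def on_tag_read(tag_id: str) -> None:
    """Called whenever the (hypothetical) reader sees a tag near the browser,
    the coffeemaker, or any other appliance."""
    entry = SHADOWS.get(tag_id)
    if entry is None:
        print(f"unknown object {tag_id}: no digital shadow registered")
        return
    name, action = entry
    print(f"{name} recognized; invoking its digital behavior")
    action()

if __name__ == "__main__":
    on_tag_read("04:A2:9F:11")   # business card summons a Web page
    on_tag_read("04:7B:33:C8")   # coffee cup tells the coffeemaker your order
```

Where the table of behaviors lives is a separate design choice; it could sit in the object’s tag, in the appliance, or out on the network.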

My colleague Hiroshi Ishii has a group of industrial designers, graphical designers, and user interface designers studying how to build such new kinds of environmental interfaces. A recurring theme is that interaction should happen in the context that you, rather than the computer, find meaningful. They use video projectors so that tables and floors and walls can show relevant information; since Hiroshi is such a good Ping-Pong player, one of the first examples was a Ping-Pong table that displayed the ball’s trajectory in a fast-moving game by connecting sensors in the table to a video projector aimed down at it. His student John Underkoffler notes that a lamp is a one-bit display that can be either on or off; John is replacing lightbulbs with combinations of computer video projectors and cameras so that the light can illuminate ideas as well as spaces.

Many of the most interesting displays they use are barely perceptible, such as a room for managing their computer network that maps the traffic into ambient sounds and visual cues. A soothing breeze indicates that all is well; the sights and sounds of a thunderstorm are a sign of an impending disaster that needs immediate attention. This information about their computer network is always available, but never demands direct attention unless there is a problem.
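
The mapping itself can be almost trivially simple. The sketch below, with made-up utilization and error-rate figures and arbitrary thresholds, shows the spirit of it: network health is folded down to a calm or stormy ambient cue rather than a screen that must be watched. It is an illustration, not the system the group built.

```python
# A sketch (not the Media Lab system) of mapping network health onto an
# ambient "weather" cue: calm conditions when all is well, a storm when not.
from dataclasses import dataclass

@dataclass
class NetworkSample:
    utilization: float   # fraction of capacity in use, 0.0 - 1.0
    error_rate: float    # fraction of packets with errors, 0.0 - 1.0

def ambient_cue(sample: NetworkSample) -> str:
    """Choose an ambient rendering that never demands attention unless needed."""
    if sample.error_rate > 0.05 or sample.utilization > 0.95:
        return "thunderstorm: dark projection, rumble from the speakers"
    if sample.utilization > 0.7:
        return "gusty wind: faster clouds, louder breeze"
    return "soothing breeze: slow clouds, faint rustle"

if __name__ == "__main__":
    for sample in (NetworkSample(0.30, 0.001),
                   NetworkSample(0.80, 0.010),
                   NetworkSample(0.99, 0.120)):
        print(ambient_cue(sample))
```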

Taken together, ambient displays, tagged objects, and remote sensing of people have a simple interpretation: the computer as a distinguishable object disappears. Instead of a fixed display, keyboard, and mouse, the things around us become the means we use to interact with electronic information as well as the physical world. Today’s battles between competing computer operating systems and hardware platforms will literally vanish into the woodwork as the diversity of the physical world makes control of the desktop less relevant.

This is really no more than Piaget’s original premise of learning through manipulation, filtered through Papert and Kay. We’ve gotten stuck at the developmental stage of early infants who use one hand to point at things in their world, a decidedly small subset of human experience. Things we do well rely on all of our senses.

Children, of course, understand this. The first lesson that any technologist bringing computers into a classroom gets taught by the kids is that they don’t want to sit still in front of a tube. They want to play, in groups and alone, wherever their fancy takes them. The computer has to tag along if it is to participate. This is why Mitch Resnick, who has carried on Seymour’s tradition at the Media Lab, has worked so hard to squeeze a computer into a Lego brick. These bricks bring the malleability of computing to the interactivity of a Lego set.

Just as Alan’s computer for kids was quickly taken over by the grown-ups, Lego has been finding that adults are as interested as kids in their smart bricks. There’s no end to the creativity that’s found expression through them; my favorite is a descendant of the old LOGO turtle, a copier made from a Lego car that drives over a page with a light sensor and then trails a pen to draw a copy of the page.

A window is actually an apt metaphor for how we use computers now. It is a barrier between what is inside and what is outside. While that can be useful at times (such as keeping bugs where they belong), it’s confining to stay behind it. Windows also open to let fresh air in and let people out.

All along, the coming interface paradigm has been apparent. The mistake was to assume that a computer interface happens between a person sitting at a desk and a computer sitting on the desk. We didn’t just miss the forest for the trees; we missed the earth and the sky and everything else. The world is the next interface.