How to Build a Virtual Human
October 20, 2003 by Peter Plantec
Virtual Humans is the first book with instructions on designing a “V-human,” or synthetic person. Using the programs on the included CD, you can create animated computer characters who can speak, dialogue intelligently, show facial emotions, have a personality and life story, and be used in real business projects. These excerpts explain how to get started.
To be published in Virtual Humans, AMACOM, November 2003. Published on KurzweilAI.net October 20, 2003.
About 30% of building a virtual human is in the engine. A good engine will make it easy for you to create a believable personality. It provides functions that allow things like handling complex sentences, bringing up the past and learning better responses if one doesn’t work. But in the end, it’s your artistry that gives the entity its charm.
There are many natural language approaches that can handle the job. Simple pattern matching engines are the least sophisticated and most useful of them all. With the rash of recent interest, I’m not going to pretend I know all the nuances of all the engines out there. Instead, I’ll concentrate on using simple software to build complex personalities. Together we will build a clever virtual person using a mind engine kindly supplied by Yapanda Intelligence, Inc. of Chickasha, Oklahoma. I selected this one because it can drive a real-time 3D head animation with lip-synch. Nevertheless, the basic steps in creating a virtual personality are platform independent.
I’ve included some additional engines to play with. The most powerful is ALICE. She’s an implementation of Artificial Intelligence Markup Language (AIML). Alice source code is available to those of you who want to modify it and build your own Virtual Human engine, adding your own special features. I’ve also included a copy of Jacco Bikker’s WinAlice for PC users. It demonstrates some unique features such as the ability to bring up ancient history and to learn new responses from you.
I’ll talk more about the actual engines in chapter three. But it’s important to realize that the software you use to build your virtual human is just a tool for expressing your artistry.
The most important and least understood part of virtual humans, their personalities, is our focus. We are going to have some serious fun. Let’s look at some uses for virtual people.
Good For Business
From a business perspective virtual humans with a personality are a major boon. Imagine a person signing onto your web page. There’s already a cookie that contains significant information about them, gathered by your virtual host on the guest’s first visit. The encounter might go a bit like this:
Host: “Hey, Joanne, it’s nice to see you again.” <smile>
Joanne: “You remember me?”
Host: “Of course I do. But it’s been a while. I missed you.” <expression>
Joanne: “Sorry about that, I’ve been really busy.”
Host: “So did you read ‘The Age of Spiritual Machines’?”
Joanne: “Yeah, it was really interesting. <beat> Are you one of them?”
Host: “Not yet, I’m afraid, but I’m working on it. <beat> Before I forget, you should know about Greg Stock’s new book on how to live to be 200 plus years old!”
Joanne: “I read his last book and liked it. Can you send me a copy?”
Host: “Sure, we have it in stock <grin>. Same charge, same place?”
Joanne: “Yup. Also, do you have any books on Freestyle Landscape Quilting?”
Host: “I’ll check. <beat> Hold on a few more seconds. Okay, I found two…”
And so forth. You can see that Virtual humans bring back that personal touch so sorely missing in commerce today. Believe it or not, I’ve observed people from every level of sophistication and background respond positively to personal attention from a Virtual Human. It feels good.
Your marketing software can be made to generate marketing variables that can be fed to your virtual human host: Joanne’s buying patterns, personal information like her date of birth, and so on. Trust is a big issue, so such data must be handled with respect for the client and used in clever ways. Imagine when Joanne comes online within a week of her birthday and Host sings happy birthday to her. Hokey? Yes. Appealing? You bet. I’ve also discovered that many people tolerate hokey behavior from V-people. It’s a bit like the way we tolerate, even appreciate, the squash and stretch exaggeration in animated film characters. Of course Host would not want to sing happy birthday to every customer. She has to know how to tell which is which. Later in the book we’ll look into using unobtrusive personality assessment to provide those cues. This is one of the most important and most neglected tools you have. You’ll see why later.
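The birthday scenario comes down to a small date check on the marketing variables. Here is a minimal sketch in Python; the helper name `within_birthday_window`, its parameters, and the seven-day default are assumptions made for illustration and are not part of any engine on the CD:

```python
from datetime import date


def within_birthday_window(birth_date: date, today: date, window_days: int = 7) -> bool:
    """Return True if `today` falls within `window_days` of the visitor's birthday.

    Hypothetical helper. It checks the birthday in the previous, current,
    and next calendar year so that windows spanning New Year still work.
    """
    for year in (today.year - 1, today.year, today.year + 1):
        try:
            birthday = birth_date.replace(year=year)
        except ValueError:
            # Feb 29 birthday in a non-leap year: assume Feb 28 is celebrated.
            birthday = date(year, 2, 28)
        if abs((birthday - today).days) <= window_days:
            return True
    return False
```

The host would call this with the date of birth stored for the visitor and only then consider singing. Whether she actually does should still depend on the personality cues discussed above.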
An advantage of rule-based approaches is that you can have multiple sets of rules, each one with responses honed to a specific task, person, or language. For example, when Joanne logs in, her cookie can initiate the uploading of a rule database tailored specifically to her general personality and buying patterns. That means that when a rule triggers, it will respond in a way likely to make Joanne comfortable while meeting her needs. Next a person from Korea logs on and the host switches to a Korean intelligence base, greeting the client in that language. One well-designed host can handle orders in more than 20 languages. This clearly presents opportunities for small companies to expand internationally.
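To make the idea of swappable rule sets concrete, here is a rough Python sketch of a pattern-matching responder that picks its rule database from cookie data. Everything in it (the `Rule` class, the `RULE_SETS` table, the `lang` cookie key, and the sample patterns) is hypothetical and far simpler than a real engine such as ALICE or the Yapanda engine:

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str   # regular expression matched against the visitor's input
    response: str  # reply template; may reference captured groups, e.g. \1


# Hypothetical rule databases, keyed here by language code.
RULE_SETS = {
    "en": [
        Rule(r"\bdo you have (?:any )?books on (.+?)\??$", r"I'll check our shelves for \1."),
        Rule(r"\b(hi|hello)\b", "Hello! Nice to see you."),
    ],
    "ko": [
        Rule(r"안녕", "안녕하세요! 다시 만나서 반갑습니다."),
    ],
}

DEFAULT_REPLY = "Tell me more."


def select_rules(cookie: dict) -> list:
    """Pick a rule set from cookie data, falling back to English."""
    return RULE_SETS.get(cookie.get("lang", "en"), RULE_SETS["en"])


def respond(cookie: dict, user_input: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    for rule in select_rules(cookie):
        match = re.search(rule.pattern, user_input, flags=re.IGNORECASE)
        if match:
            return match.expand(rule.response)
    return DEFAULT_REPLY
```

Switching languages or personalities is then just a matter of which list `select_rules` returns; the matching loop itself never changes. A per-customer rule set could be keyed by a visitor ID in the same table.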
Depending on your type of business or usage, Virtual Human needs will vary. For example, voice-only virtual humans are already very active in phone information and ordering systems. They don’t have much personality yet, but we’re going to work on that. In fact there are a number of different types of virtual humans and we’ll be building one up from the simplest to one of the more complex with a 3D animated talking head. By taking it step by step you’ll be amazed at your own ability to master Virtual Human design.
A good Virtual Human should be able to cope with language. Changing language should be as easy as switching databases and voice engines. Monica Lamb, a Native American scientist and V-person developer, has used Alice to build a V-person that teaches and speaks Mohawk.
At a minimum, your V-person will be able to handle general conversational input by voice or keyboard, parse that input to arrive at appropriate behaviors, and output behavior as text or speech, on-screen information, and/or machine commands to software or external devices. It should also have a face display capable of at least minimal emotional expression such as smile, frown and neutral. I prefer a 3D face capable of complex emotional expression that is part of the communication system. This is a tall order, but I believe we can handle it. Here’s an interesting example of how one creative company has used this technology in a mechanical robot:
Redgate Technologies is a company that thrives on invention. They became interested in Natural Language Processing (NLP) early on. They had invented a new chip technology to monitor and control complex technical systems. NLP was useful for interpreting the complex codes generated by their chips. Just for fun, they expanded their NLP engine to represent several personalities. They quickly discovered that a virtual human hooked into their system became a super-capable assistant to a human supervisor. Imagine one on a space station, keeping track of all mechanical systems and keeping the inhabitants company with casual conversation. For luck we won’t name her HAL.
A wonderful example of this V-person species is Redgate’s Sarha. She’s an innovative virtual human interface for industrial monitoring and control. Sarha stands for “Smart Anthropomorphic Robotic Hybrid Agent.” Redgate has used NLP pattern matching to monitor an entire industrial complex. The Virtual Human system they devised sends out queries to specialized monitoring modules using the special Redgate chips. She then reads and interprets the encoded feedback in spoken English, issuing warnings when conditions warrant. She can also take emergency action on her own, if necessary. Her supervisor communicates with her in spoken English, asking her to start processes or check specific conditions. In a demonstration of Sarha’s application to home security, she reported “Anthony, someone left the garage door open.” Anthony replied “Close it for me will you please, Sarha?” And of course she does.
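The query-and-interpret loop described here can be sketched in a few lines of Python. This is not Redgate's design; the status codes, the `STATUS_TEXT` table, and the `poll` function are invented purely to illustrate turning encoded feedback into plain English with a severity level:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical status codes a monitoring module might return.
STATUS_TEXT = {
    0: "is operating normally",
    1: "was left open",
    2: "is overheating",
}
WARNING_CODES = {1, 2}
EMERGENCY_CODES = {2}


def poll(modules: Dict[str, Callable[[], int]]) -> List[Tuple[str, str]]:
    """Query every module and translate its code into (severity, sentence)."""
    reports = []
    for name, read_code in modules.items():
        code = read_code()
        text = STATUS_TEXT.get(code, f"returned unknown code {code}")
        if code in EMERGENCY_CODES:
            severity = "emergency"
        elif code in WARNING_CODES or code not in STATUS_TEXT:
            # Unknown codes are treated as warnings rather than ignored.
            severity = "warning"
        else:
            severity = "ok"
        reports.append((severity, f"The {name} {text}."))
    return reports
```

A speech layer would read the warning sentences aloud, and an "emergency" severity is where a system like this could take action on its own.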
The thing I like most about Sarha is her personality. She makes personal comments; even chides her operator, whom she knows by name. As a demonstration, Sarha was installed into a fully robotic interface that could move around, point to objects and complain about and avoid objects in her path. She was linked by microwave to a control computer she used to monitor her charges. She even gave a brief talk on those special chips Redgate designed to transmit monitoring data back to her. She reached into a bowl, pulled out a chip, pointed at it with a metal finger and started her spiel. Later she took questions. All the while she was monitoring various systems. She even brought a loud monster generator online in another room during the demonstration.
Perhaps one of the most important applications for Virtual Human technology is in teaching. I’ve found that young people have trust issues with the educational system. I can’t blame them when administrators waste millions on bad decisions but there aren’t enough books to go around. Virtual teachers seem separated from all this. It’s hard to attribute ulterior motives to an animated character, even if she is smart and talkative and knows you by name. Properly scripted, a V-teacher can get to know a student on a personal basis. The real human teacher can feed her personal tidbits she can bring up during a lesson:
“So Bill, is it true you threw the winning touchdown in Saturday’s game?”
“Yeah, how’d you know about that?”
“Hey, I keep on top of things. Congratulations. Now let’s teach you how to estimate the diameter of an oleic acid molecule.”
Young children can be fascinated by virtual people. I got a call from a retired engineer from rural New Mexico. He had spent a lot of time tweaking the voice input on his V-person so that she would understand his very bright 3-year-old granddaughter, and had a story to tell me. He’d been remarkably successful and the little girl spent hours in happy conversation with her virtual friend. One evening a few neighbors came by to play Canasta. While they were playing, the little girl came into the adjoining room and fired up her computer. In moments an animated conversation ensued. One of the neighbors, a devout fundamentalist Christian, became terrified and insisted that the engineer smash the girl’s computer immediately because it was inhabited by the devil. He refused, of course. He told me he’d been using the virtual character to teach his granddaughter everything from her ABCs to simple math. I gave him some unpublished information on how to get the V-person to record the granddaughter’s responses to questions, so he could check on them later.
The point is, in creative hands virtual humans already have enormous potential and the platforms are constantly improving.
Blending art, technology and a little psychology allows us to take a functional leap, decades ahead of pure artificial intelligence. Although the simple VH software of today will eventually be replaced by highly sophisticated neural nets or entirely new kinds of computing, it will be a long time before they’ll have unique humanlike personalities…if ever. Meanwhile let’s give the evolution of technology a kick in the butt by building really smart, personable virtual people today.
Because creating a believable synthetic personality is more of an art than science, it’s important that we get a feel for how we humans handle our conscious lives. It’s part philosophy, part psychology and believe it or not, part quantum physics. We’ll start by comparing people and computers, without getting too philosophically crazed. Any discussion of the human mind must consider consciousness. It’s a danger zone and I already know the discussions to follow will dump me smack into the boiling kettle. I’ll walk you through the important parts. Disagree and send me nice email if you like. Coming up in chapter two we’ll explore the nature of consciousness and why it’s an essential consideration in virtual human design.
Synthespians: Virtual Acting (Chapter 13)
with Ed Hooks
Virtual people have to convince us they have wheels spinning inside. They do, of course, have electrons spinning in service of the plot, but if they don’t show it on their faces, we just don’t buy it. We’re used to seeing people think. It’s true; thought is conveyed through action.
Although I’m remarkably opinionated about acting in animation, I’m not a certified expert on the subject–Ed Hooks is. He teaches acting classes for animators internationally, and has held workshops for companies such as Disney Animation (Sydney), Tippett Studio (Berkeley), Microsoft (Redmond, Washington), Electronic Arts (Los Angeles), BioWare (Edmonton, Canada), and PDI (Redwood City, California). Among his five books, Acting for Animators: The Complete Guide to Performance Animation (Heinemann, revised edition, September 2003) has been a major hit.
The Seven Essential Concepts in Face Acting
The following concepts are interpretations of Ed Hooks’ "Seven Essential Acting Concepts." We’ve adapted them here to focus on the V-people and their faces.
1. The face expresses thoughts beneath. The brain, real or artificial, is the most alive part of us. Thinking, awareness, and reasoning are active processes that affect what’s on our face. Emotion happens as a result of thinking. Because these characters don’t have a natural link between thinking and facial expression, your job as animator is to create those links. In effect, you want your synthetic brain to emulate recognizable human cognition on the face, which leads to the illusion of real and appropriate emotions.
2. Acting is reacting. Every facial expression is a reaction to something. Even the slightest head and hand movement in reaction to what’s happening can be most convincing. If the character tilts its head as you begin to speak to it, or nods on occasion in agreement, you get the distinct feeling of a living person paying attention. A double take shows surprise. Because you have very few body parts to work with, you have a superb challenge in front of you.
3. Know your character’s objective. Your character is never static. He is always moving, even if the movement is the occasional twitch, a shift of the eye, or a blink. Your objective is to endow your character with the illusion of life. As such, it is wise to follow Shakespeare’s advice, "Hold the mirror up to nature" (Hamlet, III. ii.17-21). Notice that when a person listens, she may tilt her head to the side or glance off in the distance as she contemplates and integrates new information. When she smiles and says nice things to you, her objective is to please. Always know what your character’s objective is because it is the roadmap linking behaviors to their goals. Knowing her personality and history is essential here.
4. Your character moves continuously from action to action. Your character is doing something 100 percent of the time. There must always be life! Even if she appears to be waiting, things are going on mentally. Make a list of boredom behaviors and use them. When people talk, a good emotion extraction engine will feed her cues on how to react to what’s being said. Her actions expressing emotional responses are fluid. They flow into each other forming a face story. You should be able to tell from the character’s expression how she’s reacting to what you’re saying. Say she takes a deep breath and you see the cords on her neck tighten. They then relax. Her body slumps a bit and perhaps she nods. Always in motion, she maintains the illusion of life.
5. All action begins with movement. You can’t even do math without your face moving, exposing wheels spinning beneath. Your eyes twitch. You glance at the ceiling, pondering. Your brow furrows as you struggle with the solution. Try this experiment: Ask a friend to lie as still as possible on the floor. No movement at all. Then, when he is absolutely stone still, ask him to multiply 36 by 38. Pay close attention to his eyes. You will note that they immediately begin to shift and move. It is impossible to carry out a mental calculation without the eyes moving. Sometimes movement on the screen needs to be a bit more overt than in real life. That’s okay, even essential. It nails down the emotion. Done right, people won’t notice the exaggeration, but will get the point.
6. Empathy is audience glue. The main transaction between humans and Virtual humans has to be emotion, not words. Words alone will lose them. You will catch a viewer’s attention if your character appears to be thinking, but you will engage your viewer emotionally if your character appears to be feeling. You must get across how this V-person feels about what’s going on. If you do it successfully, the audience will care about (empathize with) those feelings. I promise you it can be done. A great autonomous character can addict an audience in ways a static animation cannot. The transaction between audience and character is in real-time and directly motivated, much as it is on stage. This is a unique acting medium, which is part live performance and part animation. It’s an opportunity for you to push things–experiment with building empathy pathways.
7. Interaction requires negotiation. You want a little theatrical heat in any discourse with a V-person. To accomplish this, remember that your character always has choices. We all do, in every waking moment. The character has to decide when and whether to answer or initiate a topic. If your character is simply mouthing words, your audience response will be boredom. Whether they know it or not, people want to be entertained by your character. Antonin Artaud famously observed that "actors are athletes of the heart." Dead talk is not entertaining. There must be emotion. Recognize that you’re working with a theatrical situation and that the viewer will crave more than a static picture.
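Several of these concepts (react to every cue, never go still, exaggerate a little) can be wired together in a small selection routine. The Python sketch below is only an illustration: the emotion labels, the weight names such as `smile` and `brow_raise`, and the `next_action` function are assumptions, not the output format of any particular emotion extraction engine or animation package:

```python
import random

# Hypothetical mapping from an emotion cue to face-animation weights (0..1).
EXPRESSIONS = {
    "joy": {"smile": 0.8, "brow_raise": 0.3},
    "surprise": {"brow_raise": 1.0, "jaw_open": 0.4},
    "sadness": {"frown": 0.7, "brow_furrow": 0.5},
}

# "Boredom behaviors" to play when no cue arrives, so the face never freezes.
IDLE_BEHAVIORS = ["blink", "glance_away", "head_tilt", "deep_breath"]


def next_action(cue, exaggeration=1.2, rng=random):
    """Choose the next face action: a reaction to the cue, else an idle behavior."""
    if cue in EXPRESSIONS:
        # Push each weight slightly past life-size, clamped to a maximum of 1.0.
        weights = {
            name: min(1.0, round(value * exaggeration, 2))
            for name, value in EXPRESSIONS[cue].items()
        }
        return {"type": "expression", "weights": weights}
    return {"type": "idle", "name": rng.choice(IDLE_BEHAVIORS)}
```

Calling `next_action("joy")` yields a slightly amplified smile, while `next_action(None)` falls back to a random idle behavior. The `exaggeration` factor is the knob for how much larger than life the character plays.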
Sure, there are loads more acting concepts we could talk about, but these seven are the hard-rock core of it. You’re faced with a unique acting challenge because you have an animated character that is essentially alive. If that character is a cartoon or anime design and personality, you’ll have to read Preston Blair, for example, to learn the principles of exaggerated cartoon acting, and then incorporate these squash and stretch type actions into your character’s personality. If you take the easier road and use a photorealistic human actor, you still must make their actions a bit larger than life, but not as magnified as cartoons demand.
The stage you set will depend on the Virtual actor’s intention. If he’s there to guide a person around a no-nonsense corporate Web site, you’ll need to think hard about how much entertainment to inject. Certainly you need some. Intelligent Virtual actors in games situations–especially full-bodied ones–present marvelous opportunities to expand this new field of acting. You’ll know their intentions. Let them lead you to design their actions. Embellish their personalities, embroider their souls, and decorate their actions. Making them bigger than life will generally satisfy.
Synthespians: The Early Years
Next I want to tell you about the clever term "Synthespian," which unfortunately I didn’t coin. I do believe it should become a part of our language.
Diana Walczak and Jeff Kleiser produced some early experimental films featuring excellent solo performances by digital human characters. For example, Nestor Sextone for President premiered at SIGGRAPH in 1988. About a year later, Kleiser and Walczak presented the female Synthespian, Dozo, in a music video: "Don’t Touch Me." These were not intelligent agents, but they were good actors. "It was while we were writing Nestor’s speech to an assembled group of 'synthetic thespians' that we coined the term 'Synthespian,'" explains Jeff Kleiser. Nestor Sextone had to be animated from digitized models sculpted by Diana Walczak.
As history will note, the field of digital animation is a close, almost incestuous one. Larry Weinberg, the fellow who later created Poser, worked out some neat software that allowed Jeff and Diana to link together digitized facial expressions created from multiple maquettes she’d sculpted to define visemes. That same software allowed them to animate Nestor’s emotional expression. I’ve put a copy of this wonderful classic bit of animation on the CD-ROM, with their blessing.
Note that this viseme-linking was an early part of the development chain leading to the morph targets you see in Poser and all the high-end animation suites today. Getting your digitized character to act was difficult in those days before bones, articulated joints, and morphing skin made movement realistic. Nestor was made up of interpenetrating parts that had to be cleverly animated to look like a gestalt character without any obvious cracks or breaks or parts sticking out.
In most cases, V-people don’t have a full body to work with, just a face, and perhaps hands. Body language is such an effective communications tool, but when we just don’t have it we end up putting twice as much effort into face and upper body acting. Fortunately a properly animated face can be wonderfully expressive, as shown in Figure 13-1.
Figure 13-1: Virtual actors can really show emotion
Synthespians All Have a Purpose
A Synthespian playing a living person is probably the trickiest circumstance you’ll encounter. Depending on the situation, you want to emulate that person’s real personality closely, or exaggerate it for comedic impact or political statement. If you exaggerate features and behavior heavily you’ve entered a new art form: interactive caricature or parody.
Let’s say we’ve built a synthetic Secretary of Defense Donald Rumsfeld. The interactive theatrical situation is that we are interrupting him while he is hectically planning an attack somewhere in the world. He might be impatient and have an attitude regarding our utter stupidity and lack of patriotism for bothering him at a time like this. His listening skills might be shallow. He might continually give off the dynamic that he has better things to do. By thus exaggerating his personality, we create interest and humor. As a user, you want to interact because you feel something interesting is happening. There is comic relief, and all the while this character is making a political statement. I suspect Rumsfeld would get a kick out of such a representation, as long as it’s done in good taste.
Action conveys personality, and you can’t set up a virtual actor without knowing the character well. For example, Kermit the Frog has a definite psychology behind him. As a Web host, he is just very happy to be there. He enjoys being in the spotlight, and his behavior strongly implies he doesn’t want to be any place else. He’s happy to show you around his Web site, and he might even break out in song along the way. Occasionally he’ll complain about Miss Piggy’s lack of attention or the disadvantages of his verdant complexion.
Think first about your intention and then the character’s intention. Mae West and Will Rogers wanted to make ‘em laugh. No matter what your purpose for a Synthespian, you want it to entertain. Sometimes it may be understated. Remember that cleverness is always in style. Notice the look people get on their faces when they think they’re being clever. It’s usually an understated cockiness that shows around the eyes. The intention is to be clever, the words are smart, but remember to add that subtle touch of smugness or self-satisfaction around the eyes and the corners of the mouth.
Note: There is a new book titled Emotions Revealed: Recognizing Faces and Feelings to Improve Communication and Emotional Life, by Paul Ekman (Times Books, 2003), which is well worth your time to read. Ekman, who is professor of psychology in the department of psychiatry at the University of California Medical School, San Francisco, is one of the world’s great geniuses on the subject of the expression of emotion in the human face. His new book has more than one hundred photographs of nuanced facial expression, complete with explanations for the variances.
As an aside, I used to train counter-terrorist agents in psychological survival. One way to spot a terrorist in a crowd is that they often have facial expressions that are inappropriate to the situation. I used Ekman’s work as a reference to help my agents recognize when facial expression and body language don’t match up, an indication often exhibited by potential terrorists. You can use Ekman’s work to make sure your V-human agents have appropriate expressions for the situation.
You Are the Character
When you’ve done your homework, you’ll know your character like you know yourself. You’ll identify with the character so intensely you will have the sensation of being that character. Stage actors learn to create characters by shifting from the third person to the first person reference. Instead of saying, "My character would be afraid in this situation," a stage actor might say, while portraying the character, "I feel afraid." In your case, you are creating a second-party character, but you’re empathizing personally with the emotions of your own creation. There is an identity between the two of you that will be both fun and compelling.
Designing animation elements for the character requires feeling them. I remember watching my daughter as she animated a baby dragon early in her career. Her natural instinct was to get inside that baby dragon and be it. I smiled as I watched her body and face contort as she acted out each part of the sequence. Her instruction had not come from me…it was intuitive. At Disney, I’ve watched animators making faces in little round mirrors dangling from extension arms above their desks. They glance in the mirror, make a face and then look at the cel and try to capture what they’ve seen. That part hasn’t changed. For us it’s glance at the mirror, glance at the screen, and then tweak a spline or morph setting. You won’t be able to do all this with the simple animation tools I’ve given you for free. Those are just to get you hooked. If you intend to learn this stuff, get ready to invest heavily in time and commitment and a fair amount in coin as well. A small investment considering the return.
If You Want to Go Further
There are great animation schools, and this continent has some of the best. My favorite is at Sheridan College in Oakville, Ontario. But there are many good schools here in the United States as well. A few years ago, most of them were a waste of money. But things have improved. Do some Web research and find which school can best help you meet your goals. There is a long-term need for talented, well-trained character animators, and in general the pay for the talented is phenomenal.
If you’re a developer, you have to be familiar with all this stuff to manage it effectively. You’re responsible for the final product. If you have animators working for you, believe in them, give them freedom, but guide them toward your vision as well. The best animated characters reflect the wisdom, vision, and artistry of their prime artists and the producers behind them. A great producer is an artist, a business person, and a technician. It’s not easy to get there, and too many producers only have the business end down. As a producer, you have to understand the artistry of production. You have to feel the emotion of good animation. How else will you know what to approve and not approve? So learn it and you’ll be way above the crowd.
I want to thank Ed Hooks for contributing his wisdom to this chapter. Remember, what you’ve read here is just a taste of what you need to learn. If you’re lucky, you’ll find a way to take a live class with Ed, who now lives in the Chicago area. It will change your perspective forever.
In the upcoming chapter, I’m going to kick it up a notch with ways to give your character true awareness of his surroundings. Imagine your well-developed character, now able not only to listen and talk, but actually to see you, look you in the eyes, and recognize you without asking. You don’t want to miss this one.
Ed Hooks, author of Acting for Animators (Heinemann, Revised Second Edition 2003), has been a theatre professional for three decades and has taught acting to both animators and actors for PDI, Lucas Learning, Microsoft, Disney Animation, and other leading companies.
© 2004 Peter Plantec
Virtual Humans: A Build-It-Yourself Kit, Complete with Software and Step-by-Step Instructions by Peter Plantec, with foreword by Ray Kurzweil, will be published by AMACOM on Nov. 16, 2003 for $39.95.
Software included on CD with book:
- Yapanda proprietary virtual-human engine for building animated 3D characters
- J-ALICE AIML chatbot engine for authoring intelligent dialogues and PalmAlice for PalmOS
- Program-N ALICE chatbot
- Library of 3D talking-head characters
- Poser and 3dMeNow software demos
- Elliza, WinAlice, Tic-Tac-Toe, and Jackass AI programs
- Video and audio examples