Mask-bot: A talking video humanoid robot

November 8, 2011 by Amara D. Angelica

Dr. Takaaki Kuratate in conversation with his Mask-bot self (credit: TU München)

Welcome to the creepiest uncanny-valley experience yet: a talking robot face called Mask-bot, developed by a team at the Institute for Cognitive Systems (ICS) at TU München and AIST, the National Institute of Advanced Industrial Science and Technology in Japan.

What sets Mask-bot apart is that it can instantly construct and project a static video image of anyone’s face (from a photo) on a 3D surface, and it moves its virtual head a little and raises its eyebrows as you speak, to create the impression that it understands. (It doesn’t. Yet.)

Also, it projects the image from behind, making it more realistic (unlike Disney animatronics characters, for example, which are projected from the front), and works in daylight. It’s also more flexible than existing humanoid robots, which use a complex set of mechanical parts and must be custom-designed.

Avatars for video conferences

Mask-bot could soon be deployed in video conferences, says Dr. Takaaki Kuratate. “You can create a realistic replica of a person that actually sits and speaks with you at the conference table. You can use a generic mask for male and female, or you can provide a custom-made mask for each person.”

But a more advanced version of Mask-bot doesn’t even require a video image of the person speaking. A program can also convert a normal two-dimensional photograph into a correctly proportioned projection for a three-dimensional mask, complete with facial expressions and voice. A talking-head animation engine filters an extensive series of face motion data from a variety of people collected by a motion capture system and selects the facial expressions that best match a specific phoneme being spoken. Examples here.

So how long will it be before we get a tech support guy’s talking head yelling “Move!”? Not sure I’m ready for that.

The computer extracts a set of facial coordinates from each of these expressions, which it can then assign to any new face, bringing it to life. Emotion synthesis software then delivers the visible emotional nuances that indicate, for example, when someone is happy, sad or angry.
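For the curious, here is a rough Python sketch of the two steps just described: picking one captured expression per phoneme and then applying its facial coordinates to a new face. Everything in it is an assumption made for illustration, not the researchers' actual method: the sample data, the function names (select_expression, retarget), the two-landmark "face," and especially the selection rule, which simply picks the captured sample closest to the average for that phoneme.

```python
from math import dist

# Hypothetical motion-capture samples: (phoneme, landmark offsets from a neutral pose).
# Each offsets list holds (dx, dy) pairs for the same ordered set of landmarks.
SAMPLES = [
    ("AA", [(0.0, -4.0), (0.0, 4.0)]),
    ("AA", [(0.0, -5.0), (0.0, 5.0)]),
    ("AA", [(0.0, -9.0), (0.0, 9.0)]),
    ("MM", [(0.0, 0.5), (0.0, -0.5)]),
]


def flatten(offsets):
    """Turn [(dx, dy), ...] into one flat list of numbers."""
    return [value for point in offsets for value in point]


def select_expression(samples, phoneme):
    """Pick the captured sample nearest the mean for this phoneme.

    This nearest-to-mean rule is an assumed stand-in for "best match".
    """
    candidates = [offsets for label, offsets in samples if label == phoneme]
    if not candidates:
        raise KeyError(phoneme)
    flat = [flatten(c) for c in candidates]
    mean = [sum(column) / len(flat) for column in zip(*flat)]
    best = min(range(len(candidates)), key=lambda i: dist(flat[i], mean))
    return candidates[best]


def retarget(neutral_face, offsets, scale=1.0):
    """Apply landmark offsets to a new face's neutral landmark positions."""
    return [
        (x + scale * dx, y + scale * dy)
        for (x, y), (dx, dy) in zip(neutral_face, offsets)
    ]


# Usage: animate a made-up new face (upper and lower lip landmarks) for "AA".
new_face = [(10.0, 20.0), (10.0, 30.0)]
chosen = select_expression(SAMPLES, "AA")
animated = retarget(new_face, chosen, scale=2.0)
```

The scale argument is a crude, hypothetical way to account for the new face being larger or smaller than the captured one; a real system would presumably need a far more careful mapping.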

See the video below of a rather severe Mask-bot that reminds me of my elementary school principal (think: ruler, knuckle). Needs work. Maybe they should hire actor Jim Carrey to help with facial mask expressions? (See video below.)

Synthesized voice

But wait, it gets creepier. An advanced version of Mask-bot can also reproduce content typed via a keyboard. A text-to-speech system converts text in English, Japanese, and soon German to audio (female or male voice), which can be quiet or loud, happy or sad. Mask-bot doesn’t actually understand anything; it just listens and makes pretend responses as part of a fixed programming sequence.

Hmmm… I wonder what would happen if we hooked up a Siri app to this thing?

Meanwhile, the Munich researchers are working on Mask-bot 2, a mobile version. The mask, projector and computer control system will all be contained inside a robot costing around EUR 400 (Mask-bot 1: EUR 3,000).

“Mask-bot will influence the way in which we humans communicate with robots in the future,” predicts Prof. Gordon Cheng, head of the ICS team. “These systems could soon be used as companions for older people who spend a lot of time on their own,” says Kuratate. (Get ready to ramble.)

Now that would put a whole lot of cats out of business. That’s so wrong.