Teaching a robot to anticipate human actions
May 30, 2013 by Amara D. Angelica
Why can’t a robot be like a servant (to paraphrase My Fair Lady)? You know, one who would anticipate your every need — even before you asked?
The folks at the Personal Robotics Lab of Ashutosh Saxena, Cornell assistant professor of computer science, have gone and done just that.
When we last (virtually) visited the lab, we learned that the roboticists taught “hallucinating” robots to arrange your room for you, and before that to pick up after you. These are important robot skills if you’re disabled, for example, or if you live in a dorm.
Now they’ve taken the next step: they’ve taught a Cornell robot to foresee a human action, the person’s likely trajectory, and what it needs to do to help the person. Let’s say you’re walking toward a refrigerator carrying something with both hands; the robot could open the door for you (as shown in the video below).
Anticipating human actions
Understanding when and where to open the door (or when and how to pour a beer, also in the video) can be difficult for a robot because of the many variables it encounters while assessing the situation. But the Cornell team has created an elegant solution.
For starters, the Cornell robot has access to a Microsoft Kinect 3-D camera and a database of 3-D videos. The robot identifies the activities it sees, considers what uses are possible with the objects in the scene, and determines how those uses fit with the activities.
It then generates a set of possible trajectories or continuations into the future — such as eating, drinking, cleaning, putting away — and finally chooses the most probable. As the action continues, the robot constantly updates and refines its predictions.
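The generate, choose, and refine loop described above can be illustrated with a small sketch. This is not the Cornell team's model; it is a plain Bayes-style belief update chosen for illustration. The four activity names come from the article, while the function names, the observation descriptions, and every probability value are assumptions invented for the example.

```python
# Illustrative sketch only: the activity names come from the article, but the
# likelihood numbers and observation descriptions are invented for this example.

def normalize(scores):
    """Scale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one activity must have a positive score")
    return {name: value / total for name, value in scores.items()}


def update(belief, likelihood):
    """Weight each candidate activity by how well it explains the new observation."""
    return normalize(
        {name: p * likelihood.get(name, 0.0) for name, p in belief.items()}
    )


def most_probable(belief):
    """Pick the candidate continuation with the highest probability."""
    return max(belief, key=belief.get)


def track(prior, observations):
    """Return the top prediction after each successive observation."""
    belief, guesses = dict(prior), []
    for likelihood in observations:
        belief = update(belief, likelihood)
        guesses.append(most_probable(belief))
    return guesses


# Start with no preference among the candidate continuations.
prior = normalize({"eating": 1, "drinking": 1, "cleaning": 1, "putting away": 1})

observations = [
    # Hypothetical observation: a hand reaches for a cup.
    {"eating": 0.2, "drinking": 0.6, "cleaning": 0.1, "putting away": 0.1},
    # Hypothetical observation: the cup is carried toward a cabinet.
    {"eating": 0.05, "drinking": 0.1, "cleaning": 0.05, "putting away": 0.8},
]

print(track(prior, observations))
```

With these made-up numbers the top guess is "drinking" after the first observation and switches to "putting away" after the second, which mirrors how a prediction can be revised as the action unfolds.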
“We extract the general principles of how people behave,” said Ashutosh Saxena, Cornell professor of computer science and co-author of a new study tied to the research. “Drinking coffee is a big activity, but there are several parts to it.” The robot builds a “vocabulary” of such small parts that it can put together in various ways to recognize a variety of big activities, he explained.
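To make the "vocabulary" idea concrete, here is a toy sketch in which a few small parts are reused across several big activities, and an observed sequence of parts is matched to the closest activity. The part names, the activity templates, and the use of Python's difflib similarity score are all assumptions made for illustration; the article does not describe the researchers' actual representation or matching method.

```python
from difflib import SequenceMatcher

# Hypothetical templates: each big activity is a sequence of small parts
# drawn from one shared vocabulary. These names are invented for the example.
ACTIVITIES = {
    "drinking coffee": ["reach", "move", "drink", "place"],
    "putting away": ["reach", "move", "place"],
    "cleaning": ["reach", "wipe", "place"],
}


def similarity(observed, template):
    """Score in [0, 1] for how closely two sequences of parts match."""
    return SequenceMatcher(None, observed, template).ratio()


def recognize(observed):
    """Return the activity whose template best matches the observed parts."""
    return max(ACTIVITIES, key=lambda name: similarity(observed, ACTIVITIES[name]))


print(recognize(["reach", "move", "drink", "place"]))
print(recognize(["reach", "wipe", "place"]))
```

Because the parts are shared, adding a new big activity in this sketch only means adding one more template rather than a new set of low-level detectors.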
In tests, the robot made correct predictions 82 percent of the time when looking one second into the future, 71 percent of the time when looking three seconds ahead, and 57 percent of the time when looking 10 seconds ahead.
“Even though humans are predictable, they are only predictable part of the time,” Saxena said. “The future would be to figure out how the robot plans its action. Right now we are almost hard-coding the responses, but there should be a way for the robot to learn how to respond.”
This is amazing research, getting us closer to the robot butler in Robot & Frank and countless other science-fiction stories.
The research was supported by the U.S. Army Research Office, the Alfred P. Sloan Foundation and Microsoft.
Saxena will join Cornell graduate student Hema S. Koppula as they present their research at the International Conference on Machine Learning, June 18–21 in Atlanta, and the Robotics: Science and Systems conference, June 24–28 in Berlin, Germany.
- Anticipating Human Activities using Object Affordances for Reactive Robotic Response, Hema S. Koppula, Ashutosh Saxena. Robotics: Science and Systems (RSS), 2013. (Open access)
- Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation, Hema S. Koppula, Ashutosh Saxena. International Conference on Machine Learning (ICML), 2013. (Open access)