3-D cameras for cellphones
January 6, 2012
Imagine a high-quality 3-D camera that provides more-accurate depth information than the Microsoft Kinect, has a greater range, and works under all lighting conditions — but is so small, cheap and power-efficient that it could be incorporated into a cellphone at very little extra cost.
That’s the promise of recent work by Vivek Goyal, the Esther and Harold E. Edgerton Associate Professor of Electrical Engineering, and his group at MIT’s Research Lab of Electronics.
When Microsoft’s Kinect — a device that lets Xbox users control games with physical gestures — hit the market, computer scientists immediately began hacking it. A black plastic bar about 11 inches wide with an infrared rangefinder and a camera built in, the Kinect produces a visual map of the scene before it, with information about the distance to individual objects. At MIT alone, researchers have used the Kinect to create a “Minority Report”-style computer interface, a navigation system for miniature robotic helicopters and a holographic-video transmitter, among other things.
“3-D acquisition has become a really hot topic,” Goyal says. “In consumer electronics, people are very interested in 3-D for immersive communication, but then they’re also interested in it for human-computer interaction.” Gestural interfaces make it much easier for multiple people to interact with a computer at once — as in the dance games the Kinect has popularized.
The MIT system relies on the time-of-flight principle: a pulse of infrared laser light is fired at a scene, and the camera measures the time it takes the light to return from objects at different distances.
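The relationship between return time and distance can be sketched in a few lines of Python. This is only an illustration of the general time-of-flight principle, not code from the researchers; the function names are hypothetical, and the sketch assumes the light travels in a straight line to the object and back.

```python
# Illustrative sketch of the time-of-flight principle (not the researchers' code).
# Function names are hypothetical, chosen for this example.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in a vacuum


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to a reflecting object, given the pulse's round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels out and back, so the one-way distance is half the path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def round_trip_from_distance(distance_m: float) -> float:
    """Round-trip time of a pulse reflected by an object at the given distance."""
    if distance_m < 0:
        raise ValueError("distance cannot be negative")
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    # A pulse that returns after 10 nanoseconds came from about 1.5 meters away.
    print(f"{distance_from_round_trip(10e-9):.3f} m")
```

An object about 1.5 meters away returns light after roughly 10 nanoseconds, which is why time-of-flight systems depend on very fast timing.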
Traditional time-of-flight systems use one of two approaches to build up a “depth map” of a scene. LIDAR (for light detection and ranging) uses a scanning laser beam that fires a series of pulses, each corresponding to a point in a grid, and separately measures their time of return. But that makes data acquisition slower, and it requires a mechanical system to continually redirect the laser. The alternative, employed by so-called time-of-flight cameras, is to illuminate the whole scene with laser pulses and use a bank of sensors to register the returned light. But sensors able to distinguish small groups of light particles — photons — are expensive: A typical time-of-flight camera costs thousands of dollars.
Unlike traditional time-of-flight systems, the MIT researchers’ system uses only a single light detector — a one-pixel camera. But by using some clever mathematical tricks, it can get away with firing the laser a limited number of times.
To add the crucial third dimension to the depth map, the researchers use another technique, called parametric signal processing. Essentially, they assume that all of the surfaces in the scene, however they’re oriented toward the camera, are flat planes. Although that’s not strictly true, the mathematics of light bouncing off flat planes is much simpler than that of light bouncing off curved surfaces. The researchers’ parametric algorithm finds the flat-plane model that best matches the information about returning light, creating a very accurate depth map from a minimum of visual information.
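The idea of fitting data to a flat-plane model can be illustrated with an ordinary least-squares plane fit. This is a deliberately simplified stand-in, not the researchers' parametric algorithm: it assumes the 3-D sample points are already known, whereas the MIT system works from the timing of light returning to a single detector. The function names and the sample data are hypothetical.

```python
# Simplified stand-in for "fit the data to a flat-plane model":
# an ordinary least-squares fit of z = a*x + b*y + c to known 3-D points.
# This is NOT the researchers' algorithm; names and data are hypothetical.


def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_plane(points):
    """Return (a, b, c) minimizing the squared error of z = a*x + b*y + c."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are needed to fit a plane")

    # Accumulate the sums that make up the 3x3 normal equations.
    sxx = sxy = sx = syy = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        sxz += x * z
        syz += y * z
        sz += z

    matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]

    det = _det3(matrix)
    if abs(det) < 1e-12:
        raise ValueError("points are degenerate (for example, all on one line)")

    # Solve the 3x3 system with Cramer's rule.
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(_det3(replaced) / det)
    return tuple(coeffs)


if __name__ == "__main__":
    # Hypothetical samples lying on a tilted plane z = 0.5x - 0.25y + 2.
    samples = [(x, y, 0.5 * x - 0.25 * y + 2.0) for x in range(4) for y in range(4)]
    a, b, c = fit_plane(samples)
    print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
```

Because a plane is described by only three numbers, a small number of measurements is enough to pin it down, which is the general appeal of a parametric model.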
The algorithm lets the researchers get away with relatively crude hardware: a cheap photodetector and an ordinary analog-to-digital converter — an off-the-shelf component already found in all cellphones. The sensor takes about 0.7 nanoseconds to register a change to its input.
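To put the 0.7-nanosecond figure in perspective, the short calculation below works out how far light travels in that time. It is a back-of-the-envelope sketch, not part of the researchers' work; the variable and function names are hypothetical, and the only input taken from the article is the 0.7-nanosecond response time.

```python
# Back-of-the-envelope arithmetic for the sensor's response time.
# Names are hypothetical; only the 0.7 ns figure comes from the article.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
SENSOR_RESPONSE_S = 0.7e-9  # about 0.7 nanoseconds, as quoted above


def light_travel_m(seconds: float) -> float:
    """Distance light covers in the given time, in meters."""
    return SPEED_OF_LIGHT_M_PER_S * seconds


# Total path length light covers while the sensor is still responding.
path_cm = light_travel_m(SENSOR_RESPONSE_S) * 100.0
# Light makes a round trip, so the corresponding one-way depth is half of that.
depth_cm = path_cm / 2.0

print(f"path: {path_cm:.1f} cm, depth: {depth_cm:.1f} cm")
# prints "path: 21.0 cm, depth: 10.5 cm"
```

In other words, light covers roughly 21 centimeters in the time the sensor needs to register a change, so the raw hardware timing on its own is coarse, and the burden of producing a fine depth map falls on the algorithm.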
The researchers’ algorithm is also simple enough to run on the type of processor ordinarily found in a smartphone. To interpret the data provided by the Kinect, by contrast, the Xbox requires the extra processing power of a graphics-processing unit (GPU).
Ref.: Paper to be presented at the IEEE’s International Conference on Acoustics, Speech, and Signal Processing in March.
Ref.: Ahmed Kirmani et al., Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor, Optics Express, 2011 [DOI: 10.1364/OE.19.021485]