3-D cameras for cellphones
January 6, 2012

Depth-sensing cameras can produce 'depth maps' like this one, in which distances are depicted as shades on a gray-scale spectrum (lighter objects are closer, darker ones farther away) (credit: Dominic/Flickr)
Imagine a high-quality 3-D camera that provides more-accurate depth information than the Microsoft Kinect, has a greater range, and works under all lighting conditions — but is so small, cheap and power-efficient that it could be incorporated into a cellphone at very little extra cost.
That’s the promise of recent work by Vivek Goyal, the Esther and Harold E. Edgerton Associate Professor of Electrical Engineering, and his group at MIT’s Research Lab of Electronics.
When Microsoft’s Kinect — a device that lets Xbox users control games with physical gestures — hit the market, computer scientists immediately began hacking it. A black plastic bar about 11 inches wide with an infrared rangefinder and a camera built in, the Kinect produces a visual map of the scene before it, with information about the distance to individual objects. At MIT alone, researchers have used the Kinect to create a “Minority Report”-style computer interface, a navigation system for miniature robotic helicopters and a holographic-video transmitter, among other things.
“3-D acquisition has become a really hot topic,” Goyal says. “In consumer electronics, people are very interested in 3-D for immersive communication, but then they’re also interested in it for human-computer interaction.” Gestural interfaces make it much easier for multiple people to interact with a computer at once — as in the dance games the Kinect has popularized.
The researchers’ system gauges depth by time of flight: a pulse of infrared laser light is fired at a scene, and the camera measures the time it takes the light to return from objects at different distances.
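The time-of-flight relationship can be sketched in a few lines of Python. This is an illustrative calculation only, not the researchers' code; the function name and the 20-nanosecond example value are assumptions chosen for the sketch.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a measured round-trip time.

    Hypothetical helper for illustration; it ignores sensor delay and noise.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the object and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# Example (assumed value): a pulse that returns after 20 nanoseconds.
example_distance_m = distance_from_round_trip(20e-9)
print(f"{example_distance_m:.3f} m")
```

With these assumptions, a 20-nanosecond round trip corresponds to an object roughly 3 meters away, which shows why the timing hardware has to resolve nanosecond-scale differences.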
Traditional time-of-flight systems use one of two approaches to build up a “depth map” of a scene. LIDAR (for light detection and ranging) uses a scanning laser beam that fires a series of pulses, each corresponding to a point in a grid, and separately measures their time of return. But that makes data acquisition slower, and it requires a mechanical system to continually redirect the laser. The alternative, employed by so-called time-of-flight cameras, is to illuminate the whole scene with laser pulses and use a bank of sensors to register the returned light. But sensors able to distinguish small groups of light particles — photons — are expensive: A typical time-of-flight camera costs thousands of dollars.
Unlike traditional time-of-flight systems, the MIT researchers’ system uses only a single light detector — a one-pixel camera. But by using some clever mathematical tricks, it can get away with firing the laser a limited number of times.
To add the crucial third dimension to the depth map, the researchers use another technique, called parametric signal processing. Essentially, they assume that all of the surfaces in the scene, however they’re oriented toward the camera, are flat planes. Although that’s not strictly true, the mathematics of light bouncing off flat planes is much simpler than that of light bouncing off curved surfaces. The researchers’ parametric algorithm finds the flat-plane model that best matches the information about returning light, creating a very accurate depth map from a minimum of visual information.
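To give a feel for what fitting a flat-plane model means, here is a minimal sketch that fits a single plane to a handful of depth samples by ordinary least squares. It is a generic textbook technique swapped in for illustration, not the researchers' parametric algorithm, which works on the time-resolved signal from a single detector. The function names, the plane form z = a*x + b*y + c, and the sample data are all assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) samples.

    Hypothetical helper: returns the tuple (a, b, c).
    """
    if len(points) < 3:
        raise ValueError("need at least three points")
    # Accumulate the sums that make up the normal equations.
    sxx = sxy = sx = syy = sy = sxz = syz = sz = 0.0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    n = float(len(points))
    matrix = [[sxx, sxy, sx],
              [sxy, syy, sy],
              [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = det3(matrix)
    if abs(det) < 1e-12:
        raise ValueError("points are collinear or otherwise degenerate")
    # Solve the 3x3 system with Cramer's rule: swap in rhs column by column.
    coeffs = []
    for col in range(3):
        swapped = [row[:] for row in matrix]
        for r in range(3):
            swapped[r][col] = rhs[r]
        coeffs.append(det3(swapped) / det)
    return tuple(coeffs)


# Assumed sample data: a tilted plane z = 0.5*x - 0.25*y + 2 on a 4x4 grid.
samples = [(float(x), float(y), 0.5 * x - 0.25 * y + 2.0)
           for x in range(4) for y in range(4)]
a, b, c = fit_plane(samples)
```

Because a plane needs only three numbers to describe it, a small set of measurements can pin it down, which is the intuition behind getting a depth map from a minimum of information.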
The algorithm lets the researchers get away with relatively crude hardware: a cheap photodetector and an ordinary analog-to-digital converter — an off-the-shelf component already found in all cellphones. The sensor takes about 0.7 nanoseconds to register a change to its input.
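A back-of-the-envelope calculation puts the 0.7-nanosecond figure in perspective. The sketch below only multiplies that response time by the speed of light; reading the result as a rough limit on raw, unprocessed timing is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

# Sensor response time quoted in the article: about 0.7 nanoseconds.
SENSOR_RESPONSE_S = 0.7e-9

# Distance light covers while the sensor is still registering a change.
light_path_m = SPEED_OF_LIGHT_M_PER_S * SENSOR_RESPONSE_S

# Light makes a round trip, so the equivalent one-way depth is half the path.
equivalent_depth_m = light_path_m / 2.0

print(f"light path: {light_path_m * 100:.1f} cm")
print(f"equivalent depth: {equivalent_depth_m * 100:.1f} cm")
```

Light covers about 21 centimeters in 0.7 nanoseconds, or about 10.5 centimeters of one-way depth, so a sensor this slow would give only coarse results if its timing were used directly. That gap is what the algorithm has to make up.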
The researchers’ algorithm is also simple enough to run on the type of processor ordinarily found in a smartphone. To interpret the data provided by the Kinect, by contrast, the Xbox requires the extra processing power of a graphics-processing unit (GPU).
Ref.: (paper to be presented at the IEEE’s International Conference on Acoustics, Speech, and Signal Processing in March).
Ref.: Ahmed Kirmani et al., Exploiting sparsity in time-of-flight range acquisition using a single time-resolved sensor, Optics Express, 2011 [DOI: 10.1364/OE.19.021485]
Comments (1)
by star0
This looks like a really big deal. Some possible uses:
1. Put better-than-Kinect sensor technology in a cellphone.
2. Reduce the cost of sensors for robots (don’t need laser scanners to form point clouds?) This includes driverless cars.
3. Better facial recognition technology. In particular, this could be used to defeat the “photograph” technique for fooling facial authentication (where an impostor shows the device a photograph of the person to be recognized).
4. Pattern recognition, in general, could be drastically improved if depth information could be used. This, in turn, could make it much easier for cellphones to do augmented reality tasks.
5. It could have strong positive implications for 3D telepresence (e.g. keeping an eye on your dog remotely), remote education (e.g. seeing a physics lab experiment in high-res 3D), and robotic surgery (e.g. doctors could get better 3D views of the structures they are operating on). And think about its uses for endoscopic probes; and possibly, microrobots.
6. Think about how it would affect live stream webcams, allowing them to stream events in 3D without expensive hardware.
7. It could enable 3D video chat without any bulky equipment or need for fast processors.
8. And, with 3D-enabled cellphones and 3D TVs (or even holographic TVs), people could record 3D home movies and even post them to YouTube.
9. When used in combination with quadcopters, it could greatly simplify some of the collision-avoidance routines that are needed to make autonomous drones.