Deep neural networks that identify shapes nearly as well as humans

You’re in your self-driving car in heavy rain and poor visibility. Suddenly, a blurred shape appears on the road. What should the car do?
April 29, 2016

Self-driving car vs. pedestrians (credit: Google)

Deep neural networks (DNNs) can learn to identify shapes, so “we’re on the right track in developing machines with a visual system and vocabulary as flexible and versatile as ours,” say KU Leuven researchers.

“For the first time, a dramatic increase in performance has been observed on object and scene categorization tasks, quickly reaching performance levels rivaling humans,” they note in an open-access paper in PLOS Computational Biology.

Categorization accuracy of three DNNs (CaffeNet, VGG-19, and GoogLeNet) for three types of images (color, grayscale, and silhouette). For each type, mean human performance is indicated by a gray horizontal line, with the surrounding gray band depicting the 95% confidence interval. Error bars (vertical black lines) depict 95% confidence intervals for the models. (credit: J. Kubilius et al./PLoS Comput Biol)
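To give a concrete sense of how such a comparison could be run, here is a minimal Python sketch that feeds color, grayscale, and silhouette versions of an image through pretrained networks and reads off their predictions. The torchvision models stand in for the study’s Caffe-based networks (AlexNet approximates CaffeNet), and the image path and thresholding-based silhouette are illustrative assumptions, not the paper’s actual pipeline.

```python
# Hypothetical sketch: compare predictions of pretrained DNNs on color,
# grayscale, and silhouette versions of the same image. Paths and the
# silhouette method are placeholders, not the study's stimuli.
import torch
from torchvision import models, transforms
from PIL import Image, ImageOps

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def to_grayscale(img):
    # Grayscale, replicated to 3 channels so pretrained weights still apply
    return ImageOps.grayscale(img).convert("RGB")

def to_silhouette(img, threshold=128):
    # Crude silhouette via binarization (the study used hand-prepared
    # silhouettes; this is only a rough approximation)
    return ImageOps.grayscale(img).point(
        lambda p: 0 if p < threshold else 255).convert("RGB")

nets = {
    "alexnet": models.alexnet(weights="DEFAULT").eval(),  # CaffeNet stand-in
    "vgg19": models.vgg19(weights="DEFAULT").eval(),
    "googlenet": models.googlenet(weights="DEFAULT").eval(),
}

@torch.no_grad()
def top1(net, img):
    x = preprocess(img).unsqueeze(0)
    return net(x).argmax(dim=1).item()

img = Image.open("example.jpg")  # placeholder image path
for name, net in nets.items():
    for variant, fn in [("color", lambda i: i.convert("RGB")),
                        ("grayscale", to_grayscale),
                        ("silhouette", to_silhouette)]:
        print(name, variant, "predicted class:", top1(net, fn(img)))
```

Accuracy for each image type would then be the fraction of images whose predicted class matches the ground-truth label, which is what the figure plots against mean human performance.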

The researchers found that when trained for generic object recognition from natural photographs, several different DNNs developed visual representations that relate closely to human perceptual shape judgments, even though they were never explicitly trained for shape processing.
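A common way to quantify how closely a network’s representations relate to human judgments is representational similarity analysis: compute pairwise dissimilarities between the model’s responses to a stimulus set and correlate them with human pairwise shape-dissimilarity judgments. The sketch below assumes a hypothetical set of stimulus images and a precomputed human dissimilarity vector (“human_rdm_upper.npy”); the layer choice and distance metric are illustrative, not the paper’s exact analysis.

```python
# Minimal representational-similarity sketch, assuming a stimulus set and
# a vector of human pairwise shape-dissimilarity judgments (placeholders).
import numpy as np
import torch
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist
from torchvision import models, transforms
from PIL import Image

net = models.vgg19(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def features(path):
    # Use the penultimate fully connected layer as the representation
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    x = net.features(x)
    x = net.avgpool(x).flatten(1)
    x = net.classifier[:-1](x)  # stop before the final class layer
    return x.squeeze(0).numpy()

paths = ["stim_%02d.png" % i for i in range(20)]  # placeholder stimuli
feats = np.stack([features(p) for p in paths])

model_rdm = pdist(feats, metric="correlation")    # model dissimilarities
human_rdm = np.load("human_rdm_upper.npy")        # placeholder human data
rho, p = spearmanr(model_rdm, human_rdm)
print("Spearman correlation with human judgments: %.2f (p=%.3g)" % (rho, p))
```

A high rank correlation here would mean that pairs of shapes the network represents as similar are also the pairs humans judge to be similar, which is the sense in which the DNNs’ representations “relate closely” to human perception.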

However, “We’re not there just yet,” say the researchers. “Even if machines were at some point equipped with a visual system as powerful as ours, self-driving cars would still make occasional mistakes, although, unlike human drivers, they wouldn’t be distracted because they’re tired or busy texting. However, even in those rare instances when self-driving cars would err, their decisions would be at least as reasonable as ours.”


Abstract of Deep Neural Networks as a Computational Model for Human Shape Sensitivity

Theories of object recognition agree that shape is of primordial importance, but there is no consensus about how shape might be represented, and so far attempts to implement a model of shape perception that would work with realistic stimuli have largely failed. Recent studies suggest that state-of-the-art convolutional ‘deep’ neural networks (DNNs) capture important aspects of human object perception. We hypothesized that these successes might be partially related to a human-like representation of object shape. Here we demonstrate that sensitivity for shape features, characteristic of human and primate vision, emerges in DNNs when trained for generic object recognition from natural photographs. We show that these models explain human shape judgments for several benchmark behavioral and neural stimulus sets on which earlier models mostly failed. In particular, although never explicitly trained for such stimuli, DNNs develop acute sensitivity to minute variations in shape and to non-accidental properties that have long been implicated as the basis for object recognition. Even more strikingly, when tested with a challenging stimulus set in which shape and category membership are dissociated, the most complex model architectures capture human shape sensitivity as well as some aspects of the category structure that emerges from human judgments. As a whole, these results indicate that convolutional neural networks not only learn physically correct representations of object categories but also develop perceptually accurate representational spaces of shapes. An even more complete model of human object representations might be in sight by training deep architectures for multiple tasks, which is so characteristic of human development.