Deep neural network rivals primate brain in object recognition

December 19, 2014

Example images from three of the seven image categories used to measure object category recognition performance by neural networks and monkeys (credit: Cadieu et al./PLoS Comput Biol)

A new study from MIT neuroscientists has found that for the first time, one of the latest generation of “deep neural networks” matches the ability of the primate brain to recognize objects during a brief glance.

Because these neural networks were designed based on neuroscientists’ current understanding of how the brain performs object recognition, the success of the latest networks suggests that neuroscientists have a fairly accurate grasp of how object recognition works, says James DiCarlo, a professor of neuroscience and head of MIT’s Department of Brain and Cognitive Sciences. DiCarlo is the senior author of a paper describing the study in the Dec. 18 issue of the open-access journal PLoS Computational Biology.

Primates visually recognize and determine the category of an object even at a brief glance, and to date, this behavior has been unmatched by artificial systems.

Lead author Charles Cadieu and colleagues from MIT measured the brain’s object recognition ability by implanting arrays of electrodes in the inferior temporal (IT) cortex of macaque monkeys and in area V4, a part of the visual system that feeds into that area of the cortex. This allowed the researchers to see the neural representation (the population of neurons that respond) for every object that the animals looked at.

The researchers then compared these results with the representations created by the deep neural networks, judging each model’s accuracy by whether it grouped similar objects into similar clusters within its representation.
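One way to picture the "similar objects in similar clusters" criterion is a toy scoring routine. The sketch below is an illustration only, not the study's method: it swaps in a simple leave-one-out nearest-centroid classifier, and the two-dimensional feature vectors, the category names, and the function names are all made up for the example.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leave_one_out_accuracy(features, labels):
    """Fraction of items whose nearest category centroid, computed
    without the item itself, carries the item's own label."""
    correct = 0
    for i, (vec, label) in enumerate(zip(features, labels)):
        by_category = {}
        for j, (other, other_label) in enumerate(zip(features, labels)):
            if j != i:
                by_category.setdefault(other_label, []).append(other)
        centroids = {c: centroid(vs) for c, vs in by_category.items()}
        predicted = min(centroids, key=lambda c: distance(vec, centroids[c]))
        correct += predicted == label
    return correct / len(features)

# Hypothetical data: six images, two made-up categories.
labels = ["car", "car", "car", "face", "face", "face"]

# A representation that keeps each category together.
clustered = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
             [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]

# A representation that mixes the two categories.
mixed = [[0.0, 0.1], [1.0, 0.9], [0.2, 0.1],
         [0.1, 0.0], [0.9, 1.0], [1.1, 1.0]]

print("clustered:", leave_one_out_accuracy(clustered, labels))
print("mixed:    ", leave_one_out_accuracy(mixed, labels))
```

A representation that separates the categories scores higher than one that mixes them, which is the intuition behind rating a model (or a neural population) by how well categories can be read out from it.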

This improved understanding of how the primate brain works could lead to better artificial intelligence and offer new insight into primate visual processing.

“The fact that the models predict the neural responses and the distances of objects in neural population space shows that these models encapsulate our current best understanding as to what is going on in this previously mysterious portion of the brain,” say the authors.

More processing power and data

Two major factors account for the recent success of this type of neural network, Cadieu says. One is a significant leap in the availability of computational processing power, thanks to relatively inexpensive graphics processing units (GPUs). The second factor is that researchers now have access to large datasets to feed the algorithms to “train” them. These datasets contain millions of images, and each one is annotated by humans with different levels of identification. For example, a photo of a dog would be labeled as animal, canine, domesticated dog, and the breed of dog.
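The multi-level labeling described above can be pictured as a small hierarchy. The following is a minimal sketch, assuming each label has a single parent; the `PARENT` table, the breed "beagle", and the `label_path` helper are hypothetical and do not come from any real dataset.

```python
# Hypothetical label hierarchy: each label maps to its broader parent,
# and the top-level label maps to None.
PARENT = {
    "beagle": "domesticated dog",
    "domesticated dog": "canine",
    "canine": "animal",
    "animal": None,
}

def label_path(label, parent=PARENT):
    """Return every label that applies to an image, from the
    broadest category down to the most specific one."""
    path = []
    while label is not None:
        path.append(label)
        label = parent[label]
    return list(reversed(path))

print(label_path("beagle"))
# ['animal', 'canine', 'domesticated dog', 'beagle']
```

Storing only the most specific label plus the hierarchy lets a training pipeline recover the broader labels when it needs them.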

Cadieu says that researchers don’t know much about what exactly allows these networks to distinguish different objects. “That’s a pro and a con,” he says. “It’s very good in that we don’t have to really know what the things are that distinguish those objects. But the big con is that it’s very hard to inspect those networks, to look inside and see what they really did. Now that people can see that these things are working well, they’ll work more to understand what’s happening inside of them.”

DiCarlo’s lab now plans to try to generate models that can mimic other aspects of visual processing, including tracking motion and recognizing three-dimensional forms. They also hope to create models that include the feedback projections seen in the human visual system. Current networks only model the “feedforward” projections from the retina to the IT cortex, but there are 10 times as many connections that go from IT cortex back to the rest of the system.

This work was supported by the National Eye Institute, the National Science Foundation, and the Defense Advanced Research Projects Agency.

Abstract of Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition

The primate visual system achieves remarkable visual object recognition performance even in brief presentations, and under changes to object exemplar, geometric transformations, and background variation (a.k.a. core visual object recognition). This remarkable performance is mediated by the representation formed in inferior temporal (IT) cortex. In parallel, recent advances in machine learning have led to ever higher performing models of object recognition using artificial deep neural networks (DNNs). It remains unclear, however, whether the representational performance of DNNs rivals that of the brain. To accurately produce such a comparison, a major difficulty has been a unifying metric that accounts for experimental limitations, such as the amount of noise, the number of neural recording sites, and the number of trials, and computational limitations, such as the complexity of the decoding classifier and the number of classifier training examples. In this work, we perform a direct comparison that corrects for these experimental limitations and computational considerations. As part of our methodology, we propose an extension of “kernel analysis” that measures the generalization accuracy as a function of representational complexity. Our evaluations show that, unlike previous bio-inspired models, the latest DNNs rival the representational performance of IT cortex on this visual object recognition task. Furthermore, we show that models that perform well on measures of representational performance also perform well on measures of representational similarity to IT, and on measures of predicting individual IT multi-unit responses. Whether these DNNs rely on computational mechanisms similar to the primate visual system is yet to be determined, but, unlike all previous bio-inspired models, that possibility cannot be ruled out merely on representational performance grounds.
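The idea of "representational similarity" in the abstract can be illustrated with a toy comparison of pairwise distances. This is a sketch under stated assumptions, not the paper's procedure: it uses Euclidean distance between image responses and a Pearson correlation between the two sets of distances, and all the response values and names (`it_like`, `model_a`, `model_b`) are invented.

```python
import math
from itertools import combinations

def pairwise_distances(responses):
    """One Euclidean distance per pair of images (the upper triangle
    of a dissimilarity matrix), in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(responses, 2)]

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def representational_similarity(rep_a, rep_b):
    """Correlate the image-to-image distances of two representations.
    The two may have different numbers of units per image."""
    return pearson(pairwise_distances(rep_a), pairwise_distances(rep_b))

# Invented responses to four images (rows) across recording sites (columns).
it_like = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
           [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]

# Model A has the same geometry at a different scale.
model_a = [[2.0 * x for x in row] for row in it_like]

# Model B groups the images differently.
model_b = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

print("model A:", representational_similarity(it_like, model_a))
print("model B:", representational_similarity(it_like, model_b))
```

Because the comparison uses only distances between images, it does not require the model and the neural recording to have the same number of units.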