Can you out-think a computer in judging photos?

Deep-learning algorithm can weigh up a neighborhood better than humans.
September 30, 2014

Can you rank these images by their distance to the closest McDonald’s? What about ranking them based on the crime rate in the area? (Answers below.) A new algorithm can outperform humans at predicting which of a series of photos is taken in a higher-crime area, or is closer to a McDonald’s restaurant.

An online demo puts you in the middle of a Google Street View with four directional options and challenges you to navigate to the nearest McDonald’s in the fewest possible steps.

While humans are generally better at this specific task than the algorithm, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) found that a new algorithm consistently outperformed humans at a variation of the task in which users are shown two photos and asked which scene is closer to a McDonald’s.

To create the algorithm, the team trained a computer on a set of 8 million Google Street View images from eight major U.S. cities, each tagged with GPS data that the researchers linked to local crime rates and McDonald’s locations. They then used deep-learning techniques to let the program teach itself which visual features of the photos correlate with those quantities. For example, the algorithm independently discovered that some things you often find near McDonald’s franchises include taxis, police vans, and prisons. (Things you don’t find: cliffs, suspension bridges, and sandbars.)
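The training setup described above can be sketched in a few lines. The snippet below is a minimal illustration, not the researchers’ actual model: it assumes a tiny convolutional network and synthetic stand-in data (random tensors in place of the 8 million labeled Street View images), and it regresses a scalar distance-to-McDonald’s from each image with a mean-squared-error loss.

```python
# Minimal sketch of image-to-distance regression (hypothetical model and data;
# the real system was trained on millions of GPS-labeled Street View images).
import torch
import torch.nn as nn

class DistanceRegressor(nn.Module):
    """Tiny CNN mapping a street-view image to a scalar distance estimate."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global average pool to 32 features
        )
        self.head = nn.Linear(32, 1)          # predicted distance (e.g., in km)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# Synthetic stand-ins: 8 "images" with hypothetical distances in km.
images = torch.randn(8, 3, 64, 64)
distances = torch.rand(8, 1) * 5.0

model = DistanceRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(20):                           # a few gradient steps
    loss = nn.functional.mse_loss(model(images), distances)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The same network, trained on pairs of images, would also support the two-photo comparison task: predict a distance for each photo and report whichever is smaller.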

While the project was mostly intended as proof that computer algorithms are capable of advanced scene understanding, PhD student Aditya Khosla has suggested potential uses ranging from a navigation app that avoids high-crime areas, to a tool that could help McDonald’s determine future franchise locations. Khosla previously helped develop an algorithm that can predict a photo’s popularity.

The researchers presented a paper about the work at the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) this summer.

Answers: distance to McDonald’s (farthest to closest): A > F > D > E > C > B; crime rate (highest to lowest): B > E > C > F > D > A


Abstract of Looking Beyond the Visible Scene

A common thread that ties together many prior works in scene understanding is their focus on the aspects directly present in a scene such as its categorical classification or the set of objects. In this work, we propose to look beyond the visible elements of a scene; we demonstrate that a scene is not just a collection of objects and their configuration or the labels assigned to its pixels – it is so much more. From a simple observation of a scene, we can tell a lot about the environment surrounding the scene such as the potential establishments near it, the potential crime rate in the area, or even the economic climate. Here, we explore several of these aspects from both the human perception and computer vision perspective. Specifically, we show that it is possible to predict the distance of surrounding establishments such as McDonald’s or hospitals even by using scenes located far from them. We go a step further to show that both humans and computers perform well at navigating the environment based only on visual cues from scenes. Lastly, we show that it is possible to predict the crime rates in an area simply by looking at a scene without any real-time criminal activity. Simply put, here, we illustrate that it is possible to look beyond the visible scene.