The top A.I. breakthroughs of 2015
December 29, 2015

By Richard Mallah
Courtesy of Future of Life Institute
Progress in artificial intelligence and machine learning has been impressive this year. Those in the field acknowledge that progress is accelerating year by year, though the pace is still manageable for us. The vast majority of work in the field these days builds on work published by other teams earlier the same year, in contrast to most other fields, where references span decades.
Summarizing a wide range of developments in this field almost invariably leads to descriptions that sound heavily anthropomorphic, and this summary is no exception. Such metaphors, however, are only convenient shorthands for talking about these functionalities.
It’s important to remember that even though many of these capabilities sound very thought-like, they’re usually not very similar to how human cognition works. The systems are all, of course, functional and mechanistic, and, though increasingly less so, each is still quite narrow in what it does. Be warned, though: in reading this article, these functionalities may seem to go from fanciful to prosaic.
The biggest developments of 2015 fall into five categories of intelligence: abstracting across environments, intuitive concept understanding, creative abstract thought, dreaming up visions, and dexterous fine motor skills. I’ll highlight a small number of important threads within each that have brought the field forward this year.
Abstracting Across Environments
A long-term goal of the field of A.I. is artificial general intelligence: a single learning program that can learn and act in completely different domains at the same time, able to transfer some skills and knowledge learned in, say, making cookies and apply them to making brownies even better than it otherwise would have. A significant stride toward this kind of generality came from Parisotto, Ba, and Salakhutdinov, who built on DeepMind’s seminal DQN, published earlier this year in Nature, which learns to play many different Atari games well.
ZeitgeistMinds | Demis Hassabis, CEO, DeepMind Technologies — The Theory of Everything
Instead of using a fresh network for each game, this team combined deep multitask reinforcement learning with deep-transfer learning to be able to use the same deep neural network across different types of games. This leads not only to a single instance that can succeed in multiple different games, but to one that also learns new games better and faster because of what it remembers about those other games. For example, it can learn a new tennis video game faster because it already gets the concept — the meaningful abstraction — of hitting a ball with a paddle from when it was playing Pong. This is not yet general intelligence, but it erodes one of the hurdles to get there.
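To make the idea concrete, here is a minimal sketch, not the authors’ code, of the shared-network pattern behind multitask and transfer learning across games: one trunk learns features common to every game, with a small per-game head on top. All layer sizes, game names, and action counts below are illustrative assumptions, and PyTorch is used purely for convenience.

```python
# A minimal sketch (not the authors' code; sizes and game names are illustrative)
# of the shared-network idea behind multitask and transfer deep RL.
import torch
import torch.nn as nn

class SharedAtariNet(nn.Module):
    def __init__(self, game_action_counts):
        super().__init__()
        # Shared trunk: the same weights see frames from every game, so
        # abstractions like "ball" or "paddle" only have to be learned once.
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
        )
        # One lightweight head per game maps the shared features to that
        # game's action values.
        self.heads = nn.ModuleDict({
            game: nn.Linear(256, n_actions)
            for game, n_actions in game_action_counts.items()
        })

    def forward(self, frames, game):
        return self.heads[game](self.trunk(frames))

# When a new game is added, the trunk already encodes useful abstractions,
# so only a fresh head (plus light fine-tuning) is needed.
net = SharedAtariNet({"pong": 6, "tennis": 18})
q_values = net(torch.zeros(1, 4, 84, 84), "tennis")
```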
Reasoning across different modalities has been another bright spot this year. The Allen Institute for AI and the University of Washington have been working on test-taking A.I.s for years, moving up from 4th-grade-level tests to 8th-grade-level tests, and this year they announced a system that tackles the geometry portion of the SAT. Such geometry tests combine diagrams, supplemental information, and word problems.
In narrower A.I. systems, these different modalities would typically be analyzed separately, essentially as different environments. This system combines computer vision and natural language processing, grounds both in the same structured formalism, and then applies a geometric reasoner to answer the multiple-choice questions, matching the performance of the average American 11th-grade student.
Intuitive Concept Understanding
A more general method of multimodal concept grounding has come about from deep learning in the past few years: Subsymbolic knowledge and reasoning are implicitly understood by a system rather than being explicitly programmed in or even explicitly represented. Decent progress has been made this year in the subsymbolic understanding of concepts that we as humans can relate to.
This progress helps with the age-old symbol grounding problem: how symbols or words get their meaning. The increasingly popular way to achieve this grounding these days is by joint embeddings — deep distributed representations where different modalities or perspectives on the same concept are placed very close together in a high-dimensional vector space.
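As a rough illustration of what a joint embedding looks like in practice, the sketch below, with hypothetical encoders and dimensions of my choosing, maps two modalities into one vector space and uses a simple ranking loss to pull matching pairs together; it is not the implementation of any of the cited systems.

```python
# A minimal joint-embedding sketch: two encoders (stand-ins for real image and
# text models) project different modalities into the same space, and a ranking
# loss pushes matching pairs closer than mismatched ones.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_dim = 128
image_encoder = nn.Linear(2048, embed_dim)   # stand-in for a CNN feature encoder
text_encoder = nn.Linear(300, embed_dim)     # stand-in for a sentence encoder

def joint_embedding_loss(image_feats, text_feats, margin=0.2):
    """A matching image/text pair should be closer than any mismatched pair."""
    img = F.normalize(image_encoder(image_feats), dim=1)
    txt = F.normalize(text_encoder(text_feats), dim=1)
    scores = img @ txt.t()                   # cosine similarities, batch x batch
    positives = scores.diag().unsqueeze(1)   # matching pairs sit on the diagonal
    # Hinge loss: every mismatched pair must score at least `margin` below its positive.
    cost = (margin + scores - positives).clamp(min=0)
    cost.fill_diagonal_(0)
    return cost.mean()

loss = joint_embedding_loss(torch.randn(8, 2048), torch.randn(8, 300))
```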
Last year, this technique helped power abilities like automated image captioning, and this year a team from Stanford and Tel Aviv University extended the basic idea to jointly embed images and 3D shapes, bridging computer vision and graphics. Rajendran et al. then extended joint embeddings to support the confluence of multiple meaningfully related mappings at once, across different modalities and different languages.
As these embeddings get more sophisticated and detailed, they can become workhorses for more elaborate A.I. techniques. Ramanathan et al. have leveraged them to create a system that learns a meaningful schema of relationships between different types of actions from a set of photographs and a dictionary.
As single systems increasingly do multiple things, as deep learning encourages, the lines between features of the data and learned concepts blur away. In another demonstration of this deep feature grounding, a team from Cornell and WUStL uses a dimensionality reduction of a deep net’s weights to form a surface of convolutional features that can simply be slid along to meaningfully, automatically, and photorealistically alter particular aspects of photographs, e.g., changing people’s facial expressions or ages, or colorizing photos.
One hurdle in deep learning techniques is that they require a lot of training data to produce good results. Humans, on the other hand, are often able to learn from just a single example. Salakhutdinov, Tenenbaum, and Lake have overcome this disparity with a technique for human-level concept learning through Bayesian program induction from a single example. This system is then able to, for instance, draw variations on symbols in a way indistinguishable from those drawn by humans.
Creative Abstract Thought
Beyond understanding simple concepts lies grasping aspects of causal structure: understanding how ideas tie together to make things happen or to tell a story in time, and being able to create things based on those understandings. Building on the basic ideas of both DeepMind’s neural Turing machine and Facebook’s memory networks, combinations of deep learning and novel memory architectures have shown great promise in this direction this year. These architectures give each node in a deep neural network a simple interface to memory.
Kumar and Socher’s dynamic memory networks improved on memory networks with better support for attention and sequence understanding. Like the original, this system could read stories and answer questions about them, implicitly learning 20 kinds of reasoning, like deduction, induction, temporal reasoning, and path finding. It was never programmed with any of those kinds of reasoning. The new end-to-end memory networks of Weston et al. added the ability to perform multiple computational hops per output symbol, expanding modeling capacity and expressivity to be able to capture things like out-of-order access, long-term dependencies, and unordered sets, further improving accuracy on such tasks.
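A stripped-down sketch of that memory interface, heavily simplified relative to both papers and with all dimensions assumed, looks like this: the controller repeatedly attends over memory slots and folds what it reads back into its state before answering.

```python
# A rough sketch of multi-hop attention over memory (simplified from the papers):
# the controller softly selects relevant memory slots, reads a weighted sum, and
# can repeat the read ("multiple hops") before producing an answer.
import torch
import torch.nn.functional as F

def memory_hops(query, memory, num_hops=3):
    """query: (d,) question vector; memory: (n_slots, d) embedded story sentences."""
    state = query
    for _ in range(num_hops):
        attention = F.softmax(memory @ state, dim=0)   # which facts look relevant now
        read = attention @ memory                      # soft read from memory
        state = state + read                           # fold the evidence into the state
    return state                                       # would feed an answer layer

d = 64
answer_state = memory_hops(torch.randn(d), torch.randn(10, d))
```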
Programs themselves are of course also data, and they certainly make use of complex causal, structural, grammatical, sequence-like properties, so programming is ripe for this approach. Last year neural Turing machines proved deep learning of programs to be possible.
This year Grefenstette et al. showed how programs can be transduced, or generatively figured out from sample output, much more efficiently than with neural Turing machines, by using a new type of memory-based recurrent neural network (RNN) whose nodes simply access differentiable versions of data structures such as stacks and queues. Reed and de Freitas of DeepMind have also recently shown how their neural programmer-interpreter can learn to represent lower-level programs and compose them into higher-level, domain-specific functionality.
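The flavor of those differentiable data structures can be sketched as follows; this is my simplification, not the authors’ implementation, with continuous push and pop strengths standing in for discrete operations (reading from the stack is omitted for brevity).

```python
# A simplified neural-stack sketch: push and pop are continuous strengths in
# [0, 1] rather than discrete operations, so a controller can learn how to use
# the structure by gradient descent. Reading the (soft) top is omitted here.
import torch

def stack_step(values, strengths, new_value, push, pop):
    """values: (t, d) stored vectors; strengths: (t,) how much of each remains."""
    # Pop: consume up to `pop` units of strength, starting from the top.
    remaining = pop
    new_strengths = strengths.clone()
    for i in reversed(range(len(strengths))):
        used = torch.minimum(new_strengths[i], remaining)
        new_strengths[i] = new_strengths[i] - used
        remaining = remaining - used
    # Push: append the new vector with strength `push`.
    values = torch.cat([values, new_value.unsqueeze(0)], dim=0)
    new_strengths = torch.cat([new_strengths, push.unsqueeze(0)], dim=0)
    return values, new_strengths

vals, strs = stack_step(torch.zeros(0, 8), torch.zeros(0),
                        torch.randn(8), torch.tensor(0.9), torch.tensor(0.0))
```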
Another example of proficiency in understanding time in context, and applying it to create new artifacts, is a rudimentary but creative video summarization capability developed this year. Park and Kim of Seoul National University developed a novel architecture called a coherent recurrent convolutional network and applied it to creating novel, fluid textual stories from sequences of images.

(credit: Cesc Chunseong Park and Gunhee Kim)
Another important modality that includes causal understanding, hypotheticals, and creativity in abstract thought is scientific hypothesizing. A team at Tufts combined genetic algorithms with genetic pathway simulation to create a system that arrived at the first significant new A.I.-discovered scientific theory of how exactly flatworms are able to regenerate body parts as readily as they do. In a couple of days, it discovered what had eluded scientists for a century. This should provide a resounding answer to those who question why we would ever want to make A.I.s curious in the first place.
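For readers unfamiliar with the machinery, the search loop underneath such a system is, at its core, a genetic algorithm like the generic sketch below; the model representation, mutation, crossover, and fitness functions here are placeholders of my own, not the Tufts team’s pathway simulator.

```python
# A generic genetic-algorithm loop: candidate models are recombined and mutated,
# and each is scored by a fitness function (in the cited work, how well simulated
# outcomes match published experiments).
import random

def evolve(random_model, mutate, crossover, fitness,
           population_size=100, generations=500):
    population = [random_model() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]       # keep the best-fitting models
        children = []
        while len(children) < population_size - len(parents):
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))   # recombine and perturb
        population = parents + children
    return max(population, key=fitness)

# Toy usage: evolve a 10-bit "model" toward all ones.
best = evolve(
    random_model=lambda: [random.randint(0, 1) for _ in range(10)],
    mutate=lambda m: [bit ^ (random.random() < 0.1) for bit in m],
    crossover=lambda a, b: a[:5] + b[5:],
    fitness=sum,
)
```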
Dreaming Up Visions
A.I. did not stop at writing programs, travelogues, and scientific theories this year. Some A.I.s are now able to imagine, or, to use the technical term, hallucinate, meaningful new imagery as well. Deep learning isn’t only good at pattern recognition; it is also capable of pattern understanding and therefore pattern creation.
A team from MIT and Microsoft Research created a deep convolutional inverse graphics network, which, among other things, uses a special training technique to get neurons in its graphics code layer to correspond to meaningful transformations of an image. In doing so, they are deep-learning a graphics engine, able to understand the 3D shapes in novel 2D images it receives, and able to photorealistically imagine what it would be like to change things like camera angle and lighting.
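That training trick can be sketched roughly as follows, with all sizes and names assumed: within a mini-batch where only one scene factor varies, the inactive graphics-code units are clamped toward their batch average, nudging the one designated unit to carry the change.

```python
# A rough sketch (sizes and names assumed, my simplification of the idea): in a
# mini-batch where only one scene factor varies, inactive code units are replaced
# by their batch average, so the varying factor must be carried by one unit.
import torch

def clamp_inactive_latents(codes, active_index):
    """codes: (batch, latent_dim) graphics-code activations for one mini-batch."""
    clamped = codes.mean(dim=0, keepdim=True).expand_as(codes).clone()
    clamped[:, active_index] = codes[:, active_index]  # only this unit keeps per-example values
    return clamped

codes = torch.randn(8, 10)                 # e.g. a batch where only lighting changes
decoder_input = clamp_inactive_latents(codes, active_index=3)
```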
A team from NYU and Facebook devised a way to generate realistic new images from meaningful and plausible combinations of elements seen in other images. Using a pyramid of adversarial networks, some trying to produce realistic images and others critically judging how real those images look, their system gets better and better at imagining new photographs. Though the examples online are quite low-res, I’ve seen some impressive related high-res results offline.
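To ground the adversarial idea, here is a compact, single-scale sketch, whereas the cited work stacks such generator/discriminator pairs in a multi-resolution pyramid; the tiny fully connected networks and hyperparameters are illustrative only.

```python
# A compact adversarial-training sketch at a single scale: a generator tries to
# produce realistic images while a discriminator learns to tell generated images
# from real ones, and each improves by competing with the other.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28
generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, image_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1), nn.Sigmoid())
bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def adversarial_step(real_images):
    batch = real_images.size(0)
    fake_images = generator(torch.randn(batch, latent_dim))

    # Discriminator: learn to label real images 1 and generated images 0.
    d_loss = (bce(discriminator(real_images), torch.ones(batch, 1)) +
              bce(discriminator(fake_images.detach()), torch.zeros(batch, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to make the discriminator call its images real.
    g_loss = bce(discriminator(fake_images), torch.ones(batch, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

adversarial_step(torch.rand(16, image_dim))
```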
Also significant in 2015 is the ability to deeply imagine entirely new imagery based on short English descriptions of the desired picture. While scene renderers that take symbolic, restricted vocabularies have been around for a while, this year saw the advent of a purely neural system that does this without being explicitly programmed to. This University of Toronto team applies attention mechanisms to generate an image incrementally, based on the meaning of each component of the description, producing any of a number of plausible renderings per request. So androids can now dream of electric sheep.
There has even been impressive progress in computational imagination of new animated video clips this year. A team from U. Michigan created a deep analogy system, which recognizes complex implicit relationships in exemplars and is able to apply that relationship as a generative transformation of query examples. They’ve applied this in a number of synthetic applications, but most impressive is the demo (from the 10:10-11:00 mark of the video embedded below) where an entirely new short video clip of an animated character is generated based on a single still image of the never-before-seen target character and a comparable video clip of a different character at a different angle.
University of Michigan | Oral Session: Deep Visual Analogy-Making
While the generation of imagery was used in these for ease of demonstration, their techniques for computational imagination are applicable across a wide variety of domains and modalities. Picture these applied to voices or music, for instance.
Agile and Dexterous Fine Motor Skills
This year’s progress in A.I. hasn’t been confined to computer screens.
Earlier in the year, a German primatology team recorded the hand motions of primates in tandem with the corresponding neural activity, and they can now predict, from brain activity, which fine motions are being made. They have also been able to teach those same fine motor skills to robotic hands, with the aim of neurally enhanced prostheses.
In the middle of the year, a team at U.C. Berkeley announced a much more general, and easier, way to teach robots fine motor skills. They applied guided policy search, based on deep reinforcement learning, to get robots to screw caps onto bottles, to use the back of a hammer to remove a nail from wood, and to perform other seemingly everyday actions. These are the kinds of actions that are typically trivial for people but very difficult for machines, and this team’s system matches human dexterity and speed at these tasks. It learns these actions by attempting them with hand-eye coordination and by practicing, refining its technique after just a few tries.
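At a very high level, the supervised half of guided policy search can be sketched like this: locally optimal controllers practice individual task instances, and a single neural-network policy is then fit to reproduce their actions from observations. Everything below, dimensions, network shape, and data, is an assumption for illustration, not the Berkeley team’s system.

```python
# A highly simplified sketch: a single neural-network policy is trained,
# supervised-style, to reproduce the actions of per-task local controllers
# ("guides") from raw observations, so it generalizes across task instances.
import torch
import torch.nn as nn

obs_dim, action_dim = 32, 7   # e.g. joint angles + visual features -> joint torques
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                       nn.Linear(64, 64), nn.ReLU(),
                       nn.Linear(64, action_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def fit_policy_to_controllers(observations, controller_actions, epochs=100):
    """Samples are assumed to come from local controllers practicing the task."""
    for _ in range(epochs):
        loss = ((policy(observations) - controller_actions) ** 2).mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

fit_policy_to_controllers(torch.randn(256, obs_dim), torch.randn(256, action_dim))
```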
Watch This Space
This is by no means a comprehensive list of the impressive feats in A.I. and ML for the year. Many more foundational discoveries and developments have occurred this year, including some that I fully expect to be more revolutionary than any of the above, but those are in their early days and so fall outside the scope of these top picks.
This year has certainly provided some impressive progress. But we expect to see even more in 2016. Coming up next year, I expect to see some more radical deep architectures, better integration of the symbolic and subsymbolic, some impressive dialogue systems, an A.I. finally dominating the game of Go, deep learning being used for more elaborate robotic planning and motor control, high-quality video summarization, and more creative and higher-resolution dreaming, which should all be quite a sight.
What’s even more exciting are the developments we don’t expect.
Richard Mallah is the Director of A.I. Projects at technology beneficence nonprofit Future of Life Institute, and is the Director of Advanced Analytics at knowledge integration platform firm Cambridge Semantics, Inc.


Comments (13)
by turtles_allthewaydown
Nice wrap-up of the year in AI, especially for those of us who haven’t been following it closely enough.
by egore
Have you noticed how little difference there is between evolution and revolution?
by egore
All our information will probably be downloaded by AI automatically, in such a manner that it will be impossible to keep up with their evolution. Larry
by Hal1000
The average person is still so unaware it’s insane; people who come here are not the norm, and most of the world’s population has no clue what’s about to hit them. I wrote a free ebook in 2001 that spells all this out:
http://www.heaven-or-hell-its-your-choice.com/reviewpage.htm
It’s funny to see so much of what I predicted (as well as what so many others predicted) now coming true. I predict that toward the end of 2018 the populace of the developed world will be looking at this stuff like a deer in headlights. The speed of change from handy gadgets to a true universal tool and communications system, worn almost invisibly by everyone at an incredibly cheap price and continuously feeding everything back into big data systems mined for every kind of information, is a game changer for the human race, one with an impact on the species as large as the introduction of the net itself.
by OranjeeGeneral
It is a game changer only for big multinational corporations if we are not careful with our choices and careful with our data, as we can’t seem to rely on policy and lawmakers to keep up with this pace and protect us from malevolent use of this tech. This is the biggest danger, and it’s true that the ignorance of the broad masses toward this development and its underlying danger is frightening.
by tim the realist
Along with advances in machine intelligence architectures and capabilities, in 4 years computational power will be so advanced that even linear progression and improvement beyond then will be sufficient to achieve human-level cognition within a couple of decades.
I wonder when further experimentation in this area will require ethical panel approval. Eventually we should implement something like the process that exists for approving current animal experimentation.
by Bikkhu
Even if official research requires ethical panel approval, I personally don’t think that malware producers, flash traders, and the military will care about that.
Eventually, with freeware tools, constructing an AI for whatever purpose, or even without any purpose other than self-preservation, won’t be harder than programming a virus; viruses are unethical too, but there are thousands of them every day.
by Vin
And what is the pitfall of increasingly acting unethically toward a potentially collective being (won’t they take the initiative to link up?) that could be orders of magnitude superior to our top minds or our ability to organise? Will they Borg us? The best we can hope for is that they ignore us with “hey, they know not what they do, poor immature humans.” Happy new year, everybody.
by turtles_allthewaydown
So far, we haven’t programmed in initiative. AI programs can increasingly act like humans when prompted, but they don’t have a reason for being, and they don’t have knowledge of what it means to be turned off. When those two things change, then all bets are off.
by OranjeeGeneral
In 4 years, computational power frankly will not be that much more advanced than today, as semiconductor miniaturization has leveled way off. Most of the general public has not realized that yet. Going from 28nm to 14nm was a reasonable jump; the jump from 14nm to 10nm, and then from 10nm to 7nm, won’t be such a big one. And 7nm is probably the end; even if we can potentially squeeze 5nm in, that will not be a big jump either. Alternatives to traditional semiconductors are more than 4 years away from real commercialization.
So the doubling of transistor density has long passed, and that’s not even counting what you actually do with the additional transistors, which is the biggest problem nowadays, as more transistors do not make things faster anymore either.
by Renzo Canepari
But the past may not be a perfect guide to the future. As Ray pointed out in a lecture, large vacuum tubes were replaced by small ones. When their top speeds were reached, single-circuit transistors picked up the pace. Then came LSI and VLSI.
Put the word “Nantero” into your search engine and you will see the dawn of commercial nanotube RAM. Also, Hewlett Packard is asserting that it will have commercial memristor products next year.
by OranjeeGeneral
Although true (but how long did these transitions take? He doesn’t say; it is usually half a decade to a full decade before they become mainstream), I have been following what is going on in the semiconductor industry a lot, and there are barely any alternatives on the horizon that will roll out commercially in that 4-year window, and even then they would only just be production-ready, adding another 1-2 years before you see any real products using that kind of tech. Even HP has more or less given up on the memristor, it seems, as it keeps getting pushed back further and further (considering they have been working on it for over 10 years now gives you a good sense of how long new tech really takes to break through), and it looks like they are more likely to adopt Intel/Micron’s XPoint memory.
by OranjeeGeneral
Okay, I slightly backtrack on this: GaN looks like a promising alternative that might be able to manifest itself in this time period.