Deep Learning RNNaissance with Juergen Schmidhuber, PhD

December 10, 2014

Juergen Schmidhuber, PhD | Machine learning and pattern recognition are currently being revolutionised by “deep learning” (DL) Neural Networks (NNs). This is of commercial interest (for example, Google spent over 400 million on the start-up DeepMind, co-founded by one of our students). I summarise work on DL since the 1960s, and our own work since 1991. Our recurrent NNs (RNNs) were the first to win official international competitions in pattern recognition and machine learning; our team has won more such contests than any other research group or company. Our Long Short-Term Memory (LSTM) RNNs helped to improve connected handwriting recognition, speech recognition, machine translation, optical character recognition, image caption generation, and other fields. Our Deep Learners were also the first to win object detection and image segmentation contests, and achieved the world’s first superhuman visual classification results. We also built the first reinforcement learning RNN-based agent that learns complex video game control from scratch, based on high-dimensional vision. Time permitting, I’ll also address curious/creative machines and theoretically optimal, universal, self-modifying artificial intelligences.

related reading:
Wikipedia | Juergen Schmidhuber, PhD

excerpt | Jürgen Schmidhuber, PhD, is a computer scientist and artist known for his work on machine learning, Artificial Intelligence (AI), artificial neural networks, digital physics, and low-complexity art. His contributions also include generalizations of Kolmogorov complexity and the Speed Prior. From 2004 to 2009 he was professor of Cognitive Robotics at the Technical University of Munich. Since 1995 he has been co-director of the Swiss AI Lab IDSIA in Lugano, and since 2009 also professor of Artificial Intelligence at the University of Lugano. Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in his research group won eight international competitions in pattern recognition and machine learning.[1] In honor of his achievements he was elected to the European Academy of Sciences and Arts in 2008.


Recently, Professor Juergen Schmidhuber gave a dozen talks on “Deep Learning” in New York and the Bay Area: the NYC Machine Learning Meetup in the Empire State Building, Yahoo, SciHampton, IBM Watson, Google Palo Alto, SciFoo at the Googleplex, Stanford University, the San Francisco ML Meetup, ICSI, and UC Berkeley. Similar material was used for invited plenary talks / keynotes at KAIST 2014 (South Korea), ICONIP 2014 (Malaysia), and INNS-CIIS 2014 (Brunei). Links to videos are listed below.

Typical title:  Deep Learning RNNaissance (abstract above)

Talk slides: http://www.idsia.ch/~juergen/deeplearning2014slides.pdf

Outline of slides:

– First Deep Learning (DL) (Ivakhnenko, 1965)

– History of backpropagation: Bryson, Kelley, Dreyfus (early 1960s), Linnainmaa (1970), Speelpenning (1980), Werbos (1981), Rumelhart et al (1986), others

– Recurrent neural networks (RNNs) – the deepest of all NNs – search in general program space!

– 1991: Fundamental DL problem (FDLP) of gradient-based NNs (Hochreiter, my 1st student, now prof)

– 1991: Our deep unsupervised stack of recurrent NNs (RNNs) overcomes the FDLP: the Neural History Compressor or Hierarchical Temporal Memory / related to autoencoder stacks (Ballard, 1987) and Deep Belief Nets (Hinton et al, 2006)

– Our purely supervised deep Long Short-Term Memory (LSTM) RNN overcomes the FDLP without any unsupervised pre-training (1990s, 2001, 2003, 2006-, with Hochreiter, Gers, Graves, Fernandez, Wierstra, Gomez, others) – a minimal LSTM cell sketch follows this outline

– How LSTM became the first RNN to win controlled contests (2009), and set standards in connected handwriting and speech recognition

– Industrial breakthroughs of 2014: Google / Microsoft / IBM used LSTM to improve machine translation, image caption generation, speech recognition / text-to-speech synthesis / prosody detection

– 2010: How our deep GPU-based NNs trained by backprop (3-5 decades old) + training pattern deformations (2 decades old) broke the MNIST record

– History of feedforward max-pooling (MP) convolutional NNs (MPCNNs, Fukushima 1979-, Weng 1992, LeCun et al 1989-2000s, others)

– How our ensembles of GPU-based MPCNNs (Ciresan et al, 2011) became the first DL systems to achieve superhuman visual pattern recognition (traffic signs), and to win contests in image segmentation (brain images, 2012) and visual object detection (cancer cells, 2012, 2013) / fast MPCNN image scans (Masci et al, 2013)

– Why it’s all about data compression

– 2014: 20 year anniversary of self-driving cars in highway traffic (Dickmanns, 1994)

– Reinforcement Learning (RL): How NN-based planning robots won the RoboCup in the fast league (Foerster et al, 2004)

– Our deep RL through Compressed NN Search applied to huge RNN video game controllers that learn to process raw video input (Koutnik et al, 2013)

– Formal theory of fun and creativity
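
Aside on LSTM (background for the outline item above, not part of the talk slides): the basic LSTM cell combines a memory cell with multiplicative input, forget, and output gates, so the cell state is updated additively and error signals can survive over many time steps; this is how LSTM addresses the fundamental DL problem of vanishing/exploding gradients. Below is a minimal, illustrative NumPy sketch of a single forward step of such a cell. The weight names (W_i, W_f, W_o, W_g), the dimensions, and the toy usage at the end are illustrative assumptions of this write-up, not code from the talk or the survey.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a basic LSTM cell with input, forget, and output gates."""
    z = np.concatenate([x, h_prev])                   # current input joined with previous hidden state
    i = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate
    f = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate
    o = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate
    g = np.tanh(params["W_g"] @ z + params["b_g"])    # candidate cell update
    c = f * c_prev + i * g                            # additive, gated cell-state update
    h = o * np.tanh(c)                                # hidden state emitted by the cell
    return h, c

# Toy usage with random weights: one step over a 4-dimensional input.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
params = {}
for name in ("i", "f", "o", "g"):
    params["W_" + name] = rng.normal(scale=0.1, size=(n_hidden, n_in + n_hidden))
    params["b_" + name] = np.zeros(n_hidden)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hidden), np.zeros(n_hidden), params)
print(h.shape, c.shape)  # (8,) (8,)
```

The additive update c = f * c_prev + i * g is the “constant error carousel” that preserves gradient flow across long time lags, which is what lets LSTM learn long-range dependencies where plain gradient-trained RNNs fail.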

Selected videos of those talks, with variations due to questions from the audience:

1. New York City Machine Learning Meetup hosted by ShutterStock in the Empire State Building:

https://www.youtube.com/watch?v=6bOMf9zr7N8

Also at Vimeo:

http://vimeo.com/113402131

2. ICSI, Berkeley:

https://www.youtube.com/watch?v=h4FqFss9hEY

3. Bay Area ML Meetup hosted by upsight.com in downtown San Francisco:

https://vimeo.com/105972440

4. Google, Palo Alto (audio and slides only):

http://youtu.be/obGrn1oVJsY

Numerous earlier videos:

http://www.idsia.ch/~juergen/videos.html

Details in the invited Deep Learning Survey (88 pages, 888 references):

http://www.idsia.ch/~juergen/deep-learning-overview.html

http://arxiv.org/abs/1404.7828

Published online by Neural Networks (2014):

http://authors.elsevier.com/a/1Q3Bc3BBjKFZVN

Hardcopy to appear in Vol. 61, pp. 85–117, January 2015