Hello everyone. I’m very happy to have this opportunity to share the valuable things I learned from Professor Yann LeCun’s lecture at the University of Edinburgh on January 13, 2017. Unfortunately, I did not understand everything mentioned in the lecture, but I will try my best to share what I can.
The rapid progress of AI in the last few years is largely the result of advances in deep learning and neural nets, combined with the availability of large datasets and fast GPUs. We now have systems that can recognise images with an accuracy that rivals humans. This is creating a revolution in several domains, such as information access, autonomous transportation, and medical image analysis. But currently, all these systems use supervised learning, where the machine is trained with inputs labelled by humans. Therefore, the challenge now is for machines to learn from raw, unlabeled data such as video or text. This is known as predictive (or unsupervised) learning.
Intelligent systems today do not possess “common sense”, which in humans and animals is acquired by observing the world, understanding its physical constraints, and acting in it. Professor LeCun argues that the ability of machines to learn predictive models of the world is a key component in enabling significant progress in AI. The main technical difficulty is that the world is only partially predictable. A general formulation of unsupervised learning that deals with partial predictability will be presented. The formulation connects many well-known approaches to unsupervised learning, as well as new and exciting ones such as adversarial training.
Yann LeCun Bio
Yann LeCun is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, affiliated with the NYU Center for Data Science, the Courant Institute of Mathematical Science, the Center for Neural Science, and the Electrical and Computer Engineering Department.
He received his Electrical Engineer Diploma from ESIEE, Paris in 1983, and a PhD in Computer Science from Université P&M Curie in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories (Holmdel, NJ) in 1988, later becoming the head of the Image Processing Research Department at AT&T Labs-Research in 1996. He joined NYU as a professor in 2003, following a brief period at the NEC Research Institute (Princeton). In 2012, he became the founding director of the NYU Center for Data Science. In late 2013, he was named Director of AI Research at Facebook, remaining on the NYU faculty part-time. He held a visiting professor chair at Collège de France in 2015-2016.
His current interests include AI, machine learning, computer perception, robotics, and computational neuroscience. He is best known for his contributions to deep learning and neural networks, particularly the convolutional network model which is widely used in computer vision and speech recognition applications today. He has published over 190 papers on these topics as well as on handwriting recognition, image compression, and dedicated hardware for AI.
LeCun is a founder and general co-chair of ICLR, and has served on several editorial boards and conference organizing committees. He is also the co-director of the “Learning in Machines & Brains” program at the Canadian Institute for Advanced Research (CIFAR). He is on the boards of IPAM and ICERM, and has advised many companies while co-founding the startups Elements Inc. and Museami. On the recognition side, he is in the New Jersey Inventors Hall of Fame, the recipient of the 2014 IEEE Neural Network Pioneer Award, the 2015 IEEE PAMI Distinguished Researcher Award, the 2016 Lovie Lifetime Achievement Award, and an Honorary Doctorate from IPN, Mexico.
In this lecture, Professor LeCun started out discussing current developments in AI which we are familiar with, then transitioned to the obstacles we currently face. Afterwards, he went on to discuss predictive learning and adversarial training, a concept Ian Goodfellow introduced in 2014. Throughout the lecture, the phrase “common sense” was mentioned many times; I will explain its importance later.
This lecture is divided into 4 topics:
- A brief review of current development in AI
- Obstacles to AI
- Predictive learning (unsupervised learning)
- Adversarial training
All the AI successes we have had in the last few years are based on supervised learning. We can train a machine using many examples of tables, chairs, dogs, cars, and people. But can it recognize tables, chairs, dogs, cars, and people it has never seen before?
The following two slides outline the process for training deep neural networks. The gray images in the second slide are the features extracted by each layer. If you have difficulty understanding these two slides, a good place to start is to first read up on Convolutional Neural Networks (CNNs), then learn the backpropagation algorithm.
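To make the feature-extraction idea concrete, here is a minimal sketch (my own, not from the lecture) of what a single convolutional layer does: a small kernel slides over the image, and the resulting feature map lights up wherever a local pattern appears. The toy image and edge-detecting kernel below are invented for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" 2-D convolution (cross-correlation, as in most deep
    # learning libraries): slide the kernel over the image and take
    # the dot product at each position
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# toy 5x5 image with a vertical edge between columns 1 and 2
image = np.zeros((5, 5))
image[:, 2:] = 1.0

# a vertical-edge detector: responds where intensity rises left-to-right
edge_kernel = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])

# convolution followed by a ReLU nonlinearity, as in a ConvNet layer
feature_map = np.maximum(conv2d(image, edge_kernel), 0.0)
```

The feature map responds strongly where the edge is and is silent elsewhere; a real ConvNet stacks many such learned kernels per layer.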
Then, Professor LeCun went on to describe three deep ConvNet architectures: VGG, GoogLeNet, and ResNet.
After that, he introduced some applications in driving: image captioning and semantic segmentation with ConvNets for a moving car. He also gave some examples of image recognition.
To understand these topics, we need to have some background knowledge in computer vision and ConvNets.
As you can see from the slide above, machines need to acquire some level of common sense: learning through observation and action so as to make accurate predictions and plans, and paying attention to important events and remembering relevant ones in order to predict which sequence of actions will lead to a desired state of the world.
Intelligence & Common Sense = Perception + Predictive Model + Memory + Reasoning & Planning.
Common Sense is the ability to fill in the blanks
- Infer the state of the world from partial information
- Infer the future from the past and present
- Infer past events from the present state
- Filling in the visual field at the retinal blind spot
- Filling in occluded images
- Filling in missing segments in text, missing words in speech.
- Predicting the consequences of our actions
- Predicting the sequence of actions leading to a result
We have our human common sense. For example, if we see this picture below:
We know this man picked up his bag to leave the room. We have common sense because we know how the world works, but how do we get machines to learn this?
Predictive learning is the ability to predict any part of the past, present or future from whatever information is available. This is also what many people refer to as unsupervised learning.
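As a toy illustration (my own, not from the lecture) of why predictive learning needs no human labels: the signal’s own future serves as the training target. Here a linear model learns to predict the next value of a sine wave from a short window of its past.

```python
import numpy as np

# a toy fully-observable signal standing in for raw sensory data
t = np.arange(200)
series = np.sin(0.1 * t)

# build (past window -> next value) pairs directly from the raw data;
# the "label" is just the next sample, so no human annotation is needed
window = 5
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

# fit a linear next-step predictor by least squares
w, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ w
mse = np.mean((pred - y) ** 2)
```

A sinusoid satisfies an exact linear recurrence, so the prediction error is essentially zero; real video is only partially predictable, which is exactly the difficulty the lecture addresses.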
The Necessity of Unsupervised Learning / Predictive Learning
The number of samples required to train a large learning machine (for any task) depends on the amount of information that we ask it to predict.
The more you ask of the machine, the larger this sample size will be.
If you want to train a very complex system with many parameters, you need many examples before the system can predict all the parameters.
“The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^5 dimensions of constraint per second.” (Geoffrey Hinton in his 2014 AMA on Reddit, but he has been saying this since the late 1970s)
But predicting human-provided labels and a value function is not enough. Professor LeCun gave an example of how much information different learning algorithms need to predict in the slide below.
Then, he discussed Reinforcement Learning systems using two papers on predicting video frames: “Facebook: Won the VizDoom 2016 competition” [Wu & Tian, submitted to ICLR 2017] and “TorchCraft: an interface between Torch and StarCraft (on GitHub)” [Usunier, Synnaeve, Lin, Chintala, submitted to ICLR 2017].
He also mentioned AlphaGo, a successful application of AI that is nonetheless hard to transfer to the real world. Because the world of Go is closed and orderly, learning systems can gain experience simply by running training examples over and over again. The real world, in contrast, is messy, and we cannot accelerate it in order to train systems faster.
The architecture of an Intelligent System
This section is on the architecture of AI. I feel it is an important topic, so I have put four slides here. However, Professor LeCun did not say much in this section beyond what is on the slides.
The theory above is very similar to the control theory shown below [Athans & Falb, 2013].
The slide below gives a brief but clear picture of the architecture of AI. It is very easy to understand if you know the basic process of pattern recognition.
In the slide below, Professor LeCun introduced a model that predicts the trajectories of falling blocks using the Unreal game engine. Because a game engine’s physics differs slightly from real-world physics, this model is only useful inside the game engine. He talked about real objects in the real world afterwards.
Learning Predictive Forward Models Of the World
Inferring the state of the world from Text: Entity RNN
Though supervised ConvNets have seen significant progress, we still need memory-augmented networks to give machines the ability to reason. Professor LeCun presented some slides to help us understand memory/stack-augmented recurrent nets.
Augmenting Neural Nets with a Memory Module
- Recurrent networks cannot remember things for very long
- The cortex only remembers things for 20 seconds
- We need a “hippocampus” (a separate memory module)
- LSTM [Hochreiter & Schmidhuber 1997], registers
- Memory networks [Weston et al. 2014] (FAIR), associative memory
- Stack-Augmented Recurrent Neural Net [Joulin & Mikolov 2014] (FAIR)
- Neural Turing Machine [Graves 2014],
- Differentiable Neural Computer [Graves 2016]
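The memory modules listed above share one core operation: soft attention over stored slots, where a query addresses the memory by similarity and reads back a weighted sum of values. Below is a minimal sketch of that read operation; the toy keys, values, and query are invented for illustration.

```python
import numpy as np

def memory_read(query, keys, values):
    # soft, content-based addressing (as in Memory Networks):
    # score each slot by similarity to the query, normalize with a
    # softmax, and return the weighted sum of the stored values
    scores = keys @ query
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    return weights @ values, weights

# hypothetical memory with 3 slots: 4-d keys, 2-d values
keys = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 1., 0.]])
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.5, 0.5]])

query = np.array([10., 0., 0., 0.])  # strongly matches slot 0
out, w = memory_read(query, keys, values)
```

Because the read is a differentiable weighted sum rather than a hard lookup, the whole network (controller plus memory) can be trained end-to-end with backpropagation.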
He also gave us an example to demonstrate the result of MemNN.
EntNet is the first model to solve all 20 bAbI tasks.
Energy-Based Unsupervised Learning
Seven Strategies to Shape the Energy Function:
1. Build the machine such that the volume of low energy stuff is constant
- PCA, K-means, GMM, square ICA
2. Push down the energy of data points, push up everywhere else
- Max likelihood (needs tractable partition function)
3. Push down the energy of data points, push up on chosen locations
- Contrastive divergence, Ratio Matching, Noise Contrastive Estimation, Minimum Probability Flow
4. Minimize gradient and maximize curvature around data points
- Score matching
5. Train a dynamical system such that the dynamics go to the manifold
- Denoising auto-encoder
6. Use a regularizer that limits the volume of space that has low energy
- Sparse coding, sparse auto-encoder, PSD
7. If E(Y) = ||Y – G(Y)||^2, make G(Y) as “constant” as possible
- Contracting auto-encoder, saturating auto-encoder
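Strategy 5 can be illustrated with a toy linear denoising auto-encoder (a sketch of my own, assuming a simple 1-D manifold embedded in 2-D): the model is trained to map noise-corrupted points back to clean ones, so its dynamics push inputs toward the data manifold, i.e. toward low energy.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data near a 1-D manifold in 2-D: the line y = 2x
x = rng.uniform(-1, 1, size=(256, 1))
data = np.hstack([x, 2 * x]) + 0.01 * rng.normal(size=(256, 2))

# corrupt the inputs with isotropic noise
corrupted = data + 0.3 * rng.normal(size=data.shape)

# linear denoiser: W = argmin ||corrupted @ W - data||^2
# (closed-form least squares stands in for gradient training)
W, *_ = np.linalg.lstsq(corrupted, data, rcond=None)

# the learned map pulls corrupted points back toward the manifold
recon = corrupted @ W
err_before = np.mean((corrupted - data) ** 2)
err_after = np.mean((recon - data) ** 2)
```

The reconstruction error after denoising is smaller than before: off-manifold points are mapped closer to the manifold, which is exactly the low-energy region the strategy shapes.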
Professor LeCun then pointed to Generative Adversarial Networks (GANs), introduced by Ian Goodfellow in 2014, as a way to improve a machine’s ability to predict the world. A GAN consists of a generator and a discriminator that learn simultaneously. You can read more in the reference [Goodfellow et al., 2014].
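To make the adversarial objective concrete, here is a sketch of the two losses from Goodfellow et al. (2014) on toy 1-D data. The affine generator, logistic discriminator, and their parameter values are illustrative assumptions of mine; no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, theta):
    # hypothetical generator: affine map from noise to samples
    return theta[0] * z + theta[1]

def discriminator(x, w):
    # hypothetical discriminator: logistic probability that x is real
    return 1.0 / (1.0 + np.exp(-(w[0] * x + w[1])))

# real data ~ N(3, 1); generator input noise ~ N(0, 1)
x_real = rng.normal(3.0, 1.0, size=128)
z = rng.normal(0.0, 1.0, size=128)
theta = np.array([1.0, 0.0])   # generator parameters (untrained)
w = np.array([1.0, -1.5])      # discriminator parameters (illustrative)

x_fake = generator(z, theta)

# discriminator maximizes E[log D(x)] + E[log(1 - D(G(z)))],
# i.e. minimizes the negative of that value:
d_loss = -(np.mean(np.log(discriminator(x_real, w))) +
           np.mean(np.log(1.0 - discriminator(x_fake, w))))

# generator (non-saturating form) maximizes log D(G(z)):
g_loss = -np.mean(np.log(discriminator(x_fake, w)))
```

In a real GAN, these two losses are minimized alternately by gradient descent on `w` and `theta`, so the generator gradually learns to produce samples the discriminator cannot tell from real data.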
Below is an example of prediction in the real world: a pencil balanced on a finger may fall in any direction when the finger is removed. How can we predict this correctly? It is a complex problem, and talking with physics students may be helpful.
Finally, Professor LeCun showed us some interesting examples of video prediction using a multi-scale ConvNet without pooling, and he said he does not know why pooling does not work here.
Our brains are “prediction machines”, but can we train machines to predict the future? We have seen some success with adversarial training [Mathieu, Couprie, LeCun arXiv:1511.05440], but we are far from a complete solution.
In conclusion, Professor LeCun summarized the recent progress of AI and reviewed some key points about supervised learning. He then focused on unsupervised learning: why he thinks it could become the mainstream technique of the future, and how it could solve many problems that are difficult for today’s learning systems to handle. Unsupervised learning and predictive forward-model building are the hill we face right now, and could remain the challenge for the next several years. Adversarial training could also play an important role in the future. The core problem today is getting machines to learn “common sense”.
From my perspective, I got something special from Professor LeCun. He is friendly and helpful: a student asked him a time-consuming question, and he answered it, slightly contrary to what I expected. He also told us that speaking with people in other fields, such as physics, may help solve problems like predicting the direction of a falling pencil. Finally, I got a photo with him!
References
- Goodfellow, Ian, et al. “Generative adversarial nets.” Advances in Neural Information Processing Systems. 2014.
- Athans, Michael, and Peter L. Falb. Optimal Control: An Introduction to the Theory and Its Applications. Courier Corporation, 2013.
Analyst: Duke Lee | Localized by Synced Global Team : Xiang Chen