Machine Learning and the Market for Intelligence conference held by Creative Destruction Lab in Toronto


Yoshua Bengio, Geoffrey Hinton, Richard Sutton and Ruslan Salakhutdinov: Panel Summary from the 2016 Conference

I was at the 2016 “Machine Learning and the Market for Intelligence” conference held by Creative Destruction Lab in Toronto last Thursday. The all-star line-up made it an intense day. I can’t wait to write about it and share what I saw and heard at the conference.

What should I share first? Could any panel have been more attractive than having Yoshua Bengio, Geoffrey Hinton, Richard Sutton and Ruslan Salakhutdinov sit together and discuss their visions of what will happen next in the research field?

“What we have done is insignificant relative to the business community: computer science department people will fly 6 hours to come to Canada (like Kevin Kelly and Steve).” (the host of the conference)

Shall I also mention that the moderator was Steve Jurvetson, partner of DFJ (Draper Fisher Jurvetson)? Apart from his job title, who is he? It actually doesn’t matter too much who he is in comparison to the panelists, but Hotmail, TradeX, SpaceX, Nervana, D-Wave and Tesla all took his money (he owned the world’s first Tesla Model S and the second Tesla Model X, right after Elon Musk). Even for someone as smart and successful as Steve, it was a stressful task to moderate this panel.

“What’s Next?”

The panel’s theme was “What’s Next? The Research Frontier”. For the first part of the panel, Steve asked each panelist to discuss what will happen in the AI research field, especially in machine learning. In other words, for the next 5 years, what are the problems Professors Bengio, Hinton, Sutton, and Salakhutdinov believe we need to solve? In what directions will we make the greatest progress?

Below is my summary of the panel. I may have missed some points or misunderstood a few lines. Please correct me if you find any mistakes; I will be more than happy to revise them.

Yoshua Bengio

Professor Bengio is a leading figure behind the rise of deep learning, together with Professors Hinton and LeCun. The three are most responsible for nurturing deep learning throughout the 1980s, ’90s, and early 2000s, when few others saw its potential.

Explanatory Unsupervised Learning

  • Current models cheat by picking up on surface regularities

Professor Bengio gave an example from current image recognition systems: they get fooled by various kinds of hues. Background greenery alone increases the probability of an image being recognized as a wild animal.

  • Humans are very good at unsupervised learning, but machines are not

Even a 2-year-old child can understand intuitive physics, like a falling ball or flowing liquid. They do not need to know any physics theory, and they know what they need to do. Machines still have problems doing this.

  • Predictive, causal, explanatory models with latent variables are necessary to handle rare & dangerous states

Models with latent variables are a very important area to work on.

Besides having better models, it is important to be able to deal with unlabeled data (unlabeled meaning no added human interpretation). Models with latent variables are important because there are causal relationships between the things we observe and the things we cannot observe.

Professor Bengio used building safe autonomous vehicles as an example, which should be able to figure out new dangerous scenarios — that did not exist in training sets — when they encounter them.

  • Applications to model-based reinforcement learning, domain adaptation, semi-supervised learning (tons of unlabeled data), multi-task learning, etc.

Steve asked Professor Bengio whether causal models can be derived in an unsupervised way. Professor Bengio thought it would be a crucial element, but that we have to use all sources of information, both labelled and unlabelled data. In fact, most of the data we have are unlabeled, meaning humans haven’t added interpretations to them.


  • Memory
    • Professor Bengio also mentioned that memory will be a hot research area for the next several years. Why so? Memory is connected with the notion of reasoning, which is why it matters for the field. Reasoning is essentially the combination of pieces of information: to draw conclusions and predict the future, you have to go through sequences of predictive steps and run through them properly. Memory is heavily involved in that process.

Geoffrey Hinton

Professor Geoffrey Hinton was one of the first researchers to demonstrate the use of the backpropagation algorithm for training multi-layer neural nets, and he is a central figure in the deep learning community.



Professor Hinton started his talk with a joke he frequently makes: he tells students not to become radiologists, because those jobs will be gone within the next 5 years, replaced by deep learning applications. Professor Hinton pointed out that, generally, with fast chips, deep learning is helpful for any problem that needs prediction, under the condition that you have lots of data.


Two things Professor Hinton thought might happen in the next year:

  • Discovering a new model of “neurons” that will work better than our standard logistic units or rectified linear units (ReLU)

Professor Hinton talked about how the definition of the artificial neuron has evolved since the 1950s.

During the 1950s, people thought neurons were logic gates, deterministic and binary, only to find out that they are actually stochastic binary units. In the 1980s, during Professor Hinton’s generation, people switched from deterministic logic gates to sigmoid logistic units. For the past 30 years, people used just logistic units, until the revolutionary rectified linear unit took over, popularized by AlexNet in 2012.
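
For readers unfamiliar with the two unit types, here is a minimal sketch in Python (NumPy assumed) contrasting the saturating logistic unit with the rectified linear unit:

```python
import numpy as np

def sigmoid(x):
    # Classic logistic unit: smooth, squashes input into (0, 1),
    # saturating (gradients vanish) for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit: identity for positive inputs, zero otherwise;
    # no saturation on the positive side.
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # values squashed into (0, 1)
print(relu(x))     # [0. 0. 2.]
```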

Professor Hinton also mentioned effective support from Google for his research undertakings. He joked that Google is actually a better funding provider than NSERC (Natural Sciences and Engineering Research Council of Canada).

  • Neural nets with huge numbers of parameters on relatively small data sets

Rather than exploring unsupervised learning, Professor Hinton believes there is still work to be done in supervised learning.

Computation power is becoming cheaper and data sets are getting larger; those are the current trends. Professor Hinton believes that computation power will become cheaper faster than data sets will grow. This means that having more parameters might not be a nightmare at all.

Several weeks before, in a talk given to graduate students at the University of Toronto, Professor Hinton expressed the view that, in order to do a good job, it is desirable to exploit our computation power and inject as many parameters as we can into our networks to capture all the regularities (both reliable and unreliable) within the data, then combine all those opinions to make predictions. Empirically, this method has proven successful on many tasks.

He illustrated the relationship between parameters and data with a human example:

In humans, the brain has about 100,000,000,000,000 (10^14) synapses. A lifetime is on the order of 2 billion seconds, and we make roughly 5 visual fixations per second, which works out to about 10 billion fixations in a lifetime. If we treat each fixation as a data point, there are roughly 10,000 times more parameters than data points. Textbook statistics teaches that a model with that many more parameters than data points should overfit badly.

Continuing with the example, Professor Hinton pointed out that the brain nevertheless has about 10,000 times more parameters than data points and processes things just fine. The reason might be that it uses very strong regularization, perhaps something along the lines of dropout.
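
The back-of-the-envelope numbers can be checked in a few lines of Python. (The exact figures here, roughly 10^14 synapses, a 2-billion-second lifetime, and 5 fixations per second, are the usual rough assumptions behind this argument, not figures from the panel transcript.)

```python
synapses = 10**14          # rough count of synapses in a human brain
seconds = 2 * 10**9        # roughly a human lifetime in seconds
fixations_per_second = 5   # rough rate of visual fixations

# Treat each fixation as one data point.
data_points = seconds * fixations_per_second
ratio = synapses / data_points

print(data_points)  # 10000000000 (about 10 billion fixations)
print(ratio)        # 10000.0 (about 10,000x more parameters than data)
```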

Steve asked what led Professor Hinton to believe the leverage lies across all the parameters, and whether it is possible to use machine learning to optimize them. Hinton was positive that we can turn it into a machine learning problem, and that this will help science. He didn’t elaborate much further on parameters, but pointed out that people still don’t know exactly what neurons are, and that we have so far avoided putting the basic properties we observe in real neurons into artificial neurons. Current artificial neurons still have many limitations, even at computing some very easy problems. We will need to change the basic neuron to really solve them.

Richard Sutton


Professor Richard Sutton is considered one of the founding fathers of modern computational reinforcement learning. He has made several significant contributions to the field, including temporal-difference learning, policy gradient methods, and the Dyna architecture.


Professor Sutton is interested in Intelligence per se and has pursued the topic throughout his entire research career. Instead of talking about things that help companies make a profit over the next 12 months, Professor Sutton talked about important advancements in machine learning:

  • The ability to learn at scale from ordinary experiences
  • Enable machine learning to scale to the next level
  • Use deep reinforcement learning for long-term prediction and (probably) unsupervised learning

The current machine learning process is not the way people learn. Learning should be based on interactions with the real world, without the need for a training set of labeled data. It should happen naturally, like how a child or animal learns. It should be about how the world works, about cause and effect.

He talked about learning from experience, the way people and animals do. In current approaches like deep learning, we construct enormous training sets from experience. Learning like humans matters: it is about scalability and the limitations of training sets. That is why current systems are “learned” rather than “learning”. A learned system is built from training data; a learning system, operating in an online fashion, improves with every new experience. As Sutton summarized: learn how the world works, find scalable methods with deep learning, and use deep reinforcement learning for long-term prediction and unsupervised learning.
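
Sutton’s own temporal-difference learning is a concrete example of a “learning” rather than “learned” system: value estimates improve online with every new experience, no labeled training set required. Below is a minimal TD(0) sketch (the chain environment, step size, and discount are illustrative assumptions, not anything presented on the panel):

```python
# Tiny chain MDP: states 0..4; walking right ends at state 4 with reward 1.
values = [0.0] * 5       # value estimates, updated online as experience arrives
alpha, gamma = 0.1, 0.9  # step size and discount factor (illustrative choices)

for _ in range(500):     # 500 episodes of experience
    state = 0
    while state != 4:
        next_state = state + 1          # deterministic walk to the right
        reward = 1.0 if next_state == 4 else 0.0
        # TD(0) update: nudge V(s) toward the bootstrapped target r + gamma*V(s')
        target = reward + gamma * values[next_state]
        values[state] += alpha * (target - values[state])
        state = next_state

# Each state's value converges toward gamma**(steps remaining to the goal).
print([round(v, 2) for v in values])
```

The system never sees a labeled data set; it simply improves its predictions from each new transition it experiences.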

Professor Sutton said that he and Professor Bengio addressed the same goals using different terminologies and methods: he used reinforcement learning while Professor Bengio used unsupervised learning.

Finally, he said, there is a long way to go. We are at the beginning of the journey, and we will find scalable methods that keep pace with, or even stay ahead of, Moore’s Law.

When Steve asked him about the analogy between artificial networks and the human brain, Professor Sutton simply smiled and bounced the question to Professor Bengio. Professor Bengio mentioned that the logistic unit model was heavily influenced by neuroscience hypotheses. We need to explore more and narrow the gap between neuroscience and machine learning.

Ruslan Salakhutdinov

Professor Salakhutdinov is an Associate Professor in the Machine Learning Department, School of Computer Science at Carnegie Mellon University. He was previously a professor at the University of Toronto and a PhD student of Professor Hinton.


Professor Salakhutdinov talked about 4 major open challenges for both industry and academia over the next 1–3 years.


  • Unsupervised Learning / One-Shot Learning / Transfer Learning

Professor Salakhutdinov said that his lab at CMU has been doing a lot of research with massive volumes of data and can extract structure from the data using supervised learning, but with unsupervised learning we are not there yet.

He also stressed one-shot learning, which Professors Bengio and Hinton both mentioned. How do we learn new classes and concepts from just a handful of examples? People can do this quickly, but machines can’t.

  • Reasoning, Attention, and Memory

Professor Salakhutdinov didn’t go deep into this topic but raised the problem of how to build systems that have built-in memory that we can use to make decisions.

  • Natural Language Understanding / Dialogue and Question-Answering Systems

Although we have made progress in natural language reasoning, we are still far from understanding human language well enough for machines to interact naturally with people.

Steve asked Professor Salakhutdinov whether built-in memory is what makes it possible to understand context in natural language and carry on continuous dialogue. His answer was that we need a formal memory network that we can write to and store things in; building and designing new neural network architectures is something we need to explore.

When Steve asked about the timeline for achieving a natural conversational interface, Professor Salakhutdinov said that we will definitely make a lot of progress in constrained environments, but not in general AI.

Professor Bengio simply joked, “Serious scientists don’t give timeline.”

  • Deep Reinforcement Learning

Professor Salakhutdinov recommended that people interested in this topic read Professor Sutton’s book (Reinforcement Learning: An Introduction). He said that if we look back at what was done in the 1980s and 1990s, we can actually build on and scale those ideas now to make great progress.

What Else?

In addition to sharing their visions of the field’s near future, the professors discussed many interesting ideas and joked around with each other during the talk and Q&A. Some highlights are summarized here.

  • Professor Hinton’s inspiration: “How does it work?”

After all four professors gave their talks on what will happen next in the field, Steve asked them about the things they believe to be true, the things that inspired them to build careers without a definite timeline. (Recall Professor Bengio’s joke: “Serious scientists don’t give a timeline.”)

Professor Hinton said that he is passionate about figuring out how the brain does it. For example, in order to develop proper representations, the brain might not just store memories but transform representations from one form to another. (This part is really hard for me to understand; please correct me if I got it wrong.) Solving this kind of question is really, really hard, but he finds it attractive.

Steve: “So the deep passion is [that] the more we understand how the real brain works, the better we will make an analog?”
Professor Hinton: “No, I don’t care about that. I only care about how the brain works.”


  • What is Professor Geoffrey Hinton doing at Google? What is Professor Ruslan Salakhutdinov doing at Apple?

Professor Russ Salakhutdinov recently joined Apple as a director of Machine Learning. He said they are building a team of top scientists, and there are a lot of tough projects and research problems to work on. His role is to make sure they create good algorithms, now that machine learning is used everywhere in the company. He is excited to oversee all these areas.

On the other side, Professor Hinton mentioned that Google heavily supports his research, which searches for a new type of artificial neuron and tries to answer the question of “how it works”.

  • Is there a simple explanation of Intelligence?

While discussing free will (yes, they did discuss this!), Professor Bengio said a bit about looking for a simple explanation of Intelligence. He mentioned that Professor Pedro Domingos wrote in his book The Master Algorithm that there are underlying assumptions about Intelligence in a lot of machine learning (and deep learning). There might be a few simple principles that, once understood, explain our intelligence and our brain’s ability to understand the world. If it turned out to be a complicated method we can’t understand, it would not be interesting.


  • Professor Geoffrey Hinton’s recent work

Every now and then during the discussions, Professor Hinton mentioned the breakthroughs he had recently made. These remarks were really hard to follow, not helped by Professor Hinton’s heavy British accent. Professor Bengio suggested that audience members interested in this topic read the paper Hinton recently published with his colleagues.

Fast Weights to Attend to the Recent Past
Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs. There is no good reason for this restriction. Synapses have dynamics at many different time-scales and this suggests that artificial neural networks might benefit from variables that change slower than activities but much faster than the standard weights. These “fast weights” can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns.
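
As a rough illustration of the mechanism described in the abstract: a fast-weight matrix can be maintained as a quickly decaying sum of outer products of recent hidden states, so that multiplying a query by it weights each stored state by its similarity to the query, a form of attention to the recent past without storing copies of activity patterns. The sketch below is my own simplification (the dimension, decay rate, and learning rate are illustrative assumptions; the model in the paper also adds an inner settling loop and layer normalization):

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 8
A = np.zeros((dim, dim))  # fast weights: start empty
lam, eta = 0.95, 0.5      # fast decay and fast learning rate (illustrative)

# A short sequence of hidden-state vectors to remember.
hidden_states = [rng.standard_normal(dim) for _ in range(5)]

# Store each hidden state as an outer product; older memories decay quickly.
for h in hidden_states:
    A = lam * A + eta * np.outer(h, h)

# Retrieval: A @ query returns a mixture of the stored states, each weighted
# by its (decayed) similarity to the query, i.e. attention to the recent past.
query = hidden_states[-1]
retrieved = A @ query

# A is a positive sum of outer products, so the query always has positive
# alignment with what it retrieves about itself.
print(float(retrieved @ query))  # positive
```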


  • Professor Yoshua Bengio: “Free will is an illusion!”

When Steve asked a question regarding free will and computational determinism, Professor Bengio gave this answer.

  • The heat around Deep Reinforcement Learning

Both Professor Sutton and Professor Salakhutdinov stressed Deep Reinforcement Learning. Will it lead to general intelligence? We do not know. What we do know is that there is a lot to be done, and to be done in this direction.


  • Is it important to understand how the brain works?

Professor Hinton discussed his hypotheses on how neurons, synapses, pulses, currents, and the rest work. He also mentioned that Turing had believed in neural networks as well, yet we still don’t really understand how the brain works.

Professor Bengio shared Professor Hinton’s view that it is very important to understand how the brain works: we would be crazy not to explore it further.

Professor Hinton: “We don’t understand how it works.”

Professor Bengio: “Or we say, we don’t understand how we work!”



This concludes the first panel summary of the 2016 Machine Learning and the Market for Intelligence conference held by Creative Destruction Lab in Toronto on October 27. We hope the summary helps you get a sense of the top researchers’ visions.

More summaries of other panels will soon be published.

Subscribe to the Synced channel to follow more interesting summaries.
