RE•WORK organized the second Canadian edition of its RE•WORK Global DL Summit Series in Toronto on October 25 – 26. The event attracted over 600 attendees from more than 20 countries who joined conversations with leading AI and Deep Learning experts. Keynote speakers included “The Godfather of AI” Professor Geoffrey Hinton, Google Brain AI Resident Sara Hooker, Google Brain Research Scientist Shane Gu and many others, who covered topics such as neural networks, image analysis, reinforcement learning, NLP, and speech & pattern recognition.
The RE•WORK Deep Learning Summit in Toronto followed last year’s successful gathering some 500 km northeast in Montreal, another Canadian AI hub. The downtown Metro Toronto Convention Centre was abuzz with academics, investors, industry experts, and a wide range of others with an interest in or passion for Deep Learning and AI. The concurrent AI for Government Summit at the same location reaffirmed the Canadian government’s commitment to the technology and the AI community. Between sessions at both events, speakers and attendees joined workshops and seized networking opportunities to discuss their work and share industry insights.
Geoffrey Hinton – “I wish I knew this stuff was going to work!”
Professor Hinton reviewed previous studies on ensemble learning, distillation, and label smoothing techniques. He explained that a big model can become too confident during training, and that the challenge is to prevent this overconfidence while still allowing distillation. Initially, he thought the solution would be to “penalize the training output distribution if the entropy of the distribution is lower than some threshold.” However, that did not help. Hinton concluded that a better idea is to penalize the entropies of the output distributions only if the total entropy for a mini-batch is lower than some threshold. Hinton said the idea works on the MNIST database (Modified National Institute of Standards and Technology) and the CIFAR-10 dataset (Canadian Institute For Advanced Research). Hinton added that Rafael Müller, an AI Resident at Google Brain, has worked on the proposal.
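To make the difference between the two ideas concrete, here is a minimal sketch in plain Python. It is not the code from the talk: the hinge form `max(0, threshold - entropy)`, the function names, and the threshold values are all assumptions chosen for illustration, since the talk summary does not give the exact form of the penalty.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def per_example_penalty(batch_logits, threshold):
    """First idea (assumed hinge form): penalize every example whose
    own output entropy is below the threshold."""
    return sum(max(0.0, threshold - entropy(softmax(logits)))
               for logits in batch_logits)

def batch_penalty(batch_logits, total_threshold):
    """Second idea (assumed hinge form): penalize only when the summed
    entropy over the whole mini-batch is below the threshold."""
    total = sum(entropy(softmax(logits)) for logits in batch_logits)
    return max(0.0, total_threshold - total)

# One very confident prediction and one completely uncertain one.
mixed_batch = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(per_example_penalty(mixed_batch, threshold=0.5))    # confident example is penalized
print(batch_penalty(mixed_batch, total_threshold=1.0))    # batch as a whole is not
```

Under these assumptions, the per-example version punishes any single confident prediction, while the mini-batch version lets the model be certain about easy examples as long as the batch as a whole keeps enough entropy.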
Hinton concluded with a visual of takeaways from his talk:
- When extracting knowledge from data we can use very big models or very big ensembles of models that are much too cumbersome to deploy
- If we can extract the knowledge from the data it is quite easy to distill nearly all of it into a much smaller model for deployment
- When training a big model, it helps if we prevent the model from becoming too certain
- Label smoothing is an easy way to do this but it screws up distillation
- Forcing the sum of the output entropies on each mini-batch to be above a threshold makes the big model generalize well and also allows distillation to work well
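For readers unfamiliar with the two techniques named in the takeaways, the sketch below shows their commonly used textbook formulations, not anything taken from Hinton's slides. Label smoothing mixes the one-hot target with a uniform distribution, and distillation trains a small "student" model against a large "teacher" model's temperature-softened outputs. The function names, `epsilon=0.1`, and `temperature=2.0` are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Label smoothing: move a fraction epsilon of the target mass
    from the one-hot label onto a uniform distribution over all classes."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) + uniform if k == true_class else uniform
            for k in range(num_classes)]

def cross_entropy(targets, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0.0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Distillation: the student is trained to match the teacher's
    temperature-softened output distribution (the 'soft targets')."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return cross_entropy(soft_targets, student_probs)

teacher = [5.0, 1.0, 0.0]
print(smooth_labels(true_class=0, num_classes=4))
print(distillation_loss(teacher, student_logits=[5.0, 1.0, 0.0]))  # student agrees with teacher
print(distillation_loss(teacher, student_logits=[0.0, 1.0, 5.0]))  # student disagrees, larger loss
```

The soft targets carry information about how the teacher ranks the wrong classes, which is the knowledge the student is meant to absorb.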