The 36th International Conference on Machine Learning (ICML) kicked off Monday in California. The ICML is one of the world’s two top conferences in ML and AI (the other being NeurIPS). The ICML this year received 3,424 main conference paper submissions and accepted 774 papers for oral and poster presentations.
Conference organizers have announced the recipients of the ICML 2019 Best Paper Awards: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations from Google Research, ETH Zurich, and the Max Planck Institute for Intelligent Systems (MPI-IS); and Rates of Convergence for Sparse Variational Gaussian Process Regression from the University of Cambridge and PROWLER.io.
Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations takes a deep dive into disentangled representations, an emerging unsupervised learning approach that aims to separate the underlying factors of variation in data into distinct parts of the learned representation. Researchers Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem show theoretically that unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. They then trained more than 12,000 models on seven datasets in a large-scale, reproducible study, drawing out key findings for future research.
Abstract: The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look on recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties ‘encouraged’ by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
Google AI also published a blog post explaining the paper.
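The impossibility result is easiest to build intuition for in the special case of an isotropic Gaussian prior (the paper's theorem is more general): rotating independent Gaussian factors leaves their joint distribution unchanged, so a method that only sees the data distribution has no way to prefer the disentangled code over a rotated, entangled one. The NumPy sketch below is a toy illustration of that point, not code from the paper; the 45-degree angle, the sample size, and the helper name `rotation` are arbitrary choices.

```python
import numpy as np


def rotation(theta):
    """2-D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, 2))  # two independent ground-truth factors, N(0, I)
z_rot = z @ rotation(np.pi / 4).T      # entangled code: each coordinate mixes both factors

# Both codes have (empirically) the same distribution: covariance close to the identity.
cov_z = np.cov(z, rowvar=False)
cov_rot = np.cov(z_rot, rowvar=False)

# Yet the first rotated coordinate depends on both original factors.
corr_0 = np.corrcoef(z[:, 0], z_rot[:, 0])[0, 1]  # population value: cos(pi/4), about 0.71
corr_1 = np.corrcoef(z[:, 1], z_rot[:, 0])[0, 1]  # population value: -sin(pi/4), about -0.71

print(np.round(cov_z, 3))
print(np.round(cov_rot, 3))
print(corr_0, corr_1)
```

Because the two codes are indistinguishable from samples alone, something beyond the data, such as an inductive bias in the model or some form of supervision, has to decide which one counts as "disentangled".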
In the paper Rates of Convergence for Sparse Variational Gaussian Process Regression, authors David R. Burt, Carl E. Rasmussen, and Mark van der Wilk study the KL divergence to the true posterior, a common metric for assessing the quality of an approximate posterior, and show that smooth kernels with training data concentrated in a small region admit high-quality, very sparse approximations.
Abstract: Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}(N^3)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}(NM^2)$, with $M \ll N$ being the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We address this by characterising the behavior of an upper bound on the KL divergence to the posterior. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case of interest is that for regression with normally distributed inputs in $D$-dimensions with the popular Squared Exponential kernel, $M = \mathcal{O}(\log^D N)$ is sufficient. Our results show that as datasets grow, Gaussian process posteriors can truly be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
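To make the abstract's cost argument concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code) of the trace term tr(K_ff - Q_ff), where Q_ff = K_fu K_uu^-1 K_uf is the low-rank approximation to the N x N kernel matrix built from M inducing inputs. This term appears in the standard collapsed sparse variational bound and measures how much prior variance the inducing variables fail to capture. Only the diagonal of Q_ff is needed, so the N x N matrix is never formed and the cost stays at O(NM^2). The helper names `se_kernel` and `nystrom_trace_gap`, the unit lengthscale and variance, the jitter value, and the choice of inducing inputs as a random subset of the data are all hypothetical choices for illustration.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared Exponential kernel matrix between 1-D input arrays a (n,) and b (m,)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def nystrom_trace_gap(x, z, lengthscale=1.0, variance=1.0, jitter=1e-6):
    """Return tr(K_ff - Q_ff) with Q_ff = K_fu K_uu^{-1} K_uf.

    The Cholesky factorisation costs O(M^3) and the solve against the
    M x N matrix K_uf costs O(N M^2); no N x N matrix is ever built.
    """
    k_uu = se_kernel(z, z, lengthscale, variance) + jitter * np.eye(len(z))  # (M, M)
    k_uf = se_kernel(z, x, lengthscale, variance)                            # (M, N)
    chol = np.linalg.cholesky(k_uu)      # K_uu = L L^T, L lower triangular
    a = np.linalg.solve(chol, k_uf)      # A = L^{-1} K_uf, shape (M, N)
    diag_q = np.sum(a**2, axis=0)        # diag(Q_ff) = squared column norms of A
    diag_k = np.full(len(x), variance)   # the SE kernel has k(x, x) = variance
    return float(np.sum(diag_k - diag_q))


rng = np.random.default_rng(0)
x = rng.normal(size=2000)           # N = 2000 normally distributed 1-D inputs
order = rng.permutation(len(x))     # nested random subsets serve as inducing inputs
for m in (5, 10, 20):
    print(m, nystrom_trace_gap(x, x[order[:m]]))
```

Because the inducing sets are nested, the printed gap can only shrink as M grows. In practice inducing inputs are usually optimised rather than sampled, and `scipy.linalg.solve_triangular` would exploit the triangular structure of L more efficiently than the general `np.linalg.solve` used here.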
Also announced were ICML Honorable Mentions for the following papers:
- Analogies Explained: Towards Understanding Word Embeddings
- SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver
- A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
- Towards A Unified Analysis of Random Fourier Features
- Amortized Monte Carlo Integration
- Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
- Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement
Some statistics on this year's 774 accepted papers:
- 452 papers (58.4%) came purely from academic research institutions
- 60 papers (7.8%) came purely from industrial research organizations
- 262 papers (33.9%) had authors affiliated with both academia and industry
- 77% of contributions came from academic affiliations
- 23% of contributions came from industrial affiliations
- The top three contributing institutions were Google, MIT, and UC Berkeley
- The top three contributing authors were Prof. Michael Jordan from UC Berkeley, Prof. Volkan Cevher from École Polytechnique Fédérale de Lausanne (EPFL), and Prof. Sergey Levine from UC Berkeley.
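The three affiliation counts are consistent with the 774 accepted papers: they sum to that total, and dividing by it reproduces the listed percentages. A short Python check (the dictionary labels are our own):

```python
# Affiliation counts for accepted ICML 2019 papers, as listed in the article.
counts = {
    "academia only": 452,
    "industry only": 60,
    "academia and industry": 262,
}

total = sum(counts.values())
# Percentage share of each group, rounded to one decimal place as in the article.
shares = {group: round(100 * n / total, 1) for group, n in counts.items()}

print(total)   # 774
print(shares)  # {'academia only': 58.4, 'industry only': 7.8, 'academia and industry': 33.9}
```

The rounded shares add up to 100.1% rather than 100%, a normal side effect of rounding each group independently.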
The ICML runs through Saturday, June 15, at the Long Beach Convention Center in Long Beach, California.
Journalist: Tony Peng | Editor: Michael Sarazen