Bayesian neural networks (BNNs) have been shown to offer practical benefits over standard NNs, such as improved predictive uncertainty quantification and principled model selection. Yet the practical deployment of BNNs has been limited, as they are generally considered hard to implement, finicky to tune, expensive to train, and difficult to scale to today’s large models and datasets.
This view is challenged in the new paper Laplace Redux — Effortless Bayesian Deep Learning, in which a research team from the University of Cambridge, the University of Tübingen, ETH Zurich and DeepMind conducts extensive experiments demonstrating that the Laplace approximation (LA) can be a simple, cost-efficient and yet competitive approximation method for inference in Bayesian deep learning. The team also introduces Laplace, a PyTorch-based library for scalable LA in deep neural networks (NNs).
The Laplace approximation (LA) is a classic and simple method for obtaining an approximate posterior over the weights of a deep NN. Machine learning researchers, however, have tended to prefer alternative approaches such as variational Bayes or deep ensembles, assuming that the LA is too costly due to its Hessian computation requirements or that it yields inferior results. The paper argues that these views are misconceptions.
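The core idea is easy to state: fit a Gaussian to the posterior, centred at the maximum a posteriori (MAP) estimate, with covariance given by the inverse Hessian of the negative log posterior at that point. The following one-dimensional sketch (illustrative only, not the paper's deep-learning setting; the target density and all numbers are made up for the example) shows the recipe end to end:

```python
import numpy as np

def neg_log_posterior(w):
    # Hypothetical unnormalised negative log posterior: a Gaussian
    # likelihood term around 2.0 (variance 0.5) plus a zero-mean
    # Gaussian prior (variance 4.0).
    return 0.5 * (w - 2.0) ** 2 / 0.5 + 0.5 * w ** 2 / 4.0

def laplace_fit(f, w0=0.0, steps=200, lr=0.1, eps=1e-4):
    # Step 1: find the MAP estimate by gradient descent,
    # using a central finite-difference gradient.
    w = w0
    for _ in range(steps):
        grad = (f(w + eps) - f(w - eps)) / (2 * eps)
        w -= lr * grad
    # Step 2: curvature (Hessian) at the MAP via central differences;
    # its inverse is the variance of the Gaussian approximation.
    hess = (f(w + eps) - 2 * f(w) + f(w - eps)) / eps ** 2
    return w, 1.0 / hess  # mean and variance of the Laplace posterior

mean, var = laplace_fit(neg_log_posterior)
```

Because the example's negative log posterior is exactly quadratic, the Laplace fit here is exact; for a deep NN it is a local Gaussian approximation, and the Hessian (the expensive part the paper addresses) must itself be approximated.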
The researchers summarize their study’s main contributions as:
- We first survey recent advances and present the key components of scalable and practical Laplace approximations in deep learning.
- We then introduce Laplace, an easy-to-use PyTorch-based library for “turning a NN into a BNN” via the LA. Laplace implements a wide range of different LA variants.
- Lastly, using Laplace, we show in an extensive empirical study that the LA is competitive with alternative approaches, especially considering how simple and cheap it is.
The LA benefits deep learning models in two ways: approximating the model’s posterior distribution enables probabilistic predictions, and approximating the model evidence (marginal likelihood) enables model selection. The paper identifies four key components of scalable and practical Laplace approximations in deep learning: 1) inference over all weights or a subset of weights, 2) Hessian approximations and their factorizations, 3) hyperparameter tuning, and 4) the approximate predictive distribution.
The researchers first select the part of the model to perform inference over with the LA, then decide how to approximate the Hessian. With these choices made, they can perform model selection using the evidence: starting from an untrained model, they train it while using the evidence to tune hyperparameters online; starting from a pretrained model, they use the evidence to tune the hyperparameters post hoc. Finally, the predictive distribution is computed or approximated to make predictions for new inputs.
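The evidence-based tuning step can be illustrated in a setting where the marginal likelihood is available in closed form. This sketch (my own toy example, not code from the paper or its library) tunes the prior precision of a Bayesian linear regression post hoc by maximizing the log evidence over a grid, mirroring the post-hoc hyperparameter tuning the paper describes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
beta = 25.0  # observation noise precision, assumed known for the sketch
y = X @ w_true + rng.normal(scale=beta ** -0.5, size=n)

def log_evidence(alpha):
    # Closed-form log marginal likelihood of Bayesian linear regression
    # with prior w ~ N(0, alpha^-1 I) -- the quantity that evidence-based
    # hyperparameter tuning maximises.
    A = alpha * np.eye(d) + beta * X.T @ X      # posterior precision
    m = beta * np.linalg.solve(A, X.T @ y)      # posterior mean
    fit = 0.5 * beta * np.sum((y - X @ m) ** 2) + 0.5 * alpha * m @ m
    return (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta)
            - fit - 0.5 * np.linalg.slogdet(A)[1]
            - 0.5 * n * np.log(2 * np.pi))

# Post-hoc tuning: pick the prior precision that maximises the evidence.
alphas = np.logspace(-3, 3, 61)
best_alpha = alphas[np.argmax([log_evidence(a) for a in alphas])]
```

For deep NNs no such closed form exists, so the paper's approach substitutes the Laplace approximation of the marginal likelihood; the tuning logic (maximize the evidence with respect to hyperparameters such as the prior precision) is the same.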
The proposed Laplace toolkit is designed to make deep Laplace approximations user-friendly. Laplace is a simple, easy-to-use, extensible PyTorch library for scalable LAs of deep NNs. It supports all combinations of the four key components above and includes efficient implementations of the key LA quantities: 1) the posterior (i.e. Hessian computation and storage), 2) the marginal likelihood, and 3) the posterior predictive.
The researchers benchmarked various LAs implemented via Laplace, with the results showing that LA is competitive with strong Bayesian baselines in in-distribution, dataset-shift, and out-of-distribution (OOD) settings.
Overall, this work demonstrates that the Laplace approximation can be competitive with more popular alternatives in terms of performance while maintaining a low computational cost. The team hopes their work can catalyze the wider adoption of the LA in practical deep learning.
Author: Hecate He | Editor: Michael Sarazen