While deep neural networks (DNNs) are driving AI’s deep-learning revolution, determining an appropriate depth for a DNN remains challenging. If a DNN is too shallow, its predictive performance will suffer; if it is too deep, it will tend to overfit, and its added complexity will incur prohibitively high compute costs.
In the new paper Variational Inference for Infinitely Deep Neural Networks, researchers from Columbia University propose the unbounded depth neural network (UDN), an infinitely deep probabilistic model whose depth adapts to the training data without any preset upper limit, sparing deep learning researchers and practitioners from tough decisions about the appropriate complexity of their DNN architectures.
The team summarizes their main contributions as follows:
- We introduce the unbounded depth neural network: an infinitely deep neural network that can produce data from any of its hidden layers. In its posterior, it adapts its truncation to fit the observations.
- We propose a variational inference method with a novel variational family. It maintains a finite but evolving set of variational parameters to explore the unbounded posterior space of the UDN parameters.
- We empirically study the UDN on real and synthetic data. It successfully adapts its complexity to the data at hand. In predictive performance, it outperforms other finite and infinite models.
The proposed UDN comprises a DNN of arbitrary architecture (it can contain different types of layers) and a latent truncation level drawn from a prior. The DNN generates an infinite sequence of hidden states for each datapoint and produces the response from the hidden state at the truncation level. Given a dataset, the UDN’s posterior is a conditional distribution over the network’s weights and the truncation depth. This setup gives the UDN the flexibility of an “infinite” neural network that can select the distribution of truncations that best describes the data.
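The generative process described above can be sketched in a few lines. This is a minimal illustration, not the authors’ implementation: the Poisson depth prior, the `tanh` layers, and all function names here are our assumptions for concreteness — the paper allows arbitrary layer types and priors.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_layer(h, layer_seed):
    """One generic hidden layer; weights are drawn lazily per layer,
    so the stack of layers is unbounded in principle."""
    layer_rng = np.random.default_rng(layer_seed)
    W = layer_rng.normal(scale=1.0 / np.sqrt(h.size), size=(h.size, h.size))
    return np.tanh(W @ h)

def sample_udn_output(x, depth_prior_rate=3.0):
    """Draw a latent truncation level L from a prior (here: 1 + Poisson,
    an assumed choice), then run the input through L hidden layers and
    return the hidden state the response would be read from."""
    L = 1 + rng.poisson(depth_prior_rate)  # truncation level, at least one layer
    h = np.asarray(x, dtype=float)
    for layer_idx in range(L):
        h = hidden_layer(h, layer_seed=layer_idx)
    return L, h  # an output head (e.g. softmax) would map h to the response

depth, out = sample_udn_output([0.5, -0.2, 1.0])
```

Because the truncation level is latent, inference over it (together with the weights) is what lets the posterior “choose” how deep the network effectively is.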
The paper also details the researchers’ novel variational inference method for approximating the UDN posterior, wherein a “variational family” with an infinite number of parameters covers the entire posterior space while each member of the family depends on only a finite subset of them, so the variational objective can be computed and optimized efficiently. The resulting gradient-based algorithm approximates the posterior UDN, dynamically exploring the infinite space of truncations and weights and enabling the UDN to adapt its depth to the complexity of the data.
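The “finite but evolving” idea can be illustrated with a toy distribution over depths that grows its support during optimization. The class name, the logit parameterization, and the growth rule below are our assumptions, not the paper’s algorithm: the point is only that the variational posterior over truncations stays finite at any moment yet can expand toward deeper layers when probability mass accumulates at its frontier.

```python
import numpy as np

class GrowingDepthPosterior:
    """Toy variational distribution over truncation depths 1..K that can
    extend K on demand (a sketch of a finite-but-evolving parameter set)."""

    def __init__(self, init_depths=3, grow_threshold=0.05):
        self.logits = np.zeros(init_depths)   # one variational logit per depth
        self.grow_threshold = grow_threshold

    def probs(self):
        """Softmax over the currently instantiated depths."""
        z = np.exp(self.logits - self.logits.max())
        return z / z.sum()

    def maybe_grow(self):
        """Add a new depth (in the full method, this would also instantiate
        that layer's weight parameters) when the deepest currently tracked
        depth carries non-negligible mass."""
        if self.probs()[-1] > self.grow_threshold:
            self.logits = np.append(self.logits, self.logits[-1] - 1.0)

q = GrowingDepthPosterior()
q.logits[-1] += 5.0   # pretend gradient steps pushed mass toward deeper layers
before = len(q.logits)
q.maybe_grow()        # support expands: a new, deeper truncation is now tracked
after = len(q.logits)
```

In this sketch, only as many parameters exist as depths currently under consideration, which is what keeps the objective computable despite the model’s unbounded depth.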
In their empirical study, the team applied the proposed UDN to synthetic and real classification tasks, where it outperformed other finite and infinite models in predictive performance, reaching 99 percent accuracy on the easiest label pairs with only a couple of layers and 94 percent accuracy on the full dataset. On image classification, the UDN also demonstrated its ability to adapt depth to data complexity, growing from a few layers to almost a hundred. The proposed dynamic variational inference method, meanwhile, effectively explored the space of truncations.
The researchers believe this work opens several promising avenues for further study — the UDN could be applied to transformer architectures, and the unbounded variational family could be used for variational inference of other infinite models. They suggest future UDN studies could also examine how the form of the neural network’s weight priors impacts the corresponding posteriors.
The paper Variational Inference for Infinitely Deep Neural Networks is on arXiv.
Author: Hecate He | Editor: Michael Sarazen