Likelihood-based deep generative models have made impressive progress in modelling data sources such as images, text and video. A popular class of such models is autoregressive models (ARMs), which, although effective, require a pre-specified order in which the data is generated. ARMs may therefore not be the best choice for data that has no natural ordering, such as images.
In a new paper, a Google Research team proposes Autoregressive Diffusion Models (ARDMs), a model class that encompasses and generalizes order-agnostic autoregressive models and discrete diffusion models. ARDMs do not require causal masking of model representations, and they can be trained using an efficient objective that scales favourably to high-dimensional data.
The team summarises the main contributions of their work as:
- We introduce ARDMs, a variant of order-agnostic ARMs which include the ability to upscale variables.
- We derive an equivalence between ARDMs and absorbing diffusion under a continuous time limit.
- We show that ARDMs can have parallelized inference and generation processes, a property that among other things admits competitive lossless compression with a modest number of network calls.
The researchers explain that, from an engineering perspective, the main challenge in parameterizing an ARM is the need to enforce a triangular, or causal, dependence. To address this, they took inspiration from modern diffusion-based generative models and derived a different objective for order-agnostic ARMs, one that is optimized for only a single step at a time.
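The single-step idea can be illustrated with a minimal sketch. It assumes the objective takes the following form: draw a random step t and a random generation order, reveal the first t - 1 variables, average the log-probability of all still-masked variables, and rescale by the number of variables D. The `uniform_model` function is a hypothetical stand-in for the neural network, and the details are an illustration under these assumptions rather than the authors' implementation.

```python
import math
import random

def uniform_model(masked_input, num_classes):
    """Hypothetical stand-in for the network: one categorical distribution per position."""
    return [[1.0 / num_classes] * num_classes for _ in masked_input]

def single_step_loss(x, num_classes, model=uniform_model, rng=random):
    """One-sample estimate of the negative log-likelihood bound for data point x."""
    D = len(x)
    t = rng.randint(1, D)                 # step to optimize, uniform on 1..D
    order = list(range(D))
    rng.shuffle(order)                    # random generation order
    known = set(order[:t - 1])            # positions generated before step t
    # No causal mask is needed: unknown positions are simply hidden from the input.
    masked_input = [x[i] if i in known else None for i in range(D)]
    probs = model(masked_input, num_classes)
    # Average the log-probability over every position that is still masked.
    masked = [i for i in range(D) if i not in known]
    avg_log_prob = sum(math.log(probs[i][x[i]]) for i in masked) / len(masked)
    return -D * avg_log_prob              # rescale a single step to all D variables

loss = single_step_loss([2, 0, 1, 3, 1], num_classes=4)
```

With the uniform stand-in model the estimate is the same for every sampled step and order, namely D times the log of the number of classes, which makes the rescaling easy to check by hand.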
The team then leveraged an important property of this parametrization — that the distribution over multiple variables is predicted at the same time — to enable the parallel and independent generation of variables.
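As a rough sketch of what this enables, the loop below fills in several masked positions from a single network call, sampling each one independently from its predicted distribution. The `uniform_model` stand-in, the fixed `positions_per_call` schedule and the function names are all assumptions made for illustration, not the paper's code.

```python
import random

def uniform_model(masked_input, num_classes):
    """Hypothetical stand-in for the network: one categorical distribution per position."""
    return [[1.0 / num_classes] * num_classes for _ in masked_input]

def generate(D, num_classes, positions_per_call, model=uniform_model, seed=0):
    """Generate D variables in a random order, several positions per network call."""
    rng = random.Random(seed)
    order = list(range(D))
    rng.shuffle(order)                    # random generation order
    x = [None] * D                        # None marks a position not yet generated
    calls = 0
    start = 0
    while start < D:
        probs = model(x, num_classes)     # one call predicts every position at once
        calls += 1
        # Sample the next few positions independently from the same prediction.
        for i in order[start:start + positions_per_call]:
            x[i] = rng.choices(range(num_classes), weights=probs[i])[0]
        start += positions_per_call
    return x, calls

sample, calls = generate(D=8, num_classes=4, positions_per_call=3)
```

Here eight variables are produced with three network calls instead of eight. Setting `positions_per_call=1` recovers fully sequential generation.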
The researchers also identified an interesting property of upscale ARDM training: complexity is not changed by modelling multiple stages. This enabled them to experiment with adding an arbitrary number of stages during training without any increase in computational complexity.
The team applied two methods to the parametrization of the upscaling distributions. Direct parametrization requires only the distribution parameter outputs that are relevant for the current stage, which makes it efficient. Data parametrization can automatically compute the appropriate probabilities, which is convenient for experimenting with new downscaling processes, but it may be expensive when a large number of classes is involved.
In their empirical study, the team compared ARDMs to other order-agnostic generative models on a character-level modelling task using the text8 dataset. The proposed ARDMs performed competitively with existing generative models, and they also outperformed competing approaches on per-image lossless compression.
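To make the data parametrization more concrete, here is a small sketch under stated assumptions: the network is taken to output a distribution over full data values, and the probability of each finer value is obtained by summing the mass of all data values that are consistent with it, normalized by the mass consistent with the coarser value. The `downscale` function, which zeroes out the least significant digits, is a hypothetical downscaling process chosen only for illustration.

```python
def downscale(value, stage, branching=2):
    """Hypothetical downscaling: zero out the `stage` least significant digits."""
    factor = branching ** stage
    return (value // factor) * factor

def upscale_probs(data_probs, coarse_value, stage, branching=2):
    """P(value at `stage` | value at `stage + 1` == coarse_value), from data probabilities."""
    # Data values that are consistent with the coarser observation.
    consistent = [v for v in range(len(data_probs))
                  if downscale(v, stage + 1, branching) == coarse_value]
    total = sum(data_probs[v] for v in consistent)
    out = {}
    for v in consistent:
        finer = downscale(v, stage, branching)
        # Accumulate the normalized mass of every data value mapping to `finer`.
        out[finer] = out.get(finer, 0.0) + data_probs[v] / total
    return out

data_probs = [0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]   # distribution over 8 data values
probs = upscale_probs(data_probs, coarse_value=0, stage=1)
```

Swapping in a different `downscale` function requires no other change, which reflects the flexibility described above. The loop over every data class also shows where the cost comes from when there are many classes.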
Overall, the study validates the effectiveness of the proposed ARDMs as a new class of models at the intersection of autoregressive and discrete diffusion models, whose benefits are summarized as:
- In contrast to standard ARMs, they impose no architectural constraints on the neural networks used to predict the distribution parameters.
- ARDMs require significantly fewer steps than absorbing diffusion models to attain the same performance.
- Using dynamic programming approaches developed for diffusion models, ARDMs can be parallelized to generate multiple tokens simultaneously without a substantial reduction in performance.
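The dynamic programming idea in the last point can be sketched as follows. The sketch assumes that a per-step expected cost is available for each generation step, and that a network call at step t which fills in k variables at once is charged k times the cost of step t. The function name `plan_parallel_steps` and the cost values are hypothetical, and the routine is an illustration of the general technique rather than the authors' algorithm.

```python
def plan_parallel_steps(step_losses, budget):
    """Choose the steps that get a network call so the total expected cost is minimal.

    step_losses[t] is the assumed per-variable cost when t variables are already known.
    Assumes 1 <= budget <= len(step_losses).
    """
    D = len(step_losses)
    INF = float("inf")
    # best[b][t]: minimal cost of generating variables t..D-1 with exactly b calls.
    best = [[INF] * (D + 1) for _ in range(budget + 1)]
    choice = [[None] * (D + 1) for _ in range(budget + 1)]
    best[0][D] = 0.0
    for b in range(1, budget + 1):
        for t in range(D - 1, -1, -1):
            for nxt in range(t + 1, D + 1):
                # One call at step t generates variables t..nxt-1 in parallel.
                cost = (nxt - t) * step_losses[t] + best[b - 1][nxt]
                if cost < best[b][t]:
                    best[b][t], choice[b][t] = cost, nxt
    # Walk the stored choices to recover the steps at which the network is called.
    calls, t, b = [], 0, budget
    while t < D:
        calls.append(t)
        t, b = choice[b][t], b - 1
    return best[budget][0], calls

cost, calls = plan_parallel_steps([4.0, 3.0, 2.0, 1.0], budget=2)
```

In this toy case the plan calls the network at steps 0 and 2, generating two variables per call for a total cost of 12.0, compared with 10.0 for fully sequential generation using four calls.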
The paper Autoregressive Diffusion Models is on arXiv.
Author: Hecate He | Editor: Michael Sarazen