Lossy compression is a data compression approach with a trade-off: it enables high compression ratios but incurs some loss of information, i.e. the original data cannot be reconstructed exactly from its compressed form. While this is acceptable and useful in many real-world applications such as JPEG digital image compression, there is increasing interest in the machine learning community in improving lossy compression quality.
In the new paper Lossy Compression with Gaussian Diffusion, a Google Research team aims at “perfect realism” in lossy compression, where the reconstructions are indistinguishable from the real data. The team proposes DiffC, a novel and simple method for efficient data transmission that relies only on an unconditionally trained diffusion generative model and achieves state-of-the-art image compression results despite lacking an encoder transform.
Most modern lossy compression methods comprise an encoder transform, a decoder transform, and an entropy model. The proposed DiffC instead uses a single model: it adds isotropic Gaussian noise directly to the pixels, eliminating the need for an encoder transform.
DiffC first efficiently communicates a corrupted version of the data from the sender to the receiver, then uses a diffusion generative model to generate a reconstruction. By introducing varying degrees of Gaussian noise, the model can flexibly communicate data at arbitrary bitrates, and the reconstructions can thus be generated from a corrupted or incomplete bitstream.
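To make the idea of "varying degrees of Gaussian noise" concrete, here is a minimal sketch of the corruption step only. It is not the paper's implementation: the variance-preserving form z = sqrt(1 - sigma^2) * x + sigma * noise, the function names, and the random stand-in image are all assumptions chosen for illustration. The second function uses the textbook Gaussian channel capacity as a crude upper bound on how much information the noisy pixels can carry; it is not the bitrate DiffC actually achieves, but it shows why more noise means fewer bits.

```python
import numpy as np

def corrupt(x, sigma, rng):
    """Add isotropic Gaussian noise to pixels (assumed variance-preserving form)."""
    noise = rng.standard_normal(x.shape)
    return np.sqrt(1.0 - sigma ** 2) * x + sigma * noise

def rate_upper_bound_bits(x, sigma):
    """Gaussian channel capacity, 0.5 * log2(1 + SNR), in bits per pixel value.

    This is only an upper bound for a signal with the same average power as x,
    not the rate of any actual codec.
    """
    signal_power = (1.0 - sigma ** 2) * np.mean(x ** 2)
    noise_power = sigma ** 2
    return 0.5 * np.log2(1.0 + signal_power / noise_power)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for an image, with pixel values scaled to [-1, 1].
    x = rng.uniform(-1.0, 1.0, size=(64, 64, 3))
    for sigma in (0.1, 0.5, 0.9):
        z = corrupt(x, sigma, rng)
        bound = rate_upper_bound_bits(x, sigma)
        print(f"sigma={sigma}: corrupted shape {z.shape}, bound {bound:.3f} bits/value")
```

Raising sigma lowers the bound, which mirrors the article's point that the noise level acts as a dial for the bitrate. Generating the reconstruction from z would require a trained diffusion model, which is outside the scope of this sketch.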
In their empirical study, the researchers conducted a rate-distortion analysis to gain a deeper understanding of DiffC’s performance. They found that isotropic noise is close to optimal; that DiffC produces perceptually pleasing results even at extremely low bitrates of around 0.2 bits per pixel; and that at higher bitrates, a deterministic reconstruction based on the probability flow ordinary differential equation (Song et al., 2021) improves on ancestral sampling by about 3 dB. DiffC also outperforms the state-of-the-art HiFiC (High-Fidelity Generative Image Compression, Mentzer et al., 2020) baseline model and the BPG (Better Portable Graphics) non-neural image codec.
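For readers less familiar with the units in these results, the sketch below shows how peak signal-to-noise ratio (PSNR, in dB) and bits per pixel are conventionally computed. The function names, the toy arrays, and the 64x64 image size are assumptions for illustration and are not taken from the paper's code. One useful rule of thumb follows directly from the PSNR formula: a gain of about 3 dB corresponds to halving the mean squared error.

```python
import numpy as np

def psnr_db(original, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in decibels: 10 * log10(max_value^2 / MSE)."""
    diff = original.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

def bits_per_pixel(num_bits, height, width):
    """Bitrate normalized by the number of pixels in the image."""
    return num_bits / (height * width)

if __name__ == "__main__":
    # Toy example: two reconstructions whose MSE differs by a factor of two.
    original = np.zeros((4, 4))
    recon_a = original + 2.0           # MSE = 4
    recon_b = original + np.sqrt(2.0)  # MSE = 2
    gain = psnr_db(original, recon_b) - psnr_db(original, recon_a)
    print(f"PSNR gain from halving the MSE: {gain:.2f} dB")

    # A hypothetical 64x64 image coded at roughly 0.2 bits per pixel.
    print(f"{bits_per_pixel(819, 64, 64):.3f} bits per pixel")
```

At 0.2 bits per pixel, a 64x64 image would occupy only about 100 bytes, which gives a sense of how aggressive the low-bitrate regime is.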
While the researchers concede that DiffC’s high computational cost makes it impractical in its current form, their study demonstrates the potential of diffusion model-based approaches for simplifying and improving the performance of lossy compression with realism constraints.
The paper Lossy Compression with Gaussian Diffusion is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang