Text-to-image diffusion models have emerged as powerful generative tools, consistently delivering high-quality and diverse images. However, these models rely on an iterative refinement process that demands a substantial number of sampling steps, posing a challenge for efficient use.
In response to this challenge, in the new paper Conditional Diffusion Distillation, a research team from Google Research and Johns Hopkins University introduces a framework that distills a pre-trained unconditional diffusion model into a conditional one, enabling image generation in significantly fewer steps. The result is higher-quality images at the same number of sampling steps compared with previous two-stage distillation and fine-tuning techniques.


The proposed distilled model takes cues from given image conditions to predict high-quality results in just 1 to 4 sampling steps. This streamlined approach eliminates the need for the original text-to-image data, a prerequisite in previous distillation procedures, making the method more practical. Furthermore, the formulation introduced in this research avoids compromising the diffusion prior, a common issue in the initial stage of the fine-tuning-first procedure.

The central idea behind this work is to adapt a pre-trained unconditional diffusion model into a conditional one and optimize it to satisfy two key properties: self-consistency and the ability to generate samples from conditional data. The adapted model is then fine-tuned on new conditional data with a conditional diffusion distillation loss, which uses a distance function to penalize the difference between the predicted signal and the corresponding target image.
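For intuition, the sketch below illustrates the distillation distance term described above in Python: a student model predicts the clean image from a noisy input plus an image condition, and the loss measures the distance between that prediction and the target image. The noise schedule, function names, and signatures are illustrative assumptions for this example, not the paper's actual implementation (which additionally enforces the self-consistency property).

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only (not the paper's code): the student model predicts
# the clean image from a noisy input and an image condition, and the loss
# penalizes the distance between that prediction and the target image.

def conditional_distillation_loss(student, x0, cond, t, distance=F.mse_loss):
    """student : callable(x_t, cond, t) -> predicted clean image
    x0      : target images, shape (B, C, H, W)
    cond    : image conditions (e.g. a low-resolution input or depth map)
    t       : diffusion times in [0, 1], shape (B,)
    """
    noise = torch.randn_like(x0)
    # Placeholder variance-preserving schedule; the real schedule depends on
    # the pre-trained model being distilled.
    alpha = torch.cos(0.5 * torch.pi * t).view(-1, 1, 1, 1)  # signal scale
    sigma = torch.sin(0.5 * torch.pi * t).view(-1, 1, 1, 1)  # noise scale
    x_t = alpha * x0 + sigma * noise                          # noisy sample
    x0_pred = student(x_t, cond, t)                           # predicted signal
    return distance(x0_pred, x0)
```

In the paper's setting, the student would be the adapted copy of the pre-trained model with the conditional inputs wired in; the distance function plays the role of the distillation metric mentioned above.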

Additionally, the method updates only the parameters involved in distillation and conditional fine-tuning while keeping all other parameters frozen. This yields a novel form of parameter-efficient conditional distillation that is compatible with commonly used parameter-efficient fine-tuning techniques for diffusion models.
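As a rough illustration of this selective update, the snippet below freezes everything except parameters whose names match condition-specific keywords; the keyword names and optimizer choice are assumptions for the example, not taken from the paper.

```python
import torch

# Hypothetical illustration of the selective-update idea: freeze the shared
# pre-trained backbone and leave only newly added, condition-specific
# parameters (e.g. adapter layers) trainable. The keyword names below are
# placeholders, not the paper's actual module names.

def freeze_except(model: torch.nn.Module, trainable_keywords=("adapter", "cond")):
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
        if param.requires_grad:
            trainable.append(param)
    return trainable  # pass only these parameters to the optimizer

# Example usage (hypothetical):
# optimizer = torch.optim.AdamW(freeze_except(adapted_model), lr=1e-4)
```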


The researchers validate the effectiveness of their approach across various conditional generation tasks, including real-world super-resolution, depth-to-image generation, and instructed image editing. Empirical results show that the method outperforms existing distillation techniques within the same sampling time. Importantly, it is the first distillation strategy capable of matching the performance of much slower fine-tuned conditional diffusion models.
In summary, this work demonstrates that only a small number of additional parameters are required for each distinct conditional generation task. The team envisions that their method can serve as a practical and potent approach to accelerate large-scale conditional diffusion models, marking a significant advancement in the field of generative models.
The paper Conditional Diffusion Distillation is on arXiv.
Author: Hecate He | Editor: Chain Zhang

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.