AI Machine Learning & Data Science Research

Huawei’s DiffFit Unlocks the Transferability of Large Diffusion Models to New Domains

The astonishing performance of diffusion models on tasks such as image synthesis, video generation and 3D editing has made them a model class of choice in the computer vision research community. The poor transferability of large pretrained diffusion models to target downstream tasks however remains both a challenge for researchers and a bottleneck for real-life applications.

In the new paper DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning, a Huawei Noah’s Ark Lab research team introduces DiffFit, a parameter-efficient fine-tuning technique that enables fast adaptation to new domains (e.g. different datasets or varying resolutions) for diffusion image generation. Compared to full fine-tuning approaches, DiffFit achieves 2x training speed-ups while using only ~0.12 percent of trainable parameters.

The team summarizes their main contributions as follows:

  1. We propose a simple parameter-efficient fine-tuning approach for diffusion image generation named DiffFit.
  2. We conduct an intuitive theoretical analysis and design detailed ablation studies to provide a deeper understanding of why this simple parameter-efficient fine-tuning strategy can adapt quickly to new distributions.
  3. We show that by treating high-resolution image generation as a downstream task of the low-resolution pretrained generative model, DiffFit can be seamlessly extended to achieve superior generation results with an FID of 3.02 on ImageNet while cutting training time by a factor of 30, thereby demonstrating its scalability.

DiffFit is built upon Diffusion Transformers (DiTs), a recently introduced family of transformer-based diffusion models with good scalability that outperform traditional diffusion models. DiffFit inherits these benefits while being much more parameter efficient.

DiffFit differs from DiT in that it freezes most parameters of the latent diffusion model and trains only the bias terms, the normalization layers, and the class-conditioning module. The team also introduces learnable scale factors into several blocks of the diffusion model, initializing each scale factor to 1.0 and multiplying it onto the corresponding layers of its block. These blocks comprise multiple components, including multi-head self-attention, feed-forward networks and layer normalization.
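To make this recipe concrete, the following PyTorch-style sketch shows how such a freezing scheme and learnable scale factors could be wired up. It is an illustration under assumed module names (a block exposing `attn`, `mlp`, `norm1`, `norm2`, and a `y_embedder` class-conditioning module), not the authors' actual implementation.

```python
import torch
import torch.nn as nn


def apply_difffit_style_freezing(model: nn.Module) -> None:
    """Freeze all pretrained weights, then re-enable only bias terms,
    normalization layers, and the class-conditioning module."""
    for p in model.parameters():
        p.requires_grad = False

    for name, module in model.named_modules():
        # Normalization layers remain trainable.
        if isinstance(module, (nn.LayerNorm, nn.GroupNorm)):
            for p in module.parameters():
                p.requires_grad = True
        # Class-conditioning (label-embedding) module; the name is an assumption.
        if "y_embedder" in name:
            for p in module.parameters():
                p.requires_grad = True

    # Bias terms stay trainable in every layer.
    for name, p in model.named_parameters():
        if name.endswith("bias"):
            p.requires_grad = True


class ScaledBlock(nn.Module):
    """Wraps a transformer block and multiplies learnable scale factors,
    initialized to 1.0, onto its attention and feed-forward outputs."""

    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.gamma_attn = nn.Parameter(torch.ones(1))  # scale for self-attention output
        self.gamma_mlp = nn.Parameter(torch.ones(1))   # scale for feed-forward output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed pre-norm layout: norm -> attention, then norm -> MLP, with residuals.
        x = x + self.gamma_attn * self.block.attn(self.block.norm1(x))
        x = x + self.gamma_mlp * self.block.mlp(self.block.norm2(x))
        return x
```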

Because DiffFit updates only a tiny fraction (about 0.12 percent) of the parameters, this combination of freezing strategy and model design minimizes disruption to the pretrained weights and speeds up training by approximately 2x compared to full fine-tuning. Moreover, by preserving the learned knowledge of the pretrained model, the approach enables faster adaptation to specific tasks and avoids catastrophic forgetting.
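A quick way to verify that only a tiny slice of the model remains trainable is to count parameters after freezing, for example with a generic helper like the one below (not from the paper):

```python
import torch.nn as nn


def trainable_fraction(model: nn.Module) -> float:
    """Return the share of parameters that will receive gradient updates."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

# After DiffFit-style freezing, this should come out around 0.0012 (~0.12 percent).
```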

In their empirical study, the researchers compared DiffFit with baseline methods (full fine-tuning, adapt-parallel, BitFit, LoRA-R8, etc.) on the fine-grained datasets Food101, SUN397, DF-20M mini, Caltech101, CUB-200-2011, ArtBench-10, Oxford Flowers, and Stanford Cars. In these experiments, DiffFit tuned only ~0.12 percent of the parameters and achieved the best overall FID scores.
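For reference, FID (Fréchet Inception Distance) compares statistics of generated and real images in an Inception feature space, with lower values indicating better fidelity. One common way to compute it is with the torchmetrics library, as in the sketch below; the authors' exact evaluation pipeline may differ.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Toy batches standing in for real and generated images: (N, 3, H, W), uint8.
real_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

fid = FrechetInceptionDistance(feature=2048)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(fid.compute())  # lower FID means generated images are closer to real ones
```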

This study introduces DiffFit, a simple and parameter-efficient strategy that significantly speeds up model fine-tuning without sacrificing performance. The team hopes their work will shed light on and encourage more efficient fine-tuning approaches for larger diffusion models.

The paper DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


