Low-Rank Adaptation (LoRA) is a novel technique introduced by Microsoft in 2021 for fine-tuning large language models (LLMs). LoRA is an efficient adaptation strategy that introduces no additional inference latency and substantially reduces the number of trainable parameters for downstream tasks while maintaining model quality.
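As a rough illustration of both claims, the sketch below adds a low-rank update B·A next to a frozen weight matrix W, then folds that update back into W. It is a minimal pure-Python toy, not the Microsoft or Hugging Face implementation. The matrix sizes, the `scale` factor (standing in for alpha / rank), and all helper names are assumptions made for this example.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(W, A, B, x, scale):
    """Frozen base output W x plus the scaled low-rank correction B (A x)."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

def merge(W, A, B, scale):
    """Fold the low-rank update into W so inference needs a single matrix."""
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]

def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)

rng = random.Random(0)
d_out, d_in, rank, scale = 6, 8, 2, 1.0  # hypothetical toy sizes

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
# Random values stand in for already-trained adapter weights.
A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]
B = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

y_adapter = lora_forward(W, A, B, x, scale)
y_merged = matvec(merge(W, A, B, scale), x)
max_diff = max(abs(a - b) for a, b in zip(y_adapter, y_merged))

# Parameter counts for a hypothetical 768 x 768 projection with rank 4.
full_params = 768 * 768
lora_params = lora_param_count(768, 768, 4)
print(max_diff, full_params, lora_params)
```

Because the merged matrix gives the same output as the adapter path, a deployed model can use the merged weights and pay no extra cost per forward pass. In the 768 x 768 example the adapter trains 6,144 values instead of 589,824.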
Although LoRA was initially proposed for LLMs, it can also be applied to other model types. Inspired by the Stable Diffusion paper published in 2022 by the Ludwig Maximilian University of Munich, Heidelberg University and Runway ML, independent researchers such as Simo Ryu (@cloneofsimo) came up with the idea of applying LoRA to Stable Diffusion, posting multiple examples and insights on their GitHub project page.
A team from the machine learning platform Hugging Face recently collaborated with Ryu to provide a general approach that enables users to implement LoRA in diffusion models such as Stable Diffusion via Dreambooth and full fine-tuning methods.
The team summarizes the benefits of their LoRA training support in diffusers as follows:
- Training is much faster.
- Compute requirements are lower. We could create a full fine-tuned model in a 2080 Ti with 11 GB of VRAM!
- Trained weights are much, much smaller. Because the original model is frozen and we inject new layers to be trained, we can save the weights for the new layers as a single file that weighs in at ~3 MB in size. This is about one thousand times smaller than the original size of the UNet model!
Because full-model fine-tuning of Stable Diffusion is challenging and time-consuming, the researchers leverage LoRA to simplify the fine-tuning process on a custom dataset. This makes it possible for developers to publish a single 3.29 MB file that will allow others to access and use their fine-tuned models.
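The "one thousand times smaller" figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 32-bit floats, takes 1 MB as 10^6 bytes, ignores file-format overhead, and uses an approximate figure of 860 million parameters for the Stable Diffusion v1 UNet. That parameter count is an assumption brought in for this example and does not appear in the article.

```python
BYTES_PER_FP32 = 4

lora_file_bytes = 3.29e6   # 3.29 MB, the file size quoted above
unet_params = 860e6        # assumption: approximate SD v1 UNet parameter count

# Roughly how many fp32 values fit in the LoRA file.
lora_params = lora_file_bytes / BYTES_PER_FP32

# Approximate size of the full UNet stored in fp32.
unet_bytes = unet_params * BYTES_PER_FP32

ratio = unet_bytes / lora_file_bytes
print(f"LoRA values: ~{lora_params:,.0f}")
print(f"UNet size:   ~{unet_bytes / 1e9:.2f} GB")
print(f"Size ratio:  ~{ratio:,.0f}x")
```

Under these assumptions the ratio comes out slightly above 1,000, in line with the team's estimate.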
The team provides a LoRA fine-tuning script that can run in as little as 11 GB of GPU RAM without resorting to tricks such as 8-bit optimizers. Notably, the learning rate is much larger than the one used for non-LoRA Dreambooth fine-tuning (typically 1e-4 as opposed to ~1e-6).
[Figure: Model fine-tuning using the Lambda Labs Pokémon dataset]
With regard to inference, the team demonstrates how their scripts can achieve excellent results by training orders of magnitude fewer weights than the original model.
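To make "training far fewer weights" concrete, here is a toy sketch of a LoRA-style training loop in which the base matrix W stays frozen and only the low-rank factors A and B receive gradient updates. It is not the Hugging Face script. The 2 x 2 problem, the single training example, and the learning rate of 0.1 are illustrative assumptions, and that learning rate is unrelated to the 1e-4 used for real fine-tuning.

```python
def forward(W, A, B, x):
    """Return h = A x and y = W x + B h."""
    h = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    y = [
        sum(w * xi for w, xi in zip(W[i], x)) + sum(b * hj for b, hj in zip(B[i], h))
        for i in range(len(W))
    ]
    return h, y

def loss(y, target):
    """Half the squared error between prediction and target."""
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

# Toy setup: every value here is an assumption chosen for illustration.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, never updated
A = [[0.5, 0.5]]              # trainable, rank x d_in (rank = 1)
B = [[0.0], [0.0]]            # trainable, d_out x rank, zero-initialised
x = [1.0, 1.0]
target = [2.0, 1.0]
lr = 0.1                      # toy learning rate

W_before = [row[:] for row in W]
initial_loss = loss(forward(W, A, B, x)[1], target)

for _ in range(200):
    h, y = forward(W, A, B, x)
    err = [yi - ti for yi, ti in zip(y, target)]
    # Gradients of the loss with respect to B and A only; W gets none.
    grad_B = [[err[i] * h[j] for j in range(len(h))] for i in range(len(err))]
    grad_A = [
        [sum(B[i][j] * err[i] for i in range(len(err))) * xk for xk in x]
        for j in range(len(h))
    ]
    B = [[b - lr * g for b, g in zip(b_row, g_row)] for b_row, g_row in zip(B, grad_B)]
    A = [[a - lr * g for a, g in zip(a_row, g_row)] for a_row, g_row in zip(A, grad_A)]

final_loss = loss(forward(W, A, B, x)[1], target)
print(initial_loss, final_loss, W == W_before)
```

Starting B at zero means the adapted model initially reproduces the base model exactly, so training only moves the output away from the original behaviour as far as the data requires.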
The team also shows that LoRA is compatible with Dreambooth, a method that allows users to “teach” new concepts to a Stable Diffusion model, and summarizes the advantages of applying LoRA on Dreambooth as follows:
- Training is faster.
- We only need a few images of the subject we want to train (5 or 10 are usually enough).
- We can tweak the text encoder, if we want, for additional fidelity to the subject.
Additional information is available in the references below.
- LoRA fine-tuning script: train_text_to_image_lora.py in the huggingface/diffusers repository on GitHub (main branch)
- Lambda Labs Pokémon dataset: https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions
- LoRA training document: https://huggingface.co/docs/diffusers/main/en/training/lora
- Dreambooth LoRA training script: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py
- Simo Ryu’s LoRA project: https://github.com/cloneofsimo/lora
- LoRA: Low-Rank Adaptation of Large Language Models: arXiv:2106.09685
- High-Resolution Image Synthesis with Latent Diffusion Models: arXiv:2112.10752
- Hugging Face LoRA blog post: https://huggingface.co/blog/lora
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang