Optimization algorithms play an essential role in training today's huge neural networks. While long-established optimizers such as AdamW and Adafactor remain researchers' go-to choices, various approaches have been proposed to automatically discover more efficient optimization algorithms. Such auto-discovery methods, however, have thus far failed to produce optimizers that reach the state of the art.
In the new paper Symbolic Discovery of Optimization Algorithms, a research team from Google and the University of California, Los Angeles presents a method for formulating algorithm discovery as program search and applies it to find EvoLved Sign Momentum (Lion), a simple and effective optimization algorithm. Lion achieves state-of-the-art zero-shot and fine-tuning accuracy on ImageNet while significantly reducing computation costs.
The researchers formulate algorithm discovery as program search, using symbolic representations in the form of programs. They note that this aligns with how algorithms are actually implemented, and that, compared with neural networks, symbolic representations such as programs are easier to analyze and understand and transfer better to new tasks. Moreover, program length serves as a proxy for complexity, making it easier to select simpler and more generalizable candidates.
The team adopts various techniques to enable high-quality algorithm search in the infinite and sparse program space and to identify candidates that generalize well from small proxy tasks to more complex and large state-of-the-art tasks. These include evolutionary search with warm-start and restart, abstract execution, funnel selection, and program simplification.
The resulting Lion optimizer differs from previous adaptive methods by tracking only momentum (rather than both first- and second-moment gradient statistics, as Adam-style optimizers do) and leveraging the sign operation to compute updates. This reduces memory overhead and yields updates of uniform magnitude across all dimensions.
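The update rule described above can be sketched in a few lines of NumPy (a minimal illustration based on the paper's pseudocode; the hyperparameter defaults here are illustrative, not the paper's tuned values):

```python
import numpy as np

def lion_update(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.01):
    """One Lion step (sketch). w: parameters, g: gradient,
    m: momentum buffer -- all arrays of the same shape."""
    # Update direction: sign of an interpolation between momentum and gradient.
    update = np.sign(beta1 * m + (1 - beta1) * g)
    # Apply the update with decoupled weight decay (as in AdamW).
    w = w - lr * (update + wd * w)
    # Momentum is refreshed with a *different* interpolation factor (beta2).
    m = beta2 * m + (1 - beta2) * g
    return w, m
```

Because `np.sign` maps every coordinate to -1, 0, or +1, each parameter moves by the same magnitude regardless of the raw gradient scale, and only the single momentum buffer `m` must be stored between steps.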
In their empirical study, the team applied Lion to transformer, MLP, ResNet, U-Net, and Hybrid models and evaluated performance on image classification, vision-language contrastive learning, diffusion, language modelling and fine-tuning tasks.
Despite its simplicity, Lion demonstrated impressive performance in the experiments, achieving 88.3 percent zero-shot and 91.1 percent fine-tuning accuracy on ImageNet, improving on previous state-of-the-art results by 2 percent and 0.1 percent, respectively. Lion was also shown to boost training efficiency on diffusion models by 2.3x, with better FID scores, and to match or exceed performance on language modelling with a 2x lower compute cost.
The code is available on the project’s GitHub. The paper Symbolic Discovery of Optimization Algorithms is on arXiv.
Author: Hecate He | Editor: Michael Sarazen