Solving optimization problems is crucial for real-world AI applications ranging from capital market investment to neural network training. A drawback of traditional optimizers is that they require manual design and do not aggregate experience across multiple related optimization tasks. This has made learned optimization, in which a neural network itself learns to optimize a function by parameterizing the gradient-based step calculation, a research area of growing interest.
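To make the idea concrete, here is a minimal, hypothetical sketch (in PyTorch, not taken from the paper) of a learned optimizer: a small network with its own meta-parameters maps per-parameter gradient features to an update step, replacing a hand-designed rule such as SGD or Adam. The class name, feature choice, and dimensions are all illustrative assumptions; note that this toy version treats each parameter independently, a limitation of many earlier learned optimizers discussed below.

```python
import torch

class LearnedOptimizer(torch.nn.Module):
    """Toy learned update rule; feature choice and sizes are illustrative."""
    def __init__(self, feature_dim: int = 2, hidden_dim: int = 32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(feature_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_dim, 1),
        )

    def step(self, params: torch.Tensor, grads: torch.Tensor) -> torch.Tensor:
        # Simple per-parameter features; real systems use richer trajectory statistics.
        feats = torch.stack([grads, grads.abs()], dim=-1)
        update = self.net(feats).squeeze(-1)
        # The step is predicted by the network rather than hand-designed;
        # the network's weights (meta-parameters) are trained across many tasks.
        return params - update

# Usage: one learned update on a 5-parameter toy problem.
opt = LearnedOptimizer()
params, grads = torch.randn(5), torch.randn(5)
params = opt.step(params, grads)
```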
In the new paper Transformer-Based Learned Optimization, a Google Research and Lund University team presents Optimus, a novel and expressive neural network architecture for learned optimization that captures complex dependencies in the parameter space and achieves competitive results on real-world tasks and benchmark optimization problems.

The proposed Optimus is inspired by the classical Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, which estimates the inverse Hessian matrix of the objective. Like BFGS, Optimus iteratively refines a preconditioner through rank-one updates; unlike BFGS, it uses a transformer-based architecture to generate these updates from features encoding the optimization trajectory.
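For reference, the classical BFGS scheme that inspires Optimus refines its inverse-Hessian estimate $H_k$ (the preconditioner) with a low-rank correction at every iteration:

$$
H_{k+1} = \left(I - \rho_k\, s_k y_k^{\top}\right) H_k \left(I - \rho_k\, y_k s_k^{\top}\right) + \rho_k\, s_k s_k^{\top},
\qquad \rho_k = \frac{1}{y_k^{\top} s_k},
$$

where $s_k = x_{k+1} - x_k$ is the parameter step and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ is the change in gradient. Optimus keeps this iterative low-rank structure but replaces the hand-derived correction with one produced by a transformer conditioned on trajectory features.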

The team trains Optimus with Persistent Evolution Strategies (PES, Vicol et al., 2021). They note that unlike previous methods, whose updates operate on each target parameter independently (or couple parameters only through normalization), their approach captures richer inter-dimensional relationships via self-attention while still generalizing well to target problem sizes different from those used in training.
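A minimal sketch (in PyTorch, not the paper's implementation) of the coupling idea: if per-parameter trajectory features are treated as tokens, self-attention lets every dimension's update depend on every other dimension's features, and the same meta-parameters apply regardless of how many parameters the target problem has. The class name, feature dimension, and layer sizes below are assumptions made for illustration.

```python
import torch

class AttentionCoupledUpdate(torch.nn.Module):
    """Illustrative only: self-attention over per-parameter feature tokens."""
    def __init__(self, feature_dim: int = 4, d_model: int = 32):
        super().__init__()
        self.embed = torch.nn.Linear(feature_dim, d_model)
        layer = torch.nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
        self.head = torch.nn.Linear(d_model, 1)

    def forward(self, trajectory_feats: torch.Tensor) -> torch.Tensor:
        # trajectory_feats: (batch, num_params, feature_dim), e.g. gradients,
        # momenta, and other per-parameter statistics of the trajectory.
        tokens = self.encoder(self.embed(trajectory_feats))
        return self.head(tokens).squeeze(-1)  # one update entry per parameter

# The same weights handle problems of different sizes (different token counts).
model = AttentionCoupledUpdate()
print(model(torch.randn(1, 10, 4)).shape)  # torch.Size([1, 10])
print(model(torch.randn(1, 50, 4)).shape)  # torch.Size([1, 50])
```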
In their empirical studies, the team evaluated Optimus on the popular real-world task of physics-based articulated 3D human motion reconstruction and on classical benchmark optimization problems, comparing its performance against the standard optimizers BFGS, Adam, gradient descent (SGD), and gradient descent with momentum (SGD-M).


In the experiments, the team observed at least a 10x reduction in the number of update steps on half of the classical optimization problems. Optimus also generalized well across diverse motions on the physics-based 3D human motion reconstruction task, achieving a 5x speed-up in meta-training compared to prior work and producing higher-quality reconstructions than BFGS.
This work demonstrates the effectiveness of the proposed Optimus learned optimization approach, although the paper acknowledges that this power and expressiveness come at the cost of a significantly increased computational burden. The team believes this limitation could be addressed through a learned factorization of the estimated preconditioner matrix.
The paper Transformer-Based Learned Optimization is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
