
Google Brain Paper Demystifies Learned Optimizers

Google Brain ICLR 2021 submission analyzes learned optimizers’ performance advantage over well-tuned baseline optimizers.

Learned optimizers are algorithms that can be trained to solve optimization problems. Although learned optimizers can outperform baseline optimizers in restricted settings, the ML research community understands remarkably little about their inner workings or why they work as well as they do. In a paper currently under review for ICLR 2021, a Google Brain research team attempts to shed some light on the matter.


The researchers explain that optimization algorithms can be considered the basis of modern machine learning. A popular research area in recent years has focused on learning optimization algorithms by directly parameterizing and training an optimizer on a distribution of tasks.

Research on learned optimizers aims to replace the baseline “hand-designed” optimizers with a parametric optimizer trained on a set of tasks, which can then be applied more generally. In contrast to baseline optimizers that use simple update rules derived from theoretical principles, learned optimizers use flexible, high-dimensional, nonlinear parameterizations.
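To make that contrast concrete, here is a minimal sketch of the meta-training setup, in which the "learned optimizer" is reduced to a single meta-parameter (a log learning rate) trained across a small distribution of random quadratic tasks; the task distribution, unroll length, and finite-difference meta-gradient are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Minimal meta-training sketch (illustrative, not the paper's setup): the
# "learned optimizer" here has a single meta-parameter theta, a log learning
# rate, trained so that an unrolled inner optimization reaches a low final
# loss on a small distribution of random quadratic tasks.
rng = np.random.default_rng(0)
tasks = [(rng.uniform(0.5, 2.0, size=5), rng.normal(size=5)) for _ in range(8)]

def meta_loss(theta, inner_steps=20):
    lr = np.exp(theta)
    total = 0.0
    for curvature, x0 in tasks:            # each task: minimize 0.5 * sum(curvature * x^2)
        x = x0.copy()
        for _ in range(inner_steps):
            x = x - lr * curvature * x     # inner update driven by the meta-parameter
        total += 0.5 * np.sum(curvature * x ** 2)
    return total / len(tasks)

theta = np.log(0.01)                       # start from a small learning rate
for outer_step in range(50):
    eps = 1e-3                             # finite-difference meta-gradient
    grad = (meta_loss(theta + eps) - meta_loss(theta - eps)) / (2 * eps)
    theta -= 0.5 * grad
print("meta-learned learning rate:", np.exp(theta))
```

A real learned optimizer replaces the single learning rate with a high-dimensional, nonlinear parameterization, but the outer-loop-over-tasks structure is the same.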

Although previous research efforts have improved the design, training, and performance of learned optimizers, researchers still lack a fundamental understanding of how these systems work. Are learned optimizers simply learning a clever combination of known techniques? Or do they learn fundamentally new behaviours that have not yet been proposed in the optimization literature?

The Google researchers say that understanding the underlying mechanisms of learned optimizers will help identify their operational flaws, while also deepening our insight into why key ML mechanisms work and how to improve them.

The team developed tools for isolating and elucidating mechanisms in nonlinear, high-dimensional learned optimization algorithms to demonstrate how learned optimizers utilize both known and novel techniques across different tasks. They say their research is heavily inspired by recent work using neural networks to parameterize optimizers and by recent studies on reverse engineering dynamical systems.

An optimizer has two parts: an optimizer state that stores information about the current problem, and readout weights that update the parameters given the current state. The researchers focused on first-order, component-wise optimizers, which are applied to each parameter (component) of a problem in parallel.
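As a rough illustration of this two-part structure, the sketch below implements a hypothetical component-wise optimizer with a small per-parameter hidden state and a set of readout weights; the tanh recurrence, state size, and untrained random weights are assumptions made for illustration and do not reflect the paper's actual architecture.

```python
import numpy as np

# Sketch of a component-wise learned optimizer (hypothetical architecture):
# every parameter carries a small hidden state, and the same learned weights
# are applied to all parameters in parallel. The weights here are random and
# untrained, so this only illustrates the plumbing, not useful behaviour.
rng = np.random.default_rng(0)
H = 8                                          # per-parameter state size (assumed)
W_g = rng.normal(scale=0.1, size=(H,))         # weights applied to the incoming gradient
W_h = rng.normal(scale=0.1, size=(H, H))       # recurrent weights on the optimizer state
b = np.zeros(H)
w_out = rng.normal(scale=0.1, size=(H,))       # readout weights that produce the update

def learned_step(params, grads, state, lr=0.1):
    """One step: update each parameter's state, then read out a parameter update."""
    new_state = np.tanh(np.outer(grads, W_g) + state @ W_h.T + b)   # shape (P, H)
    update = new_state @ w_out                                      # shape (P,)
    return params - lr * update, new_state

# Usage on a toy quadratic (the gradient of 0.5 * ||x||^2 is x):
x = rng.normal(size=5)
state = np.zeros((x.size, H))
for _ in range(10):
    x, state = learned_step(x, x, state)
```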

Learned optimizers have high-dimensional state variables and the potential for rich, nonlinear dynamics when learning complex behaviours, which has historically made it difficult to extract simple, intuitive descriptions of what they do. To get a handle on the behaviour of different optimizers, the researchers started with a simple visualization tool that treats an optimizer as a dynamical system. To examine a given learned optimizer’s dynamics, they then analyzed this nonlinear dynamical system via linearized approximations.
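The sketch below applies the same idea to the hypothetical state update from the previous snippet: linearize the update around an assumed operating point and read the timescales off the eigenvalues of the Jacobian. The architecture and linearization point are illustrative assumptions; only the overall recipe follows the dynamical-systems approach described here.

```python
import numpy as np

# Toy linearization of the hypothetical per-parameter state update from the
# previous sketch: h_{t+1} = tanh(g * W_g + W_h @ h_t + b). Near a chosen
# operating point, the local dynamics are governed by the Jacobian dh_{t+1}/dh_t.
rng = np.random.default_rng(0)
H = 8
W_g = rng.normal(scale=0.1, size=(H,))
W_h = rng.normal(scale=0.1, size=(H, H))
b = np.zeros(H)

def state_update(h, g):
    return np.tanh(g * W_g + W_h @ h + b)

def jacobian_wrt_state(h, g):
    """Analytic Jacobian of the tanh update with respect to the state h."""
    h_next = state_update(h, g)
    return (1.0 - h_next ** 2)[:, None] * W_h   # diag(1 - h_next^2) @ W_h

# Linearize around zero state and zero gradient input (an assumed operating point).
J = jacobian_wrt_state(np.zeros(H), 0.0)
eigvals = np.linalg.eigvals(J)
# Eigenvalue magnitudes near 1 indicate slowly decaying accumulators
# (momentum-like memory); magnitudes near 0 indicate fast, memoryless modes.
print(np.sort(np.abs(eigvals))[::-1])
```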

Figure: Learned optimizers outperform well-tuned baselines on three different tasks: (a) linear regression, (b) the Rosenbrock function, and (c) training a neural network on the two moons dataset. Optimizer performance is shown as loss curves.

The researchers then trained the learned optimizers on three disparate, fast-to-train tasks: random linear regression problems; minimizing the Rosenbrock function, a commonly used test function for optimization; and training a neural network to classify a toy dataset. They also tuned baseline optimizers for each task.
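For reference, the Rosenbrock function and a simple hand-tuned baseline might look like the sketch below; the constants a = 1 and b = 100 are the standard choice, and the gradient-descent settings are illustrative rather than the paper's tuned values.

```python
import numpy as np

# The Rosenbrock function, one of the three test tasks. The constants a = 1,
# b = 100 are the standard choice; the paper's exact configuration may differ.
def rosenbrock(p, a=1.0, b=100.0):
    x, y = p
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def rosenbrock_grad(p, a=1.0, b=100.0):
    x, y = p
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
    dy = 2.0 * b * (y - x ** 2)
    return np.array([dx, dy])

# Hand-tuned baseline for comparison: plain gradient descent with a small,
# illustrative learning rate. Progress along the curved valley is slow, which
# is exactly the kind of task a learned optimizer is meta-trained to speed up.
p = np.array([-1.5, 1.5])
lr = 1e-3
for step in range(20000):
    p = p - lr * rosenbrock_grad(p)
print(p, rosenbrock(p))   # the global minimum is at (1, 1)
```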

Across all three tasks, the learned optimizer outperformed the baseline optimizers on the meta-objective. The researchers further identified four mechanisms in the learned optimizers that they believe drive this superior performance: momentum, gradient clipping, learning rate schedules, and a new form of learning rate adaptation.
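As a rough, hand-designed analogue, the sketch below combines those four mechanisms in a single update rule, using an RMSProp-style per-parameter scaling as a stand-in for the paper's novel learning rate adaptation; all constants are illustrative assumptions, and this is not the learned optimizer's actual update.

```python
import numpy as np

# Hand-designed analogue of the four identified mechanisms. This illustrates
# the named techniques; it is not the learned optimizer's actual update rule,
# and the RMSProp-style scaling stands in for the paper's novel adaptation.
def step(params, grads, m, s, t,
         base_lr=0.1, beta=0.9, clip=1.0, decay=0.01, eps=1e-8):
    # 1. Gradient clipping: bound the global gradient norm.
    norm = np.linalg.norm(grads)
    if norm > clip:
        grads = grads * (clip / norm)
    # 2. Momentum: exponential moving average of past gradients.
    m = beta * m + (1 - beta) * grads
    # 3. Learning rate schedule: decay the step size over time.
    lr = base_lr / (1.0 + decay * t)
    # 4. Learning rate adaptation: per-parameter scaling by gradient magnitude.
    s = beta * s + (1 - beta) * grads ** 2
    params = params - lr * m / (np.sqrt(s) + eps)
    return params, m, s

# Usage on a toy quadratic (the gradient of 0.5 * ||x||^2 is x):
x = np.array([3.0, -2.0, 1.0])
m, s = np.zeros_like(x), np.zeros_like(x)
for t in range(200):
    x, m, s = step(x, x, m, s, t)
print(x)   # moves toward the origin, jittering at roughly the step-size scale
```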

“The methods we have developed should be part of a growing toolbox we can use to extract insight from the high-dimensional nonlinear dynamics of learned optimizers, and meta-learned algorithms more generally,” the researchers conclude. The team says that although they have isolated individual mechanisms, developing “a holistic picture of how a learned optimizer stitches these mechanisms together” will have to be addressed in future studies.

The paper Reverse Engineering Learned Optimizers Reveals Known and Novel Mechanisms is on arXiv.


Reporter: Yuan Yuan | Editor: Michael Sarazen



Synced Report | A Survey of China’s Artificial Intelligence Solutions in Response to the COVID-19 Pandemic — 87 Case Studies from 700+ AI Vendors

This report offers a look at how China has leveraged artificial intelligence technologies in the battle against COVID-19. It is also available on Amazon Kindle. Along with this report, we also introduced a database covering an additional 1,428 artificial intelligence solutions across 12 pandemic scenarios.

Click here to find more reports from us.



We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
