Learning how to learn is second nature for humans, but for AI systems meta-learning remains a powerful yet challenging technique: a model meta-learns the parameters of a parameterized algorithm by self-evaluating its performance and adapting that algorithm to a given task. While the meta-learning paradigm has proven successful across various machine learning applications, its theoretical properties remain relatively underexplored.
In the paper Optimistic Meta-Gradients, a DeepMind research team explores the connection between gradient-based meta-learning and convex optimization, demonstrating that optimism (incorporating a prediction of the next gradient) in meta-learning can be achieved via the Bootstrapped Meta-Gradients approach (BMG; Flennerhag et al., 2022).

The team summarizes their main contributions as follows:
- We show that meta-learning contains gradient descent with momentum (Heavy Ball, Polyak, 1964) and Nesterov Acceleration (Nesterov, 1983) as special cases.
- We show that gradient-based meta-learning can be understood as a non-linear transformation of an underlying optimization method.
- We establish rates of convergence for meta-learning in the convex setting.
- We show that optimism can be expressed through Bootstrapped Meta-Gradients (BMG). Our analysis provides a first proof of convergence for BMG.
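To make the first point concrete, the two classical special cases can be sketched on a simple convex quadratic. This is a minimal illustration, not the paper's code; the objective, hyperparameters, and function names below are arbitrary choices for demonstration.

```python
import numpy as np

# Heavy Ball momentum and Nesterov acceleration on the 1-D convex
# quadratic f(x) = 0.5 * a * x^2, whose gradient is a * x.
# The paper shows both arise as special cases of meta-learned updates.

def grad(x, a=2.0):
    return a * x  # gradient of 0.5 * a * x^2

def heavy_ball(x0, lr=0.1, beta=0.9, steps=200):
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)   # momentum buffer (Polyak, 1964)
        x = x + v
    return x

def nesterov(x0, lr=0.1, beta=0.9, steps=200):
    x, v = x0, 0.0
    for _ in range(steps):
        # gradient is evaluated at the look-ahead point (Nesterov, 1983)
        v = beta * v - lr * grad(x + beta * v)
        x = x + v
    return x
```

The only structural difference is where the gradient is evaluated: Heavy Ball uses the current iterate, while Nesterov looks ahead along the momentum direction, which is exactly the kind of "prediction" that optimism generalizes.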

A meta-learner aims to optimize its meta-parameters to yield effective updates. Prior studies have tended to treat meta-optimization as an online problem and derive convergence guarantees. This paper takes a different approach, treating meta-learning as a non-linear transformation of classical optimization.

The team first analyses meta-learning using recent convex optimization techniques, where they consider optimism with meta-learning in the convex setting and validate the accelerated rates of convergence. They then show how optimism in meta-learning can be expressed using BMG, and provide the first proof of convergence for the BMG method.
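The notion of optimism used here can be illustrated with a plain optimistic gradient step, in which the previous gradient serves as the prediction of the next one. This is a hedged sketch of the general idea, not of BMG itself; `optimistic_gd` and its parameters are illustrative.

```python
import numpy as np

# Optimistic gradient descent: take a standard step plus a correction
# based on a gradient prediction (here, simply the previous gradient).

def optimistic_gd(grad, x0, lr=0.1, steps=200):
    x = np.asarray(x0, dtype=float)
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        # x - lr*g is the usual step; the extra -lr*(g - g_prev) term
        # "pre-pays" for the predicted next gradient
        x = x - lr * (2 * g - g_prev)
        g_prev = g
    return x
```

When the gradient prediction is accurate (as in smooth convex problems where consecutive gradients change slowly), this correction is what yields the accelerated rates the paper validates.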
Comparing momentum to a meta-learned step-size, the team finds that introducing a non-linearity into the update rule can improve the convergence rate. They also compare an AdaGrad sub-gradient algorithm for stochastic optimization against a meta-learned version, validating that meta-learning the scale vector consistently speeds up convergence. Finally, the team compares a conventional meta-learning approach without optimism to optimistic meta-learning, with the results again demonstrating that optimism is crucial for meta-learning to achieve acceleration.
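For intuition on what meta-learning a step-size looks like, here is a minimal online sketch that adapts a scalar learning rate using the alignment of consecutive gradients. It is an illustrative stand-in, not the paper's meta-learned update; the function name, hyperparameters, and adaptation rule are all assumptions for demonstration.

```python
import numpy as np

# The meta-parameter here is the scalar step size lr itself: if the
# current and previous gradients point the same way, lr was too small
# and is increased; if they oppose, lr overshot and is decreased.

def meta_lr_gd(grad, x0, lr=0.05, meta_lr=0.001, steps=200):
    x = np.asarray(x0, dtype=float)
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        lr = lr + meta_lr * float(np.dot(g, g_prev))  # adapt the step size
        x = x - lr * g                                # take the inner step
        g_prev = g
    return x
```

Replacing the scalar `lr` with a per-coordinate scale vector gives the flavor of the meta-learned AdaGrad variant the team evaluates.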
Overall, this work provides valuable new insights into the connection between convex optimization and meta-learning and validates optimism’s role in accelerating meta-learning.
The paper Optimistic Meta-Gradients is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.