Second nature for humans, learning to learn remains a powerful yet challenging technique for AI systems: a meta-learner tunes the parameters of a parameterized learning algorithm by evaluating its own performance and adapting that algorithm to a given task. While the meta-learning paradigm has proven successful across various machine learning applications, its theoretical properties remain relatively underexplored.
In the new paper Optimistic Meta-Gradients, a DeepMind research team explores the connection between gradient-based meta-learning and convex optimization, demonstrating that optimism (a prediction of the next gradient) in meta-learning is achievable via the Bootstrapped Meta-Gradients approach (BMG, Flennerhag et al., 2022).
The team summarizes their main contributions as follows:
- We show that meta-learning contains gradient descent with momentum (Heavy Ball, Polyak, 1964) and Nesterov Acceleration (Nesterov, 1983) as special cases.
- We show that gradient-based meta-learning can be understood as a non-linear transformation of an underlying optimization method.
- We establish rates of convergence for meta-learning in the convex setting.
- We show that optimism can be expressed through Bootstrapped Meta-Gradients (BMG). Our analysis provides a first proof of convergence for BMG.
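To make the first bullet concrete, here is a minimal sketch (our illustration, not the paper's code) of the two classical methods the paper recovers as special cases: Polyak's heavy-ball momentum and Nesterov acceleration, each minimizing a toy ill-conditioned quadratic. The objective, step size, and momentum coefficient are assumptions chosen for demonstration.

```python
import numpy as np

A = np.diag([1.0, 10.0])           # toy ill-conditioned quadratic f(x) = 0.5 x^T A x
grad = lambda x: A @ x             # its gradient
x0 = np.array([1.0, 1.0])

def heavy_ball(lr=0.05, beta=0.9, steps=200):
    # Polyak (1964): x_{t+1} = x_t - lr * g_t + beta * (x_t - x_{t-1})
    x, x_prev = x0.copy(), x0.copy()
    for _ in range(steps):
        x, x_prev = x - lr * grad(x) + beta * (x - x_prev), x
    return x

def nesterov(lr=0.05, beta=0.9, steps=200):
    # Nesterov (1983): the gradient is evaluated at the look-ahead point x + beta * v
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        v = beta * v - lr * grad(x + beta * v)
        x = x + v
    return x

x_hb = heavy_ball()
x_nag = nesterov()
```

Both rules can be read as fixed instances of a more general meta-update; the paper's point is that meta-learning subsumes them when the relevant parameters are learned rather than hand-set.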
A meta-learner aims to optimize its meta-parameters to yield effective updates. Prior studies have tended to treat meta-optimization as an online problem and derive convergence guarantees. This paper takes a different approach, treating meta-learning as a non-linear transformation of classical optimization.
The team first analyses meta-learning through the lens of recent convex optimization techniques, considering optimism in the convex setting and establishing accelerated rates of convergence. They then show how optimism in meta-learning can be expressed via BMG, and provide the first proof of convergence for the method.
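Optimism here means the learner incorporates a prediction, or hint, of the next gradient before it is observed. A classic instantiation, sketched below as our own illustration (the paper realizes optimism through BMG, not necessarily this rule), uses the previous gradient as the hint, yielding the optimistic update x_{t+1} = x_t - lr * (2 g_t - g_{t-1}).

```python
import numpy as np

A = np.diag([1.0, 10.0])           # toy quadratic f(x) = 0.5 x^T A x
grad = lambda x: A @ x

def optimistic_gd(x0, lr=0.05, steps=200):
    # Optimistic gradient descent: correct each step with a hint of the
    # next gradient; here the hint is simply the previous gradient g_{t-1}.
    x = x0.copy()
    g_prev = grad(x)
    for _ in range(steps):
        g = grad(x)
        x = x - lr * (2 * g - g_prev)  # hint m_t = g_{t-1}
        g_prev = g
    return x

x_opt = optimistic_gd(np.array([1.0, 1.0]))
```

When the hint is accurate (gradients change slowly, as in smooth convex problems), the correction term cancels much of the lag in plain gradient descent, which is the intuition behind the accelerated rates.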
Comparing momentum to a meta-learned step-size, the team finds that introducing a non-linear update rule can improve the convergence rate. They also compare an AdaGrad sub-gradient algorithm for stochastic optimization to a meta-learned version, finding that meta-learning the scale vector consistently speeds up convergence. Finally, the team compares a conventional meta-learning approach without optimism to optimistic meta-learning, with the results again demonstrating that optimism is crucial for meta-learning to achieve acceleration.
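To give a flavor of what "meta-learning a step size" means in the simplest case, here is a hypergradient-style sketch in the spirit of that comparison. The specific rule (nudging the learning rate by the inner product of successive gradients) and all constants are our assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

A = np.diag([1.0, 10.0])           # toy quadratic f(x) = 0.5 x^T A x
grad = lambda x: A @ x

def meta_step_size_gd(x0, lr=0.01, meta_lr=1e-4, steps=300):
    # Online step-size adaptation: if consecutive gradients point the same
    # way (positive inner product), a larger step would have helped, so
    # increase lr; if they oppose, decrease it.
    x = x0.copy()
    g_prev = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        lr = max(lr + meta_lr * (g @ g_prev), 0.0)  # meta-update of the step size
        x = x - lr * g                               # inner update of the iterate
        g_prev = g
    return x, lr

x_meta, lr_final = meta_step_size_gd(np.array([1.0, 1.0]))
```

The two nested updates (one on the iterate, one on the optimizer's own parameter) are the basic shape of the meta-optimization problem the paper analyses.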
Overall, this work provides valuable new insights into the connection between convex optimization and meta-learning and validates optimism’s role in accelerating meta-learning.
The paper Optimistic Meta-Gradients is on arXiv.
Author: Hecate He | Editor: Michael Sarazen