Learning how to learn is something most humans do well, leveraging previous experience to inform the learning process for new tasks. Endowing AI systems with such abilities, however, remains challenging, as it requires machine learners to learn their own update rules, which have typically been hand-designed and manually tuned for each task.
The field of meta-learning studies how to enable machine learners to learn how to learn, and it is a critical research area for improving the efficiency of AI agents. One common approach is for the learner to learn an update rule by applying it over some number of update steps and then evaluating the resulting performance.
To fully unlock the potential of meta-learning, it is necessary to overcome both the meta-optimization problem itself and the myopia of typical meta-objectives. To tackle these issues, a research team from DeepMind has proposed an algorithm designed to enable meta-learners to teach themselves.
For a meta-learner to learn an update rule, it first needs to evaluate that rule, which requires applying it for some number of steps before evaluation; this can lead to prohibitively high computation costs.
Previous studies have assumed that optimizing performance after some number K of applications of the update rule will also yield improved performance over the remainder of the learner's lifetime. If this assumption fails, meta-learners suffer from a short-horizon bias. Furthermore, optimizing the learner's performance after K updates can also fail to account for the learning process itself.
Such a meta-optimization process also creates two bottlenecks: 1) Curvature: the meta-objective is constrained to the same type of geometry as the learner; and 2) Myopia: the meta-objective is fundamentally limited to evaluating performance within the K-step horizon and ignores future learning dynamics.
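To make this K-step setup concrete, below is a minimal sketch (not the paper's code) of a standard meta-gradient: a meta-parameter, here simply a learning rate, is tuned by backpropagating through K applications of the update rule. The toy quadratic loss and all names are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def inner_loss(params, batch):
    # Toy quadratic objective standing in for the learner's loss.
    return jnp.sum((params - batch) ** 2)

def k_step_meta_objective(eta, params, batch, K=5):
    # Apply the update rule K times, then evaluate the learner.
    for _ in range(K):
        grads = jax.grad(inner_loss)(params, batch)
        params = params - eta * grads   # update rule parameterised by eta
    return inner_loss(params, batch)    # meta-objective: post-update performance

# The meta-gradient w.r.t. eta flows through the whole K-step unroll, so it
# only "sees" behaviour inside that horizon (myopia) and inherits the
# learner's loss geometry (curvature).
meta_grad = jax.grad(k_step_meta_objective)(0.1, jnp.zeros(4), jnp.ones(4))
```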
The proposed algorithm includes two main features to overcome these issues. First, to mitigate myopia, it leverages bootstrapping to infuse information about future learning dynamics into the objective. Second, the meta-objective is formulated as minimizing the distance to the bootstrapped target, which controls curvature. The general idea behind the proposed algorithm is thus that a meta-learner can effectively teach itself by matching desired future updates using fewer steps.
The researchers explain that their proposed algorithm constructs the meta-objective in two steps:
- It bootstraps a target from the learner's new parameters. In the paper, targets are generated by continuing to update the learner's parameters, either under the meta-learned update rule or under another update rule, for some number of steps.
- The learner's new parameters, which are a function of the meta-learner's parameters, and the target are projected onto a matching space. A simple example is the Euclidean parameter space; to control curvature, a different (pseudo-)metric space can be chosen, such as the Kullback-Leibler (KL) divergence commonly used with probabilistic models. (See the sketch below for how the two steps fit together.)
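Here is a hedged sketch of how these two steps could fit together, assuming a simple gradient-descent update rule whose learning rate is the meta-parameter; the stop-gradient on the target and the Euclidean matching distance follow the description above, while the toy loss and all names are illustrative.

```python
import jax
import jax.numpy as jnp

def inner_loss(params, batch):
    # Toy quadratic objective standing in for the learner's loss.
    return jnp.sum((params - batch) ** 2)

def update(params, batch, eta):
    # A simple gradient-descent update rule; eta is the meta-parameter.
    return params - eta * jax.grad(inner_loss)(params, batch)

def bmg_objective(eta, params, batch, K=1, L=5):
    # Unroll K updates under the meta-parameterised rule.
    for _ in range(K):
        params = update(params, batch, eta)
    # Bootstrap: continue for L more steps to form the target, then treat it
    # as a fixed label, so no backpropagation through the extra steps is needed.
    target = params
    for _ in range(L):
        target = update(target, batch, eta)
    target = jax.lax.stop_gradient(target)
    # Matching loss: squared Euclidean distance between new parameters and the
    # target; a KL divergence could be used instead for probabilistic learners.
    return jnp.sum((params - target) ** 2)

meta_grad = jax.grad(bmg_objective)(0.1, jnp.zeros(4), jnp.ones(4))
```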
Overall, the meta-learner's objective is to minimize the distance to the bootstrapped target. To this end, the team applies a novel Bootstrapped Meta-Gradient (BMG) to infuse information about future learning dynamics without increasing the number of update steps to backpropagate through. BMG can thus speed up the optimization process and, as the paper demonstrates, guarantee performance improvements.
The team conducted extensive experiments to compare the performance of BMG against standard meta-gradients. These were performed in a typical reinforcement learning setting, a Markov decision process (MDP), where the goal is to learn a policy that maximizes the expected return.
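For reference, the quantity such a policy seeks to maximize can be written as a discounted return over a trajectory; the snippet below is an illustrative helper, not code from the paper.

```python
import jax.numpy as jnp

def discounted_return(rewards, gamma=0.99):
    # Sum_t gamma^t * r_t along one trajectory of rewards.
    discounts = gamma ** jnp.arange(len(rewards))
    return jnp.sum(discounts * rewards)

# The policy's value is the expectation of this return over trajectories; the
# meta-learner (standard meta-gradients or BMG) tunes how the policy is updated.
print(discounted_return(jnp.array([1.0, 0.0, 1.0])))
```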
In the evaluations, BMG demonstrated substantial performance improvements on the Atari ALE benchmark, achieving a new state-of-the-art. BMG also improved upon model-agnostic meta-learning (MAML) in the few-shot setting, indicating the study's potential to open up new possibilities for efficient meta-learning.
The paper Bootstrapped Meta-Learning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang