A new study from a DeepMind and Swiss AI Lab IDSIA team proposes using symmetries from backpropagation-based learning to boost the meta-generalization capabilities of black-box meta-learners.
Meta reinforcement learning (RL) is a technique used to automatically discover new RL algorithms from agents’ environmental interactions. While black-box approaches in this space are relatively flexible, they struggle to discover RL algorithms that can generalize to novel environments.
In the paper Introducing Symmetries to Black Box Meta Reinforcement Learning, the researchers explore the role of symmetries in meta-generalization and show that introducing more symmetries to black-box meta-learners can improve their ability to generalize to unseen action and observation spaces, tasks, and environments.
The researchers identify three key symmetries that backpropagation-based systems exhibit: use of the same learned learning rule across all nodes of the neural network; the flexibility to work with any input, output and architecture size; and invariance to permutations of the inputs and outputs (for dense layers). They add these symmetries to an existing black-box meta-learning algorithm to improve its generalization capabilities.
To introduce these symmetries, the researchers adapt variable shared meta-learning (VSML) (Kirsch and Schmidhuber, 2020) to an RL setting. VSML generalizes learned learning rules, fast weights, and MetaRNNs. It makes it possible to implement backpropagation purely in the recurrent dynamics of an RNN, and to meta-learn learning algorithms for supervised learning from scratch.
Extending VSML, a black-box meta-learning method that already exhibits these symmetries, to the meta RL setting yields symmetric learning agents (SymLA), a flexible black-box meta RL algorithm that is less prone to overfitting.
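To make the three symmetries more concrete, here is a minimal toy sketch in plain Python. It is not the paper's architecture: the scalar per-connection state, the `SHARED` parameter values, and the function names `init_states` and `step` are all assumptions chosen for illustration. The idea it tries to capture is that every connection runs the same small update rule, so the layer works for any input and output size, and permuting the inputs (together with their connection states) leaves the outputs unchanged.

```python
import math
import random

# Hypothetical shared parameters: one tiny update rule reused by every connection.
SHARED = {"decay": 0.9, "w_in": 0.5, "w_fb": 0.3, "w_out": 1.0}

def init_states(n_in, n_out, rng):
    """One scalar state per connection (i, j); the sizes are free to vary."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]

def step(states, x, feedback):
    """Apply the same rule at every connection, then sum messages per output."""
    n_in, n_out = len(states), len(states[0])
    new_states = [
        [
            math.tanh(
                SHARED["decay"] * states[i][j]
                + SHARED["w_in"] * x[i]
                + SHARED["w_fb"] * feedback[j]
            )
            for j in range(n_out)
        ]
        for i in range(n_in)
    ]
    # Summation over inputs is what makes the output independent of input order.
    y = [
        sum(SHARED["w_out"] * new_states[i][j] * x[i] for i in range(n_in))
        for j in range(n_out)
    ]
    return new_states, y

rng = random.Random(0)
states = init_states(3, 2, rng)
x = [0.2, -1.0, 0.7]
fb = [0.1, -0.4]
_, y = step(states, x, fb)

# Permute the inputs and carry each input's connection states along with it.
perm = [2, 0, 1]
_, y_p = step([states[i] for i in perm], [x[i] for i in perm], fb)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(y, y_p)))

# The same shared rule runs unchanged on a layer of a different shape.
_, y_big = step(init_states(5, 4, rng), [0.1] * 5, [0.0] * 4)
print(len(y_big))
```

Because no parameter is tied to a specific input or output index, nothing in the rule can memorize a particular ordering or layer size, which is the intuition behind the claim that such systems are less prone to overfitting.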
In their empirical study, the team compared the generalization capabilities of the proposed SymLA to baseline MetaRNNs. They first learned to learn on the bandits from Wang et al. (2016), then demonstrated generalization to unseen action spaces by applying the learned algorithm to bandits with varying numbers of arms at meta-test time, which MetaRNNs cannot do. The researchers then showed how these symmetries can improve generalization to unseen environments by permuting observations and actions in classic control benchmarks.
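The evaluation setup described above can be sketched roughly as follows. This is an illustrative assumption, not the benchmark code used in the paper: `BernoulliBandit`, `PermutedActions`, and `permute_observation` are hypothetical names, and the reward probabilities are made up. The sketch shows a bandit whose number of arms is set at construction time, plus simple wrappers that shuffle action and observation indices.

```python
import random

class BernoulliBandit:
    """Multi-armed bandit; the number of arms is simply len(probs)."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        # Reward 1.0 with probability probs[arm], otherwise 0.0.
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0

class PermutedActions:
    """Hypothetical wrapper: the agent's action a is routed to arm perm[a]."""

    def __init__(self, bandit, perm):
        self.bandit = bandit
        self.perm = list(perm)

    def pull(self, arm):
        return self.bandit.pull(self.perm[arm])

def permute_observation(obs, perm):
    """Reorder observation entries so that position k holds obs[perm[k]]."""
    return [obs[i] for i in perm]

# Meta-training might use a few arms, meta-testing a different number.
train_env = BernoulliBandit([0.3, 0.7])
test_env = BernoulliBandit([0.1, 0.2, 0.9, 0.4, 0.5])
print(len(train_env.probs), len(test_env.probs))

# An agent facing the shuffled environment sees the same task with relabeled actions.
shuffled = PermutedActions(BernoulliBandit([0.2, 0.5, 0.9]), [2, 0, 1])
print(shuffled.pull(0) in (0.0, 1.0))
```

A learner with a fixed-size output layer could not act in `test_env` at all, and a learner that had memorized which index pays best would fail on `shuffled`; a permutation-invariant learner should be unaffected by either change.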
The team summarises their conclusions from the empirical study as:
- We demonstrated generalization to varying numbers of arms in bandit experiments (unseen action spaces), permuted observations and actions with no degradation in performance (unseen observation spaces).
- We observed the tendency of the meta-learned RL algorithm to learn about states and their associated rewards at meta-test time (unseen tasks).
- We showed that the discovered learning behaviour also transfers between grid world and (unseen) classic control environments.
The paper Introducing Symmetries to Black Box Meta Reinforcement Learning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen