AI Machine Learning & Data Science Research

Yoshua Bengio Team’s Recurrent Independent Mechanisms Endow RL Agents With Out-of-Distribution Adaptation and Generalization Abilities

A research team from the University of Montreal and Max Planck Institute for Intelligent Systems constructs a reinforcement learning agent whose knowledge and reward function can be reused across tasks, along with an attention mechanism that dynamically selects unchangeable knowledge pieces to enable out-of-distribution adaptation and generalization.

One of the interesting and enduring challenges in machine learning (ML) is improving model capabilities for out-of-distribution adaptation and generalization. While this is an easy task for humans, who can adapt and learn new knowledge quickly by reusing relevant prior knowledge, endowing an agent with such abilities requires figuring out how to separate knowledge into easily re-composable modules and how to modify or combine these modules to achieve efficient adaptation to new tasks or changes in data distribution.

To this end, a research team from the University of Montreal and Max Planck Institute for Intelligent Systems that includes Turing Award winner Yoshua Bengio recently proposed a modular architecture comprising a set of independent modules which compete with each other to attend to an input and sparsely communicate using a key-value attention mechanism. The researchers adopt a meta-learning approach on the modules and attention mechanism parameters to achieve fast adaptation to changes in distribution or new tasks in reinforcement learning (RL) agents.


The team studies whether such a modular architecture could help decompose knowledge into unchangeable and reusable pieces, such that the resulting model is not only more sample-efficient, but also generalizes across various task distributions.


The proposed model is based on a recurrent independent mechanisms (RIMs) architecture which contains a set of independent and competing modules. In this setup, each module acts independently and interacts with other modules sparingly through attention. The different modules attend to different parts of the input via input attention, while contextual connections among the modules are built via communication attention.
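The input-attention competition can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the "null input" slot and top-k competition follow the RIMs description, but every shape, name, and parameter here is an assumption for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def input_attention(h, x, W_q, W_k, W_v, k_active):
    """Key-value input attention among independent modules (a sketch).

    h: (n_modules, d_h) hidden states, one per module
    x: (d_in,) current input
    Each module emits a query; keys/values come from two slots --
    the real input and a learned-to-be-ignorable null slot. Modules
    whose attention leans most on the real input win the competition
    and are activated; the rest stay unchanged this step.
    """
    slots = np.stack([x, np.zeros_like(x)])          # (2, d_in)
    q = h @ W_q                                      # (n_modules, d_k)
    k = slots @ W_k                                  # (2, d_k)
    v = slots @ W_v                                  # (2, d_v)
    scores = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (n_modules, 2)
    # Top-k competition: modules attending most to the real input activate.
    active = np.sort(np.argsort(-scores[:, 0])[:k_active])
    read = scores @ v                                # (n_modules, d_v)
    return read, active
```

A second attention step of the same form (queries from active modules, keys/values from all modules) would implement the sparse communication attention; it is omitted here for brevity.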


The researchers demonstrate how to capture the quickly vs. slowly changing aspects of an underlying distribution by leveraging meta-learning to train different components of the network at different paces on different time scales. The proposed model thus has both fast learning and slower learning phases. In fast learning, the activated module parameters are updated quickly to capture changes in a task distribution. In slow learning, the parameters of the two sets of attention mechanisms are updated less frequently in order to capture the more stable aspects of a task distribution.
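The two-timescale schedule can be illustrated with a toy sketch: one parameter group (standing in for the module parameters) is updated every step, while the other (standing in for the attention parameters) is updated less frequently and more conservatively. The loss, learning rates, and update interval below are illustrative assumptions, not values from the paper.

```python
def two_timescale_train(grad_fn, theta_fast, theta_slow,
                        lr_fast=0.1, lr_slow=0.01,
                        slow_every=10, steps=100):
    """Toy fast/slow schedule (a sketch, not the paper's algorithm).

    'Fast' parameters (the activated modules) are updated every step to
    track changes in the task distribution; 'slow' parameters (the two
    attention mechanisms) are updated only every `slow_every` steps, at
    a lower learning rate, to capture the stable aspects of the tasks.
    """
    for t in range(1, steps + 1):
        g_fast, g_slow = grad_fn(theta_fast, theta_slow)
        theta_fast = theta_fast - lr_fast * g_fast    # fast/inner update
        if t % slow_every == 0:
            theta_slow = theta_slow - lr_slow * g_slow  # slow/outer update
    return theta_fast, theta_slow

# Example on a separable quadratic: (f - 3)^2 + (s - 1)^2.
grad = lambda f, s: (2 * (f - 3.0), 2 * (s - 1.0))
f_star, s_star = two_timescale_train(grad, 0.0, 0.0)
# The fast parameter converges close to its optimum (3.0), while the
# slow parameter has moved only partway toward 1.0 in the same budget.
```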

The team evaluates the proposed Meta-RIMs networks in a wide variety of environments from the MiniGrid and BabyAI suites, choosing mean reward and average success rate as metrics and comparing against two baselines: a vanilla LSTM model and a modular network.


The results show that the proposed method can improve sample efficiency and lead to policies that generalize better to systematic changes in the training distribution. Further, the approach enables faster adaptation to new distributions and a better curriculum learning regime for training RL agents in an incremental fashion by reusing knowledge from similar, previously learned tasks.

The study successfully leverages meta-learning on modular architectures with sparse communication to capture short-term vs. long-term aspects of the underlying mechanisms, confirming that meta-learning and attention-based modularization can lead to better sample efficiency, out-of-distribution generalization and transfer learning.


The paper Fast and Slow Learning of Recurrent Independent Mechanisms is on arXiv.


Author: Hecate He | Editor: Michael Sarazen, Chain Zhang


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.

