A research team from DeepMind and University College London has introduced Alchemy, an open-source benchmark for meta-RL research.
In recent years, reinforcement learning (RL) has garnered much attention in the field of machine learning. The approach does not require labelled data and has yielded remarkable successes on a wide variety of specific tasks. RL unfortunately continues to struggle with issues such as sample efficiency, generalization, and transfer learning. To address these drawbacks, researchers have been exploring meta-reinforcement learning (meta-RL), in which learning strategies can quickly adapt to novel tasks by using experience gained on a large set of tasks that have a shared structure.
Although a number of interesting and innovative meta-RL techniques have been proposed, there exists no ideal task benchmark for testing new algorithms. Progress in the field can only be sustained if existing work can be reproduced and accurately compared to assess the performance of new methods, and Alchemy is designed to address this.
Meta-learning is inspired by humans’ ability to generate and tackle new tasks by drawing on experiences gleaned from other, related learning tasks. It provides a new learning paradigm wherein agents can gain experience over multiple learning episodes – often covering a distribution of related tasks – and use this experience to improve future learning performance. This leads to a variety of benefits as learning strategies improve on both lifetime and evolutionary timescales.
Meta-RL environments thus present the learner not with a single task but with a task distribution whose tasks all share high-level features. An ideal benchmark task distribution for meta-RL has two properties: it is accessible and it is interesting. Accessible means the full task distribution is completely known, while interesting means the distribution displays properties comparable to the structural richness of challenging real-world tasks.
Unfortunately, previous meta-RL benchmarks have failed to achieve both: they are either accessible without being interesting (such as bandit tasks), or interesting without being accessible (such as Atari games). Alchemy aims to check both boxes as a “best-of-both-worlds” benchmark for meta-RL research.
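To make the idea of an accessible task distribution concrete, here is a minimal Python sketch of a bandit task distribution of the kind mentioned above. It is an illustration only, not code from the Alchemy paper or repository: the class and function names are hypothetical, and the choice of Uniform(0, 1) arm probabilities is an assumption made for simplicity.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class BernoulliBandit:
    """A single task: arms with fixed payout probabilities hidden from the agent."""
    arm_probs: List[float]

    def pull(self, arm: int, rng: random.Random) -> int:
        """Return reward 1 with the chosen arm's probability, else 0."""
        return 1 if rng.random() < self.arm_probs[arm] else 0


def sample_task(num_arms: int, rng: random.Random) -> BernoulliBandit:
    """Draw one task. Assumption: each arm's probability is Uniform(0, 1)."""
    return BernoulliBandit([rng.random() for _ in range(num_arms)])


def sample_task_batch(num_tasks: int, num_arms: int, seed: int = 0) -> List[BernoulliBandit]:
    """Draw several tasks that share structure (same arm count, same reward rule)."""
    rng = random.Random(seed)
    return [sample_task(num_arms, rng) for _ in range(num_tasks)]
```

Every task here is generated by a rule that can be written down in full, which is what makes the distribution accessible. The tasks share almost no structure beyond the reward rule, which is why such a distribution is not interesting in the sense described above.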
Alchemy is a 3D video game played in a series of trials that fit together into episodes. At the beginning of each trial, the agent is presented with a set of stones, containers filled with coloured liquids (potions), and a central cauldron. Gameplay involves using the potions to treat and boost the value of the stones, which are then added to the cauldron to register the maximum possible point value within a fixed time limit.
The value of the stones is tied to their perceptual features (size, colour and shape), and the task thus constitutes learning a “chemistry” that governs how different potions affect different stones across trials in an episode. At the start of each new episode, however, the stones and the potions’ transformative effects are changed. While this resampling creates many possible chemistries, there are also a number of principles that span all episodes. For example, potions come in fixed pairs (e.g. red/green) that always produce opposite effects. A good meta-learner will identify and exploit such regularities.
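The interplay between per-episode resampling and cross-episode regularities can be sketched as a toy model in Python. This is not the Alchemy implementation. It assumes each stone feature takes one of two values (-1 or +1), that each potion pair acts on exactly one feature, and that a stone's value is a weighted sum of its features. Only the red/green pair comes from the description above; the other pair names and all function names are hypothetical.

```python
import random

FEATURES = ("size", "colour", "shape")
# Fixed potion pairs. Only red/green is taken from the text; the rest are made up.
POTION_PAIRS = (("red", "green"), ("yellow", "orange"), ("pink", "cyan"))


def sample_chemistry(rng):
    """Resample, once per episode, which feature each pair changes and in which direction."""
    axes = list(FEATURES)
    rng.shuffle(axes)
    chemistry = {}
    for (first, second), feature in zip(POTION_PAIRS, axes):
        direction = rng.choice((1, -1))
        chemistry[first] = (feature, direction)
        # Regularity shared by all episodes: the paired potion does the opposite.
        chemistry[second] = (feature, -direction)
    return chemistry


def apply_potion(stone, potion, chemistry):
    """Return a new stone whose affected feature is set to the potion's direction."""
    feature, direction = chemistry[potion]
    new_stone = dict(stone)
    new_stone[feature] = direction
    return new_stone


def stone_value(stone, weights):
    """Assumed scoring rule: a weighted sum of the stone's feature values."""
    return sum(weights[f] * stone[f] for f in FEATURES)
```

In this sketch, an agent that has seen what the red potion does within an episode can infer the effect of the green potion without trying it. The mapping from pairs to features still has to be rediscovered after every call to `sample_chemistry`.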
Alchemy involves a compositional set of latent causal relationships, and requires strategic experimentation and action sequencing. In addition, the game levels are created based on an explicit generative process, resulting in an accessible structure that is also interesting.
The researchers evaluated two powerful deep RL agents (IMPALA and V-MPO) on the Alchemy environment. Although these agents have achieved impressive performance in single-task RL environments, they displayed very poor meta-learning performance in Alchemy even after extensive training. The team says the results reflect a failure of the structure learning and latent-state inference involved in meta-learning, validating Alchemy as a useful benchmark task for meta-RL research.
The paper Alchemy: A Structured Task Distribution for Meta-Reinforcement Learning is on arXiv. The code and other resources have been open-sourced and are available on the project GitHub.
Author: Hecate He | Editor: Michael Sarazen