Increased compute and talent have made reinforcement learning (RL) a hot field in machine learning research, where it has been used to tackle problems in self-driving vehicles, robotics, drug discovery and more. But reproducing existing work and accurately evaluating incremental improvements remain difficult challenges in RL.
In an effort to sustain RL’s momentum, a team of researchers from Machine Zone, Google Brain, and California Institute of Technology have introduced a new software framework and benchmark for reproducible reinforcement learning research.
“SLM Lab” is a modular deep reinforcement learning framework in PyTorch. The researchers explain that when two RL algorithms differ only slightly, running a standalone implementation of each and then comparing relative performance can leave the analysis inconclusive, because implementation differences are mixed in with algorithmic ones. They therefore chose to implement the RL algorithms in SLM Lab modularly, so that differences in performance can be confidently ascribed to differences between algorithms, not between implementations.
The team also suggests the modular code may be beneficial for research purposes, since it makes the implementation of new RL algorithms easier. Modularity is at the very heart of SLM Lab, whose RL algorithms are defined in three base classes:
- Algorithm: Handles interaction with the environment, implements an action policy, computes the algorithm-specific loss functions, and runs the training step.
- Net: Implements the deep networks that serve as the function approximators for an Algorithm.
- Memory: Provides the data storage and retrieval necessary for training.
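As a rough illustration of this three-way split, the sketch below defines the base classes in plain Python with toy subclasses. It is not SLM Lab's actual API: the method names (`act`, `train_step`, `add`, `sample`, `forward`) and the `ReplayMemory`, `TableNet` and `GreedyValueAlgorithm` classes are hypothetical, and a small table of values stands in for a deep network.

```python
import random
from abc import ABC, abstractmethod
from collections import deque


class Memory(ABC):
    """Stores experience and returns batches for training."""

    @abstractmethod
    def add(self, transition): ...

    @abstractmethod
    def sample(self, batch_size): ...


class Net(ABC):
    """Function approximator used by an Algorithm."""

    @abstractmethod
    def forward(self, state): ...


class Algorithm(ABC):
    """Owns the action policy, the loss, and the training step."""

    def __init__(self, net, memory):
        self.net = net
        self.memory = memory

    @abstractmethod
    def act(self, state): ...

    @abstractmethod
    def train_step(self, batch_size): ...


# Hypothetical toy implementations, for illustration only.
class ReplayMemory(Memory):
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)


class TableNet(Net):
    """Stand-in for a deep network: one value estimate per action."""

    def __init__(self, n_actions):
        self.values = [0.0] * n_actions

    def forward(self, state):
        return list(self.values)


class GreedyValueAlgorithm(Algorithm):
    def __init__(self, net, memory, lr=0.5):
        super().__init__(net, memory)
        self.lr = lr

    def act(self, state):
        # Action policy: pick the action with the highest estimated value.
        values = self.net.forward(state)
        return max(range(len(values)), key=values.__getitem__)

    def train_step(self, batch_size):
        # Squared-error loss between observed reward and current estimate.
        batch = self.memory.sample(batch_size)
        loss = 0.0
        for _state, action, reward in batch:
            error = reward - self.net.values[action]
            self.net.values[action] += self.lr * error
            loss += error ** 2
        return loss / max(len(batch), 1)


memory = ReplayMemory(capacity=100)
algo = GreedyValueAlgorithm(TableNet(n_actions=2), memory)
memory.add((None, 1, 1.0))  # (state, action, reward)
loss = algo.train_step(batch_size=1)
```

Because the algorithm only talks to its `Net` and `Memory` through the base-class methods, swapping in a different memory or network leaves the rest of the code untouched, which is the property that lets performance differences be traced to the component that changed.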
Just as implementation can cause significant performance differences in RL algorithms, so can other factors such as environment and hyperparameter settings. To help users better understand the various settings and performance differences, the team organized experiments in a structured “Session-Trial-Experiment” hierarchy. In SLM Lab, a single run of an algorithm on an environment is a “session,” while a collection of sessions makes up a “trial.” An “experiment” is a collection of trials with various algorithms and environments. The team also specified every configurable hyperparameter for an algorithm in a spec file.
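The nesting described above can be sketched in a few lines of Python. This is an illustration only, not SLM Lab's code or spec format: the spec keys (`name`, `num_sessions`, `hyperparameters`) are hypothetical, the sketch assumes that sessions within a trial differ only in their random seed, and the score is a simulated placeholder rather than a real training result.

```python
import random
from statistics import mean


def run_session(spec, seed):
    """One run of one algorithm on one environment."""
    rng = random.Random(seed)
    # A real session would train an agent using spec["hyperparameters"].
    # Here the score is simulated noise so the sketch stays runnable.
    return rng.uniform(0.0, 1.0)


def run_trial(spec):
    """A trial: several sessions that share one spec."""
    scores = [run_session(spec, seed) for seed in range(spec["num_sessions"])]
    return {"name": spec["name"], "scores": scores, "mean": mean(scores)}


def run_experiment(specs):
    """An experiment: a collection of trials."""
    return [run_trial(spec) for spec in specs]


# Hypothetical specs: every configurable setting lives in one place.
specs = [
    {"name": "algo_a_on_env_x", "num_sessions": 4,
     "hyperparameters": {"lr": 1e-3, "gamma": 0.99}},
    {"name": "algo_b_on_env_x", "num_sessions": 4,
     "hyperparameters": {"lr": 3e-4, "gamma": 0.99}},
]
results = run_experiment(specs)
```

Keeping all settings in the spec means a result can be traced back to, and rerun from, a single description of the run.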
The team tested the algorithms on 62 Atari games, 11 Roboschool environments via OpenAI gym, and 4 Unity environments. Agents in the environments are checkpointed every 10k or 1k training frames. The reported results are the scores per episode, averaged over the previous 100 training checkpoints. The researchers explain that this measurement is better suited to showing average performance than to tracking drastic performance changes.
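The averaging step can be illustrated with a trailing moving average over checkpoint scores. The function below is a minimal sketch with a hypothetical name and made-up input values, not the team's evaluation code; the window defaults to 100 to match the description above.

```python
from collections import deque


def moving_average(scores, window=100):
    """Average each checkpoint score with up to `window - 1` earlier ones."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    averaged = []
    for score in scores:
        if len(recent) == window:
            # The oldest score is about to fall out of the window.
            running_sum -= recent[0]
        recent.append(score)
        running_sum += score
        averaged.append(running_sum / len(recent))
    return averaged


# Made-up scores with one sudden jump, smoothed with a small window.
smoothed = moving_average([0.0, 0.0, 0.0, 300.0], window=3)
```

With a window of 3, the jump to 300 shows up as 100 in the averaged series, which is why this kind of measurement reflects average performance well but mutes abrupt changes.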
Experiments on the DQN and DDQN + PER algorithms in Atari games yielded mixed performance results, while the PPO and A2C results were similar to those of previous work by OpenAI. The experiments also confirmed the strength of the SAC algorithm over PPO on continuous control problems. The researchers point out that computational constraints could be a factor contributing to the differing results.
Moving forward, as RL continues its rapid evolution and researchers implement new algorithms and publish new results, SLM Lab offers the RL research community a useful new tool for examining algorithms and reproducibility.
Synced previously reported on related research: DeepMind’s Bsuite, a set of experiments designed to assess the core capabilities of RL agents and help researchers better understand their pros and cons across various applications. The paper Behaviour Suite for Reinforcement Learning (Bsuite) uses clear, informative, and scalable problems to study core issues across different learning algorithms by observing RL agent behaviors on benchmarks.
The paper SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning is on arXiv. SLM Lab can be installed from GitHub.
Journalist: Fangyu Cai | Editor: Michael Sarazen