Deep reinforcement learning (RL) is a popular machine learning approach that aims to solve complex decision-making tasks at human or super-human levels of performance. Its successes depend heavily on large neural networks and vast numbers of environment samples to learn from, so achieving human-level sample efficiency is crucial for deep RL training.
In a new paper, Bigger, Better, Faster: Human-level Atari with human-level efficiency, a research team from Google DeepMind, Mila and Université de Montréal presents a value-based RL agent called Bigger, Better, Faster (BBF) that achieves super-human performance on the Atari 100K benchmark using a single GPU.
The main goal of this work is to address how to scale neural networks for deep RL when samples are scarce. BBF builds on the SR-SPR agent (D’Oro et al., 2023), which uses a shrink-and-perturb method: the parameters of the convolutional layers are moved only 20 percent of the way toward a random target, while the later layers are fully reset to random initialization. BBF instead perturbs the parameters 50 percent of the way, which improves performance.
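A minimal sketch of this reset step follows, assuming shrink-and-perturb is a linear interpolation between the current weights and a freshly initialized copy. The layer names, the flat-list parameter representation and the `init_fn` helper are hypothetical, chosen only to keep the example self-contained; a real agent would apply the same idea to its network's parameter tensors.

```python
import random
from typing import Callable, Dict, List, Set

Params = Dict[str, List[float]]  # hypothetical: layer name -> flat list of weights


def reset_network(params: Params,
                  init_fn: Callable[[str, int], List[float]],
                  encoder_layers: Set[str],
                  perturb_fraction: float) -> Params:
    """Shrink-and-perturb the encoder layers and fully reset all other layers."""
    new_params: Params = {}
    for name, weights in params.items():
        fresh = init_fn(name, len(weights))  # random target / fresh initialization
        if name in encoder_layers:
            # Move each weight `perturb_fraction` of the way toward the random target.
            new_params[name] = [(1.0 - perturb_fraction) * w + perturb_fraction * f
                                for w, f in zip(weights, fresh)]
        else:
            # Later layers are replaced by a fresh random initialization.
            new_params[name] = fresh
    return new_params


def gaussian_init(name: str, size: int) -> List[float]:
    """Hypothetical initializer: small Gaussian weights."""
    return [random.gauss(0.0, 0.01) for _ in range(size)]


params = {"conv1": [0.5, -0.3], "head": [1.2]}
sr_spr_style = reset_network(params, gaussian_init, {"conv1"}, perturb_fraction=0.2)
bbf_style = reset_network(params, gaussian_init, {"conv1"}, perturb_fraction=0.5)
```

A larger `perturb_fraction` discards more of what the encoder has learned at each reset, which is the knob the article says BBF turns from 20 to 50 percent.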
To scale network capacity, the team adopts the Impala-CNN network (Espeholt et al., 2018) and scales the width of each layer by a factor of 4. They observed that BBF continues to improve as width increases, whereas SR-SPR's performance peaks at a width scale of 1 to 2 times.
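To give a sense of what 4x width means, the sketch below scales the channel counts of an Impala-CNN encoder and counts its convolutional parameters. It assumes the standard Impala-CNN layout (three stacks of 16, 32 and 32 channels, each a 3x3 convolution followed by max-pooling and two residual blocks of two 3x3 convolutions) and a 4-frame grayscale Atari input; these details and the helper names are assumptions, not BBF's exact configuration, and the final linear layers are ignored.

```python
from typing import List, Tuple

# Assumed base channel counts of the three Impala-CNN stacks.
BASE_CHANNELS: Tuple[int, ...] = (16, 32, 32)


def scaled_channels(width_scale: int) -> List[int]:
    """Multiply every stack's channel count by the width factor."""
    return [c * width_scale for c in BASE_CHANNELS]


def conv_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Weights plus biases of one 2D convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def impala_encoder_params(width_scale: int, in_ch: int = 4) -> int:
    """Count conv parameters: each stack = 1 conv + 2 residual blocks of 2 convs."""
    total = 0
    for out_ch in scaled_channels(width_scale):
        total += conv_params(in_ch, out_ch)       # stack's first conv (before max-pool)
        total += 4 * conv_params(out_ch, out_ch)  # two residual blocks, two convs each
        in_ch = out_ch
    return total


for scale in (1, 2, 4):
    print(scale, scaled_channels(scale), impala_encoder_params(scale))
```

Because each convolution's weight count grows with the product of its input and output channels, quadrupling the width multiplies the encoder's convolutional parameters by close to 16x rather than 4x.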
Notably, BBF uses an update horizon (the number of steps n in its n-step return targets) that decreases exponentially from 10 to 3 during training, which yields a surprisingly stronger agent than agents that use a fixed value, such as Rainbow and SR-SPR. The researchers also increase the discount factor over the course of learning and apply weight decay to mitigate statistical overfitting.
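The two schedules can be sketched as follows, assuming "decreases exponentially" means geometric interpolation between the start and end values over a fixed number of steps. The annealing period and the discount-factor endpoints (0.97 to 0.997) are illustrative assumptions not stated in the article, as is the choice to anneal 1 - gamma geometrically.

```python
def exponential_schedule(start: float, end: float, step: int, period: int) -> float:
    """Interpolate geometrically from `start` to `end` over `period` steps, then hold."""
    frac = min(max(step, 0), period) / period
    return start * (end / start) ** frac


# Illustrative assumptions, not values given in the article.
PERIOD = 10_000                      # hypothetical annealing period (gradient steps)
GAMMA_START, GAMMA_END = 0.97, 0.997  # hypothetical discount-factor endpoints


def update_horizon(step: int) -> int:
    """n-step horizon decaying exponentially from 10 to 3, rounded to an integer."""
    return round(exponential_schedule(10.0, 3.0, step, PERIOD))


def discount(step: int) -> float:
    """Discount factor that grows during training (anneals 1 - gamma; an assumption)."""
    return 1.0 - exponential_schedule(1.0 - GAMMA_START, 1.0 - GAMMA_END, step, PERIOD)


for step in (0, 2_500, 5_000, 10_000, 20_000):
    print(step, update_horizon(step), round(discount(step), 4))
```

A long horizon early on propagates reward information quickly while value estimates are poor; shrinking it later reduces the bias that long off-policy n-step returns introduce, and a growing discount factor lets the agent weigh longer-term rewards once learning has stabilized.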
In their empirical study, the team compared BBF against baseline RL agents such as SR-SPR, SPR, DrQ (eps) and IRIS on the Atari 100K benchmark. BBF outperforms all of them when both performance and computational cost are considered. Specifically, BBF roughly doubles SR-SPR's performance at nearly the same computational cost, and it matches the performance of the model-based EfficientZero while reducing runtime by more than 4 times.
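Comparisons like these on Atari 100K are commonly expressed as human-normalized scores across the benchmark's 26 games, often aggregated with the interquartile mean (IQM). The article does not specify the metric, so the sketch below is an assumption about how such numbers are typically computed; the scores in the usage lines are made up.

```python
from statistics import mean
from typing import List


def human_normalized_score(agent: float, random_score: float, human: float) -> float:
    """0.0 = random policy, 1.0 = human reference; above 1.0 is super-human on that game."""
    return (agent - random_score) / (human - random_score)


def interquartile_mean(scores: List[float]) -> float:
    """Mean of the middle 50% of scores, discarding the lowest and highest 25%."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))
    return mean(ordered[cut:len(ordered) - cut])


print(human_normalized_score(150.0, 0.0, 100.0))   # → 1.5
print(interquartile_mean([1, 2, 3, 4, 5, 6, 7, 8]))  # → 4.5
```

The IQM ignores the extreme games at both ends, so a handful of games with very large normalized scores cannot dominate the aggregate the way they can with a plain mean.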
Overall, this work introduces BBF, an agent that achieves super-human performance on Atari 100K. The team hopes it will encourage future research that pushes the frontier of sample efficiency in deep RL.
Author: Hecate He | Editor: Chain Zhang