DeepMind’s breakthroughs in recent years are well documented, and the UK AI company has repeatedly stressed that mastering Go, StarCraft and other games was never an end in itself but rather a step toward artificial general intelligence (AGI). DeepMind’s latest achievement stays on that path: Agent57 is the ultimate gamer, the first deep reinforcement learning (RL) agent to top human baseline scores on all games in the Atari57 test set.
Video games have become a popular testing ground for building adaptive algorithms — they contain a rich suite of challenges that demand development of sophisticated strategies, while also providing a clear optimisation indicator in game score. The Arcade Learning Environment (aka Atari57) was proposed in 2012 as an evaluation set of 57 classic Atari video games which pose a broad range of challenges for an RL agent to learn and master.
Researchers have so far focused on maximizing their agents’ average performance across all the Atari57 games. Although this average has improved significantly in recent years, the games vary widely in difficulty, and agents still struggle to master the hardest ones.
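To see why an average can hide weak spots, consider a minimal sketch in Python. It assumes the human-normalized score commonly used for Atari benchmarks, where 0 corresponds to random play and 100 to the human baseline. The game names and raw scores below are made up for illustration and are not real benchmark results.

```python
def human_normalized_score(agent, random_score, human):
    """Return 0 for random-level play and 100 for human-level play."""
    return 100.0 * (agent - random_score) / (human - random_score)


# Hypothetical raw scores per game: (agent, random, human).
# These numbers are invented for illustration only.
EXAMPLE_SCORES = {
    "easy_game_a": (9000.0, 100.0, 1100.0),
    "easy_game_b": (500.0, 0.0, 100.0),
    "hard_game": (50.0, 0.0, 1000.0),
}


def summarize(scores):
    """Return the mean normalized score and the games still below human level."""
    normalized = {
        name: human_normalized_score(*values) for name, values in scores.items()
    }
    mean = sum(normalized.values()) / len(normalized)
    below_human = sorted(name for name, s in normalized.items() if s < 100.0)
    return mean, below_human


mean_score, below_human = summarize(EXAMPLE_SCORES)
```

In this toy case the mean sits far above the human baseline even though one game remains well below it, which is the gap a per-game target is meant to close.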
In 2013, DeepMind introduced its Deep Q-network (DQN) to tackle the Atari57 challenge, and since then the AI research community has developed many extensions of and alternatives to the DQN. The resulting deep RL agents have nevertheless consistently struggled in four games: Montezuma’s Revenge, Pitfall, Solaris and Skiing.
Montezuma’s Revenge and Pitfall require extensive environmental exploration, while Solaris and Skiing pose long-term credit assignment problems. In these games agents must collect information over long time scales to get the necessary feedback for learning, and it’s challenging to match the consequences of agents’ actions to the final rewards they receive.
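One way to get a feel for the credit assignment problem is to look at how much weight a delayed reward carries under standard reward discounting. The sketch below is only an illustration under that assumption: the function name, the delay of 1,000 steps and the two discount factors are hypothetical choices, not values taken from DeepMind.

```python
def credit_weight(gamma, delay):
    """Weight given to a reward that arrives `delay` steps after an action."""
    return gamma ** delay


# Hypothetical gap, in steps, between an action and the reward it leads to.
DELAY = 1000

# A discount factor that favors near-term rewards.
short_horizon = credit_weight(0.99, DELAY)

# A discount factor much closer to 1, which keeps distant rewards relevant.
long_horizon = credit_weight(0.9999, DELAY)
```

With the smaller discount factor the delayed reward is weighted at well under 0.001, so it barely registers as feedback for the action that caused it. With the larger one the weight stays above 0.9, at the cost of a harder learning problem.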
In its blog post announcing the release, DeepMind trumpets Agent57 as the most general Atari57 agent since the benchmark’s inception, the one that finally achieves above-human performance not only on easy games, but also on the most demanding ones.
Agent57 combines DeepMind’s previous exploration agent, Never Give Up, with an adaptive meta-controller. The agent learns a family of policies, each driven by a different mixture of long-term and short-term intrinsic motivation to explore, and the meta-controller selects which policy to use. It also lets the agent’s actors choose different trade-offs between near-term and long-term rewards, and between exploiting already known states and exploring new states and information.
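The idea of picking one policy from a family can be sketched as a multi-armed bandit. The Python example below is a simplified illustration, not DeepMind’s implementation: it uses a plain UCB-style rule, and the class name, the (exploration weight, discount factor) pairs and the simulated returns are all hypothetical.

```python
import math
import random

# Hypothetical family of policies: each pairs an exploration weight (beta)
# with a discount factor (gamma). The values are illustrative only.
POLICIES = [(0.0, 0.997), (0.1, 0.99), (0.3, 0.97)]


class UCBMetaController:
    """Chooses which policy to run next using a UCB-style bandit rule."""

    def __init__(self, n_arms, exploration=1.0):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.exploration = exploration

    def select(self):
        # Try every policy once before trusting the estimates.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        # Score = estimated return plus a bonus for rarely tried policies.
        scores = [
            mean + self.exploration * math.sqrt(math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, episode_return):
        # Incremental running average of the returns seen for this policy.
        self.counts[arm] += 1
        self.means[arm] += (episode_return - self.means[arm]) / self.counts[arm]


def run_demo(n_episodes=500, seed=0):
    rng = random.Random(seed)
    # Made-up expected returns per policy, standing in for a real game.
    true_means = [0.2, 0.5, 0.8]
    controller = UCBMetaController(len(POLICIES))
    for _ in range(n_episodes):
        arm = controller.select()
        episode_return = true_means[arm] + rng.gauss(0.0, 0.1)
        controller.update(arm, episode_return)
    return controller


controller = run_demo()
```

Over the simulated episodes the controller spends most of its time on the policy with the highest return while still occasionally sampling the others, which mirrors the exploit-versus-explore trade-off described above.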
DeepMind says Agent57 is able to scale with increasing amounts of computation, so the longer it’s trained, the higher its scores will rise. The researchers also note that it currently takes a lot of computation and time to train Agent57, and that the versatile agent’s debut does not mark an end of Atari research — further improvements can be made not only in data efficiency, but also in general performance.
Journalist: Yuan Yuan | Editor: Michael Sarazen