Another video game has succumbed to the strength of artificial intelligence. Uber researchers announced yesterday that their AI has completely solved Atari’s Montezuma’s Revenge, a classic game that involves moving a character from one room to another while killing enemies and collecting jewels in a 16th century Aztec-like pyramid.
AI game research took off in 2013 when DeepMind proposed using Deep Reinforcement Learning (more specifically, Deep Q-Learning) to train computers to play video games. DQN achieved groundbreaking results in seven Atari 2600 games, but failed miserably at Montezuma’s Revenge, scoring zero against an average human score of 4.7K.
Montezuma’s Revenge is a difficult task in part due to its infrequent or deceptive feedback on how short-term actions and rewards affect long-term success. Ordinary trial-and-error techniques like DQN are ineffective in this environment.
Significant progress was made in 2016 when DeepMind researchers proposed count-based exploration algorithms, enabling AI to find 15 rooms and achieve an average score of 3.7K in the game. This October, OpenAI introduced Random Network Distillation (RND), which found all 24 rooms and achieved a mean score of 10K, outperforming human gamers on Montezuma’s Revenge for the first time.
Uber’s new AI algorithm “Go-Explore” scores over 400K, advancing the state-of-the-art performance on Montezuma’s Revenge by two orders of magnitude. The algorithm learns new tasks in two simple phases: first it explores the game until it finds solutions, then it robustifies those solutions.
Here are some fast takeaways from the Uber blog:
- Go-Explore builds up an archive of interestingly different game states (which we call “cells”) and trajectories that lead to them.
- To be tractable in high-dimensional state spaces like Atari, Go-Explore needs a lower-dimensional cell representation with which to form its archive…the most naive possible cell representation worked pretty well: simply downsampling the current game frame.
- By explicitly storing a variety of stepping stones in an archive, Go-Explore remembers and returns to promising areas for exploration.
- If the solutions found are not robust to noise (as is the case with our Atari trajectories), robustify them into a deep neural network with an imitation learning algorithm.
- Go-Explore provides the opportunity to leverage domain knowledge in the cell representation.
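The explore phase described in these takeaways can be sketched as a simple loop: downsample frames into coarse "cells," archive the best known state for each cell, and repeatedly return to an archived cell to explore from it. Everything below is a toy illustration, not Uber's implementation: `ToyEnv` is a hypothetical stand-in for a resettable Atari emulator, the grid and bin sizes are arbitrary, and cells are selected uniformly, whereas Go-Explore weights selection toward promising cells.

```python
import random
import numpy as np

class ToyEnv:
    """Hypothetical stand-in for a resettable emulator: the agent walks a
    1-D track of 16 positions and scores a point for each new rightmost
    position it reaches."""
    def reset(self):
        self.pos, self.best, self.score = 0, 0, 0
        return self.frame()
    def get_state(self):
        return (self.pos, self.best, self.score)
    def set_state(self, state):
        self.pos, self.best, self.score = state
    def frame(self):
        f = np.zeros((22, 16), dtype=np.uint8)  # fake "screen"
        f[:, self.pos] = 255
        return f
    def step(self, action):  # action in {-1, +1}
        self.pos = max(0, min(15, self.pos + action))
        if self.pos > self.best:
            self.best, self.score = self.pos, self.score + 1
        return self.frame()

def cell_of(frame, shape=(11, 8), bins=8):
    """The naive cell representation the blog describes: downsample the
    frame and coarsely discretize pixel intensities."""
    ch, cw = shape
    h, w = frame.shape
    small = frame.reshape(ch, h // ch, cw, w // cw).mean(axis=(1, 3))
    return tuple((small * bins / 256).astype(int).flatten())

def go_explore_phase1(env, iterations=200, steps=20, seed=0):
    """Phase 1 ("explore until solved"): build an archive mapping each
    cell to the highest-scoring emulator state known to reach it."""
    rng = random.Random(seed)
    frame = env.reset()
    archive = {cell_of(frame): (env.get_state(), env.score)}
    for _ in range(iterations):
        # Go: return to an archived cell (chosen uniformly here; the
        # real algorithm weights selection toward promising cells).
        state, _ = rng.choice(list(archive.values()))
        env.set_state(state)
        # Explore: take random actions and archive new or better cells.
        for _ in range(steps):
            frame = env.step(rng.choice([-1, 1]))
            c = cell_of(frame)
            if c not in archive or env.score > archive[c][1]:
                archive[c] = (env.get_state(), env.score)
    return archive
```

Phase 2 ("robustify"), omitted here, would treat the best trajectories found this way as demonstrations for an imitation-learning algorithm, producing a policy that tolerates noise.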
In another first, the Go-Explore algorithm surpassed average human performance in Pitfall, an Atari game that involves collecting jungle treasures while avoiding pitfalls. Go-Explore scored over 21,000; no previous learning algorithm had scored above zero.
More information can be found on the Uber Engineering blog.
Journalist: Tony Peng | Editor: Michael Sarazen