DeepMind is a trailblazer in research that pits computers against humans in games. Following milestone victories against human professionals in the board game Go and the video game StarCraft II, the Google-owned research company has now pitted its new AI system against humans in the first-person multiplayer shooter Quake III Arena.
In a paper recently published in the journal Science, DeepMind reveals it has successfully trained neural networks that achieve human-level performance in Quake’s “Capture the Flag” mode. The agents also develop teamwork skills on their own, such as following allies and collaboratively protecting their base from attack, behaviors that resemble those of human players, while learning to drop less favorable tactics such as following teammates continuously.
DeepMind began developing Quake game agents last year with their “For The Win (FTW)” intelligence model. Rather than training a single agent on gameplay, researchers leveraged FTW to train a group of agents to play together against other bot teams, learning directly from on-screen pixels fed into a convolutional neural network (CNN).
The first author of the associated paper is DeepMind Research Scientist Max Jaderberg, who says part of the charm of using AI to play games is that one never knows what new behavior or tactics it might demonstrate. The agents learn mainly through self-play, and the key technology is reinforcement learning: DeepMind’s game agents use a reward mechanism to drive strategies aimed at achieving gameplay goals.
The input data is passed to two recurrent long short-term memory (LSTM) networks, one operating on a fast time scale and the other on a slow one. The two networks are coupled through a variational objective, learn to predict game behavior, and output actions through a simulated game controller.
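The idea of two recurrent networks running on different time scales can be sketched in a few lines. This is only an illustration under stated assumptions, not DeepMind's implementation: plain tanh recurrent cells stand in for the LSTMs, the variational coupling is left out, and the wiring (the slow core updates every `slow_period` steps and feeds its state to the fast core) as well as every name and size here is hypothetical.

```python
import math
import random


def make_cell(input_size, hidden_size, rng):
    """Random weights for a plain tanh recurrent cell (a stand-in for an LSTM)."""
    size = input_size + hidden_size
    return [[rng.gauss(0.0, 0.1) for _ in range(size)] for _ in range(hidden_size)]


def step(cell, x, h):
    """One recurrent update: new_h = tanh(W @ [x, h])."""
    joined = list(x) + list(h)
    return [math.tanh(sum(w * v for w, v in zip(row, joined))) for row in cell]


def run_two_timescale(observations, slow_period=4, hidden_size=8, seed=0):
    """Run a fast core every step and a slow core every `slow_period` steps."""
    rng = random.Random(seed)
    obs_size = len(observations[0])
    # Assumed wiring: the slow core reads the fast state; the fast core
    # reads the observation together with the latest slow state.
    slow = make_cell(hidden_size, hidden_size, rng)
    fast = make_cell(obs_size + hidden_size, hidden_size, rng)
    h_slow = [0.0] * hidden_size
    h_fast = [0.0] * hidden_size
    slow_updates = 0
    fast_states = []
    for t, obs in enumerate(observations):
        if t % slow_period == 0:  # slow time scale
            h_slow = step(slow, h_fast, h_slow)
            slow_updates += 1
        h_fast = step(fast, list(obs) + h_slow, h_fast)  # fast time scale
        fast_states.append(h_fast)
    return fast_states, slow_updates


# Toy input: 12 time steps of 5 made-up features each.
data_rng = random.Random(1)
observations = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(12)]
states, slow_updates = run_two_timescale(observations)
print(len(states), slow_updates)
```

With 12 steps and a period of 4, the fast core updates 12 times while the slow core updates only at steps 0, 4 and 8. In a real agent the fast state would then feed a policy that emits controller actions.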
The FTW model was trained in a 30-agent game environment, with the game map randomly selected so that agents could not simply memorize a layout. Each agent learned its own reward signal, which allowed it to generate its own internal goals (such as capturing the flag). Researchers used a two-tier process to optimize the agents’ internal reward mechanisms and applied reinforcement learning on these rewards to generate a winning strategy.
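A two-tier setup of this kind might look roughly like the sketch below. It is a toy under stated assumptions rather than the procedure from the paper: the event names, the 20 percent replacement fraction and the perturbation noise are all hypothetical, the inner reinforcement learning loop is omitted, and the outer step is a generic population-based "copy the strong, perturb" update.

```python
import random

# Hypothetical game events an agent might attach its own reward to.
EVENTS = ("flag_captured", "flag_picked_up", "opponent_tagged")


def internal_reward(weights, event_counts):
    """Inner tier signal: the agent's own scalar reward for the events it saw."""
    return sum(weights[e] * event_counts.get(e, 0) for e in EVENTS)


def exploit_and_explore(population, win_rates, rng, fraction=0.2, noise=0.2):
    """Outer tier: the weakest agents copy reward weights from the strongest
    agents and then randomly perturb them."""
    ranked = sorted(range(len(population)), key=lambda i: win_rates[i])
    n = max(1, int(len(population) * fraction))
    losers, winners = ranked[:n], ranked[-n:]
    new_population = [dict(w) for w in population]
    for loser in losers:
        parent = population[rng.choice(winners)]
        new_population[loser] = {
            e: parent[e] * rng.uniform(1 - noise, 1 + noise) for e in EVENTS
        }
    return new_population


rng = random.Random(0)
# 30 agents, each with its own (randomly initialised) reward weights.
population = [{e: rng.uniform(0.0, 1.0) for e in EVENTS} for _ in range(30)]
# Placeholder for win rates that would come from actual matches.
win_rates = [rng.random() for _ in population]
population = exploit_and_explore(population, win_rates, rng)
```

The design point is that winning the match is only measured in the outer tier, while each agent trains day to day on a denser reward of its own.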
In a tournament involving 40 human players, with humans and agents randomly matched both as teammates and as opponents, the FTW agents proved more proficient than the baseline methods and were more likely to win than human players: the agents’ Elo rating, which corresponds to the probability of winning, was 1600, compared to 1300 for strong human players and 1050 for average human players. The results are based on 450,000 Capture the Flag games, equivalent to roughly four years of human gameplay experience per agent.
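To give a feel for what such rating gaps mean, the sketch below converts the figures quoted above into expected scores using the conventional Elo formula. The 400-point logistic scale is the standard chess convention and is an assumption here; the article does not say which variant the study used, and the ratings come from team games, so the head-to-head numbers are illustrative only.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (roughly, win probability) of A against B
    under the conventional Elo model with an assumed 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings as quoted in the article.
ratings = {"FTW agent": 1600, "strong human": 1300, "average human": 1050}

for opponent in ("strong human", "average human"):
    p = elo_expected_score(ratings["FTW agent"], ratings[opponent])
    print(f"FTW agent vs {opponent}: {p:.2f}")
# prints:
# FTW agent vs strong human: 0.85
# FTW agent vs average human: 0.96
```

Under these assumptions, a 300-point lead corresponds to an expected score of about 0.85, and a 550-point lead to about 0.96.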
DeepMind Research Scientist Thore Graepel says the study highlights the potential of multi-agent training to advance the development of AI, and that it may help research in areas such as human-computer interaction and systems that complement or cooperate with each other.
The paper “Human-level performance in 3D multiplayer games with population-based reinforcement learning” was published in the journal Science.
Author: Reina Qi Wan | Editor: Michael Sarazen