Reinforcement learning based on self-play has enabled AI agents to surpass human expert-level performance in the computer game Dota 2 and in board games such as chess and Go. Despite these strong results, recent studies suggest that self-play may not be as robust as previously thought. A question naturally arises: are such self-play agents vulnerable to adversarial attacks?
In the new paper Adversarial Policies Beat Professional-Level Go AIs, a research team from MIT, UC Berkeley, and FAR AI employs a novel adversarial policy to attack the state-of-the-art AI Go system KataGo. The team believes theirs is the first successful end-to-end attack against an AI Go system playing at the level of a human professional.
The team summarizes their main contributions as follows:
- We propose a novel attack method, hybridizing the attack of Gleave et al. (2020) and AlphaZero-style training (Silver et al., 2018).
- We demonstrate the existence of adversarial policies against the state-of-the-art Go AI system, KataGo.
- We find the adversary pursues a simple strategy that fools the victim into predicting victory, causing it to pass prematurely.
This work focuses on exploiting professional-level AI Go policies with a discrete action space. The team attacks the strongest publicly available AI Go system, KataGo, albeit not at its full strength setting. Unlike KataGo, which is trained via self-play games, the team trained their agent on games played against a fixed victim agent, using only data from the turns where it is the adversary’s move. This “victim play” training approach encourages the model to exploit the victim, not mimic it.
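The "victim play" idea above can be sketched in a toy way: play games against a frozen victim and keep only the positions where the adversary is to move as training data. This is a minimal illustration of the data-filtering idea only, not the paper's actual pipeline; all function names and the toy state transition are hypothetical.

```python
import random

def collect_victim_play_data(num_games, adversary_move, victim_move):
    """Return (state, action) samples drawn only from the adversary's turns.

    Toy sketch of 'victim play': the victim's turns are played out but
    discarded, so the adversary learns to exploit the victim, not mimic it.
    All names and the integer 'state' stand-in are illustrative.
    """
    samples = []
    for _ in range(num_games):
        state = 0                                       # toy stand-in for a board position
        adversary_to_move = random.choice([True, False])  # who plays first
        for _ in range(10):                             # toy fixed game length
            if adversary_to_move:
                action = adversary_move(state)
                samples.append((state, action))         # train only on these turns
            else:
                action = victim_move(state)             # victim's turns are discarded
            state = state + action                      # toy transition
            adversary_to_move = not adversary_to_move
        # (a real pipeline would also record the game outcome per sample)
    return samples

# Usage with stub policies: the adversary always plays 1, the victim 2.
data = collect_victim_play_data(3, lambda s: 1, lambda s: 2)
```

With alternating turns over 10 plies, the adversary moves exactly 5 times per game regardless of who starts, so only those 5 positions per game enter the dataset.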
The team also introduces two variants of Adversarial Monte Carlo Tree Search (A-MCTS) — Sample (A-MCTS-S) and Recursive (A-MCTS-R) — so that the adversary does not model its opponent’s moves with its own policy network: at nodes where the victim is to move, A-MCTS-S samples the move from the victim’s policy network, while A-MCTS-R simulates the victim’s full search. Rather than starting from a random initialization, the team also employs a curriculum that trains the agent against successively stronger versions of the victim.
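The sampling variant can be illustrated with a toy depth-limited rollout: at the adversary's own nodes the search branches over candidate moves, but at victim nodes a single move is sampled from the victim's frozen policy instead of expanding with the adversary's network. This is a hypothetical sketch of the A-MCTS-S idea, not the paper's implementation; all names and the integer state are illustrative.

```python
def rollout(state, depth, adversary_policy, victim_policy, adversary_to_move):
    """Toy depth-limited search returning a value estimate.

    Adversary nodes branch over the adversary's candidate moves; victim
    nodes are NOT expanded with the adversary's own network -- instead a
    single move is drawn from the victim's policy (the A-MCTS-S idea).
    """
    if depth == 0:
        return state  # toy terminal value: the accumulated state

    if adversary_to_move:
        # Adversary node: evaluate each candidate move, keep the best value.
        best = None
        for action in adversary_policy(state):
            value = rollout(state + action, depth - 1,
                            adversary_policy, victim_policy, False)
            if best is None or value > best:
                best = value
        return best

    # Victim node: sample one move from the victim's (frozen) policy.
    action = victim_policy(state)
    return rollout(state + action, depth - 1,
                   adversary_policy, victim_policy, True)

# Usage: adversary considers moves {1, 2}; the deterministic stub victim
# always replies with -1. The search picks the adversary move whose line
# survives the victim's reply best.
value = rollout(0, 2, lambda s: [1, 2], lambda s: -1, True)
```

A-MCTS-R would instead run a full (nested) search at each victim node, which is more faithful to a searching victim but far more expensive per node.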
In their empirical studies, the team used their adversarial policy to attack both KataGo without search (playing at the level of a top-100 European player) and 64-visit KataGo (“near superhuman level”). The proposed policy achieved a win rate of more than 99 percent against KataGo without search and more than 50 percent against 64-visit KataGo.
While this work suggests that learning via self-play is not as robust as expected and that adversarial policies can be used to beat top Go AI systems, the results have been questioned by the machine learning and Go communities. Reddit discussions involving paper authors and KataGo developers have focused on particularities of the Tromp-Taylor scoring system used in the experiments — while the proposed agent gets its wins by “tricking KataGo into ending the game prematurely,” it is argued that this tactic would lead to devastating losses under more common Go rulesets.
Author: Hecate He | Editor: Michael Sarazen