Two years ago, DeepMind’s AlphaGo sealed a 3-0 victory over Go world champion Ke Jie in a high-profile showdown that captured the attention of Go players and AI researchers around the world. An ancient and complex strategy game, Go has become a favourite research environment for testing AI techniques. After the win against Ke Jie, DeepMind further evolved AlphaGo into the completely self-taught AlphaZero.
Not to be left behind, Facebook AI Research (FAIR) introduced their own Go bot last year, aiming to reproduce AlphaGo Zero results using their Extensible, Lightweight Framework (ELF) for reinforcement learning research. FAIR recently added new features to ELF OpenGo and has open-sourced the project. The bot has meanwhile posted an undefeated 20:0 record against top human Go professionals.
The new release includes an updated model retrained from scratch, a Windows executable version of the bot, and a unique archive containing OpenGo’s analysis of 87,000 human professional Go games dating back to the 18th century. The revisions also provide contemporary Go players with easier access to the system as a training aid.
The evolution of DeepMind’s models illustrated deep reinforcement learning’s capabilities and limitations. With AlphaZero, researchers greatly simplified the system: new models were developed entirely through self-play. AlphaZero required just eight hours of training to top an AlphaGo Zero model that had trained for 72 hours. These advances, however, came at a computational expense too high for many if not most researchers.
To help their compute-challenged colleagues, FAIR open-sourced their OpenGo code, models, self-play datasets, and auxiliary data. Researchers also published a paper covering the process of retraining ELF OpenGo from scratch.
The researchers report: “After running our AlphaZero style training software on 2,000 GPUs for 9 days, our 20-block model has achieved super-human performance.” The training dataset features 20 million self-play games and the 1,500 intermediate models used to generate them.
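The overall pipeline described above can be pictured as a loop: the current model generates a batch of self-play games, the model is updated on those games, and the updated weights are saved as an intermediate checkpoint. The toy sketch below illustrates only that outer structure; the function names are hypothetical, and the random move chooser and integer "weights" are stand-ins for the real MCTS-guided policy and neural network.

```python
import random

def self_play_game(policy, max_moves=50):
    """Play one toy game in which the same policy supplies moves for both sides."""
    history = []
    for _ in range(max_moves):
        history.append(policy(history))  # stand-in for an MCTS-guided move choice
    return history

def training_loop(iterations=3, games_per_iter=4):
    """Outer AlphaZero-style loop (sketch): generate self-play games with the
    current model, update the model on them, and keep each intermediate model."""
    model = 0  # stand-in for network weights
    checkpoints, all_games = [], []
    for _ in range(iterations):
        policy = lambda hist: random.randrange(19 * 19)  # random move on a 19x19 board
        games = [self_play_game(policy) for _ in range(games_per_iter)]
        all_games.extend(games)
        model += 1  # stand-in for a gradient update on the new games
        checkpoints.append(model)
    return all_games, checkpoints

games, checkpoints = training_loop()
```

At FAIR's scale this loop produced 20 million games and 1,500 intermediate models; the sketch simply shows why both artifacts fall out of the same process.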
Because human benchmarking is vital for evaluating the strength of a model, FAIR pitted an early prototype trained in April 2018 against human professionals. “ELF OpenGo plays under 50 seconds per move (≈80,000 rollouts), with no pondering during the opponent’s turn, while the humans play under no time limit. These evaluation games typically last for 3-4 hours, with the longest game lasting over 6 hours. Using the prototype model, ELF OpenGo won every game for a final record of 20:0.”
OpenGo’s success against professionals reinforced what AlphaZero had shown: that many professionally-played Go moves are less than ideal. Go bots have since changed the way professionals approach the game at a fundamental level. “I can definitely say that the ELF OpenGo project has brought a huge impact on the Korean Go community,” says the Korea Baduk Association’s Beomgeun Cho. “Since it came out, almost every competitive professional player in Korea has been using the ELF Go program to analyze their own and other players’ games. And because of that, not only has the level of Korean Go improved, but the level of the whole world has been improved significantly.”
Although many in the research and Go communities would love to see a showdown between two top Go bots, both DeepMind and FAIR have set their sights far beyond the game’s 19×19 playing field. DeepMind sees the self-taught AlphaZero as an important step on its research road to an artificial general intelligence, while FAIR’s project focus is on developing an AI that can learn as efficiently as humans can.
The paper ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen