From Zero to Master in Hours: AlphaZero Accelerates Reinforcement Learning

Google’s DeepMind has once again surprised the machine learning community, this time with the introduction of AlphaZero — a new algorithm that can quickly surpass human board game performance through reinforcement learning self-play.

It was just two months ago that DeepMind published its Nature paper on AlphaGo Zero, which mastered the game of Go in days, starting with no human data other than the game rules. AlphaZero also starts tabula rasa, but the new algorithm has taken a leap forward in flexibility: after mere hours of training, and again with no human game records to reference, it beat the leading computer chess program Stockfish 8. AlphaZero required even less time to outplay the top shogi (Japanese chess) bot Elmo, which like Stockfish had already beaten human world champions. To remove any doubt about its power and potential, AlphaZero also defeated its Go-playing predecessor AlphaGo Zero.

AlphaGo Zero uses deep convolutional neural networks and was trained solely by reinforcement learning from self-play games. AlphaZero is a more generic version of the program, in line with DeepMind’s oft-stated goal of evolving its Go-bots toward other and perhaps more practical real-world tasks.

The AlphaGo Zero and AlphaZero algorithms differ in several respects. The former assumes a binary win/loss outcome, whereas the latter also accounts for draws and other game outcomes not seen in Go. AlphaZero does not rely on board symmetry, which suits asymmetrical games such as chess and shogi, and its neural network is updated continually rather than waiting for a training iteration to finish.
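To make the win/loss versus draw distinction concrete, here is a minimal Python sketch of how a finished game might be turned into a numeric training target. The names `Result`, `binary_value`, `ternary_value` and `mean_outcome` are hypothetical, and the +1 / 0 / -1 encoding is an assumption for illustration; the article does not give DeepMind's actual code.

```python
from enum import Enum
from typing import List


class Result(Enum):
    WIN = "win"
    DRAW = "draw"
    LOSS = "loss"


def binary_value(result: Result) -> float:
    """Win/loss-only target, suitable for a game where draws do not occur."""
    if result is Result.DRAW:
        raise ValueError("a binary win/loss target cannot represent a draw")
    return 1.0 if result is Result.WIN else -1.0


def ternary_value(result: Result) -> float:
    """Target that also covers draws: +1 for a win, 0 for a draw, -1 for a loss."""
    return {Result.WIN: 1.0, Result.DRAW: 0.0, Result.LOSS: -1.0}[result]


def mean_outcome(results: List[Result]) -> float:
    """Average ternary outcome over a batch of finished self-play games."""
    return sum(ternary_value(r) for r in results) / len(results)
```

With this encoding, a drawn chess or shogi game contributes a neutral target instead of being forced into a win or a loss.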

AlphaZero surpassed Stockfish in four hours (300k steps), Elmo in less than two hours (110k steps), and AlphaGo Lee (which beat Korean Go Master Lee Sedol in 2016) in eight hours (165k steps).
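For a rough sense of scale, the figures above can be converted into training steps per hour. This is back-of-the-envelope arithmetic on the rounded numbers quoted in this article, not a measured benchmark; the dictionary layout and the `steps_per_hour` helper are illustrative only.

```python
# (training steps, approximate hours) as quoted in the article
milestones = {
    "Stockfish (chess)": (300_000, 4.0),
    "Elmo (shogi)": (110_000, 2.0),  # "less than two hours", so 2.0 is an upper bound
    "AlphaGo Lee (Go)": (165_000, 8.0),
}


def steps_per_hour(steps: int, hours: float) -> float:
    """Average number of training steps completed per hour."""
    return steps / hours


rates = {name: steps_per_hour(s, h) for name, (s, h) in milestones.items()}

for name, rate in rates.items():
    print(f"{name}: ~{rate:,.0f} steps/hour")
```

Because the shogi milestone was reached in under two hours, its rate here is a lower bound rather than an exact figure.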

Board games are structured problems that AI has long sought to tackle. DeepMind’s foray into board games was both a game-changer and a means to an end, namely artificial general intelligence. The key idea is moving from domain-specific to generalized methodologies.

Says DeepMind CEO and former chess prodigy Demis Hassabis, “If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society.”


Journalist: Meghan Han | Editor: Michael Sarazen
