From Zero to Master in Hours: AlphaZero Accelerates Reinforcement Learning

Google’s DeepMind has once again surprised the machine learning community, this time with the introduction of AlphaZero — a new algorithm that can quickly surpass human board game performance through reinforcement learning self-play.

It was just two months ago that DeepMind published its Nature paper on AlphaGo Zero, which mastered the game of Go in days, starting with no human data other than the game rules. AlphaZero also starts tabula rasa, but the new algorithm takes a leap forward in flexibility: after mere hours of training, and again with no human game records to reference, it beat the leading computer chess program Stockfish 8. AlphaZero required even less time to outsmart the top shogi (Japanese chess) bot Elmo, which like Stockfish had already beaten human world champions. To remove any doubt about its power and potential, AlphaZero also easily dispatched its Go-playing predecessor AlphaGo Zero.

AlphaGo Zero uses deep convolutional neural networks and was trained solely by reinforcement learning from self-play games. AlphaZero is a more generic version of the program, in line with DeepMind’s oft-stated goal of evolving its Go-bots toward other and perhaps more practical real-world tasks.

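The AlphaGo Zero and AlphaZero algorithms differ in several ways. AlphaGo Zero estimates and optimizes the probability of winning, assuming binary win/loss outcomes, whereas AlphaZero optimizes the expected outcome, which accounts for draws and other results not seen in Go. AlphaZero also drops the board-symmetry assumptions that AlphaGo Zero exploited, since the rules of chess and shogi are asymmetric, and its neural network is updated continually rather than waiting for each training iteration to finish.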
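The difference between a binary win/loss view and one that accounts for draws can be illustrated with a minimal Python sketch. The function names, the sample results, and the way draws are dropped in the binary view are illustrative assumptions, not DeepMind's code; the sketch only shows why the two measures are not interchangeable once draws are possible.

```python
from statistics import mean

# Assumed numeric encoding of game results, for illustration only.
OUTCOME_VALUE = {"win": 1.0, "draw": 0.0, "loss": -1.0}


def win_rate(results):
    """Binary view: fraction of decisive games that were won.

    Draws cannot be represented, so this sketch simply drops them.
    """
    decisive = [r for r in results if r != "draw"]
    if not decisive:
        raise ValueError("no decisive games to score")
    return sum(r == "win" for r in decisive) / len(decisive)


def expected_outcome(results):
    """Draw-aware view: average of +1 (win), 0 (draw), -1 (loss)."""
    return mean(OUTCOME_VALUE[r] for r in results)


# Hypothetical sample of five game results.
sample = ["win", "draw", "draw", "loss", "win"]
print(win_rate(sample))          # share of decisive games won
print(expected_outcome(sample))  # average outcome including draws
```

In a game like chess, where many high-level games end in a draw, the second measure uses every game, while the first discards part of the evidence.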

AlphaZero surpassed Stockfish in four hours (300k steps), Elmo in less than two hours (110k steps), and AlphaGo Lee (which beat Korean Go Master Lee Sedol in 2016) in eight hours (165k steps).
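As a rough back-of-the-envelope check, the figures above imply quite different training-step rates per game. The short Python sketch below uses only the numbers quoted in this article; the dictionary and function names are hypothetical, and because the Elmo figure is "less than two hours", its rate is a lower bound rather than an exact value.

```python
# Figures quoted in the article: (training steps, hours of training).
# The Elmo entry uses 2 hours as an upper bound on time, so its rate is a lower bound.
MILESTONES = {
    "Stockfish (chess)": (300_000, 4),
    "Elmo (shogi)": (110_000, 2),
    "AlphaGo Lee (Go)": (165_000, 8),
}


def steps_per_hour(steps, hours):
    """Average number of training steps completed per hour."""
    return steps / hours


for opponent, (steps, hours) in MILESTONES.items():
    rate = steps_per_hour(steps, hours)
    print(f"{opponent}: about {rate:,.0f} steps per hour")
```

The rates differ by game, so the step counts and the wall-clock times should not be compared across chess, shogi, and Go as if they measured the same thing.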

Board games are structured problems that AI researchers have long sought to tackle. DeepMind’s foray into board games was both a game-changer and a means to an end, namely artificial general intelligence. The key idea is to move from domain-specific methods to general-purpose ones.

Says DeepMind CEO and former chess prodigy Demis Hassabis, “If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society.”

Journalist: Meghan Han | Editor: Michael Sarazen
