Don’t simply go “all in” if there’s a bot at your Texas hold’em poker table: Facebook and Carnegie Mellon University’s new Pluribus AI system just beat five human pros at the same time, including a couple of World Series of Poker champions.
AI models had already bettered human poker pros one-on-one, but Pluribus’s success in a six-player game signals a huge leap in ability.
Texas hold’em is one of the most popular poker variants, combining game theory, gambling, and strategy. To win at showdown, a player must assemble the best five-card hand from any combination of their two “hole cards,” dealt face down to each player, and the five community cards dealt face up. Players can choose to check, bet, call, raise, or fold.
Researchers regard poker as a meaningful and complex experimental field where they can explore how AI interacts with game theory and imperfect information. The implications of poker research are far-reaching in the real world, for example in preventing fraud, enhancing cybersecurity, blocking harmful content, and other scenarios that involve hidden information, multiple participants, or limited communication.
The previous noteworthy milestone in poker research was Libratus, the first AI to defeat top human professionals in a two-player no-limit poker game. Introduced by CMU Professor Tuomas Sandholm, Libratus bettered human world champions by winning US$1,766,250 in chips in a marathon 20-day poker competition held in January 2017.
This time, Facebook and CMU researchers opted to use a multi-player game format. This is a much more challenging environment due to the increased difficulty of finding a Nash equilibrium in zero-sum games with more than two players. A Nash equilibrium is a stable state in a game where every player is already playing a best response to the others, so no player can do better by changing strategy alone. In a two-player zero-sum game, playing an exact Nash equilibrium strategy guarantees that the AI cannot lose in expectation, which is why most research on such games aims to find one. That guarantee does not carry over to games with more than two players.
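The Nash equilibrium idea is easier to see in a game far smaller than poker. The following is a minimal sketch, not anything from the Pluribus paper: it uses rock-paper-scissors as a stand-in two-player zero-sum game, and the names `PAYOFF`, `expected_payoff`, and `is_nash` are hypothetical helpers chosen for illustration. It checks whether either player could gain by switching to a different pure action.

```python
# Illustrative only: rock-paper-scissors stands in for a two-player zero-sum game.
# PAYOFF[i][j] is the row player's payoff; the column player receives the negative.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoff(row_strategy, col_strategy):
    """Row player's expected payoff when both players use mixed strategies."""
    return sum(
        row_strategy[i] * col_strategy[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def is_nash(row_strategy, col_strategy, tol=1e-9):
    """True if neither player can gain by deviating to any pure action."""
    value = expected_payoff(row_strategy, col_strategy)
    pure = [[1.0 if k == a else 0.0 for k in range(3)] for a in range(3)]
    # The row player wants a higher payoff, the column player a lower one.
    row_ok = all(expected_payoff(p, col_strategy) <= value + tol for p in pure)
    col_ok = all(expected_payoff(row_strategy, p) >= value - tol for p in pure)
    return row_ok and col_ok


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

print(is_nash(uniform, uniform))      # both players mix evenly
print(is_nash(always_rock, uniform))  # the column player could switch to paper
```

Checking only pure deviations is enough here because expected payoff is linear in each player's mixed strategy. In no-limit hold'em with six players the game tree is vastly larger and, as the article notes, an equilibrium is much harder to find.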
In a paper published in Science, Facebook and CMU researchers propose that finding an exact Nash equilibrium might not be the optimal solution for a multi-player poker game. Instead, they simply set out to develop a system that would win.
The training process can be divided into two stages. First, researchers trained a blueprint strategy through self-play using a Monte Carlo form of counterfactual regret minimization (CFR), an iterative algorithm in which the AI plays against copies of itself and gradually improves by tracking how much it regrets not having chosen each alternative action. Next, when Pluribus is actually playing against opponents, it conducts real-time searches to find better, finer-grained strategies for the situation at hand.
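The regret-based self-play idea behind the blueprint stage can be sketched on a toy game. The code below is not Pluribus's training code and is not Monte Carlo CFR itself: it is plain regret matching, the per-decision update rule that CFR-style algorithms build on, applied to rock-paper-scissors. The function names, the game, and the iteration count are all assumptions made for illustration.

```python
import random

ACTIONS = 3  # rock, paper, scissors
# PAYOFF[mine][theirs] is the payoff to the player who chose `mine`.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret yet: play uniformly


def train(iterations, seed=0):
    """Two regret-matching players learn by playing each other (self-play)."""
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for player in range(2):
            mine, theirs = moves[player], moves[1 - player]
            actual = PAYOFF[mine][theirs]
            for a in range(ACTIONS):
                # Regret: how much better action `a` would have done this round.
                regrets[player][a] += PAYOFF[a][theirs] - actual
                strategy_sums[player][a] += strategies[player][a]
    # The average strategy over all iterations is the one that converges.
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = train(100_000)
print(average_strategies[0])
```

In this toy game the averaged strategy should drift toward playing each action about a third of the time. Poker adds hidden cards, many decision points, and sampling of game trajectories, none of which this sketch attempts to model.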
A surprising aspect of the research is the relatively low training and inference cost of Pluribus compared to other game-playing AI like AlphaGo. Researchers trained Pluribus’s blueprint strategy in eight days on a 64-core server for a total of 12,400 CPU core hours, which comes to about US$144 at cloud computing rates. For inference, Pluribus runs on two Intel Haswell E5-2695 v3 CPUs and uses less than 128 GB of memory. Compare that to the 1,920 CPUs and 280 GPUs AlphaGo used for real-time search in its 2016 matches with Go grandmaster Lee Sedol.
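A quick back-of-the-envelope check shows the training figures hang together. The snippet below only uses numbers quoted in this article; the per-core-hour price it derives is an implied value, not a rate published by any cloud provider.

```python
# Figures quoted in the article.
days = 8
cores = 64
reported_core_hours = 12_400
reported_cost_usd = 144

# Upper bound if all 64 cores ran around the clock for exactly eight days.
core_hours_wall_clock = days * 24 * cores

# Implied price per core hour (derived here, not a quoted cloud rate).
implied_usd_per_core_hour = reported_cost_usd / reported_core_hours

print(core_hours_wall_clock)                # 12288
print(round(implied_usd_per_core_hour, 4))  # 0.0116
```

Eight full days on 64 cores gives 12,288 core hours, close to the reported 12,400, and the quoted cost works out to a little over one US cent per core hour.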
Some 10,000 hands were played over the 12-day poker competition used to evaluate the performance of Pluribus. In one set of experiments, five human professionals played against one copy of Pluribus, with each player remaining anonymous. After applying the variance-reduction technique AIVAT to reduce the luck factor in the game, Pluribus won an average of 48 mbb/game (milli big blinds per game, or thousandths of a big blind per hand), with a standard error of 25 mbb/game. That is a high win rate in six-player no-limit Texas hold’em.
“[Pluribus] is an absolute monster bluffer. I would say it’s a much more efficient bluffer than most humans. And that’s what makes it so difficult to play against. You’re always in a situation with a ton of pressure that the AI is putting on you and you know it’s very likely it could be bluffing here,” said one of the pro players, Jason Les.
The other experiment format had one human professional play against five copies of Pluribus. Even without any collaboration among the copies, the bots still beat the human by an average of 32 mbb/game (with a standard error of 15 mbb/game).
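To put the two reported win rates in perspective, the sketch below converts mbb/game (thousandths of a big blind per hand) into big blinds per 100 hands and estimates how far each result sits from zero. The normal approximation and the helper names are assumptions of this sketch, so the p-values it prints are rough and may differ from those in the paper.

```python
import math


def mbb_to_bb_per_100(mbb_per_game):
    """Convert milli big blinds per hand into big blinds per 100 hands."""
    return mbb_per_game / 1000 * 100


def one_sided_p_value(mean, standard_error):
    """P(observing a mean this high | true win rate is 0), normal approximation."""
    z = mean / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))


# (mean, standard error) in mbb/game, as quoted in the article.
results = {
    "five humans + one Pluribus": (48, 25),
    "one human + five Pluribus": (32, 15),
}

for name, (mean, se) in results.items():
    print(
        f"{name}: {mbb_to_bb_per_100(mean):.1f} bb/100, "
        f"z = {mean / se:.2f}, one-sided p = {one_sided_p_value(mean, se):.3f}"
    )
```

Under these assumptions both results sit roughly two standard errors above break-even, which is why the win rates are treated as evidence of skill rather than luck.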
Chris “Jesus” Ferguson, an American professional poker player who has won six World Series of Poker events, fell behind Pluribus by 25 mbb/game. “Pluribus is a very hard opponent to play against. It’s really hard to pin him down on any kind of hand. He’s also very good at making thin value bets on the river. He’s very good at extracting value out of his good hands,” Ferguson said.
Today’s top AI Go bots often play a “3-3 stone” early in the game. Pros would have dismissed this move only a few years ago, but AI has since shown that it can be strong. In its games against humans, Pluribus likewise contradicted conventional poker wisdom when it engaged in “donk betting,” that is, opening a betting round with a bet after having ended the previous round with a call, a play that most poker pros avoid. Perhaps, like the “3-3 stone,” the practice of “donk betting” will also get a second look thanks to AI.
Read the paper Superhuman AI for multiplayer poker on Science.
Journalist: Tony Peng | Editor: Michael Sarazen