In July the poker-playing bot Pluribus beat top professionals in a six-player no-limit Texas Hold’Em poker game. Pluribus taught itself from scratch using a form of reinforcement learning (RL) to become the first AI program to defeat elite humans in a poker game with more than two players.
Compared to perfect-information games such as Chess or Go, poker presents a number of unique challenges with its concealed cards, bluffing and other human strategies. Now a team of researchers from Texas A&M University and Canada’s Simon Fraser University has open-sourced a toolkit called “RLCard” for applying RL research to card games.
While RL has already produced a number of breakthroughs in goal-oriented tasks and has high potential, it is not without drawbacks. Instability in multi-agent applications, for example, has slowed RL development in domains with numerous agents, large state and action spaces, and sparse rewards. Multi-player card games are therefore emerging as a good test environment for improving RL.
The RLCard toolkit supports card game environments including Blackjack, Leduc Hold’em, Dou Dizhu, Mahjong and UNO, with the aim of bridging reinforcement learning and imperfect-information games. Because not every RL researcher has a game-theory background, the team designed the interfaces to be easy to use and the environments to be configurable. Researchers can adjust factors such as state representation, action abstraction, reward design, and even the game rules.
The research team evaluated RLCard by running state-of-the-art RL algorithms in its environments and by measuring the computational resources required to generate game data. They measured performance using the winning rate of the RL agents against random agents and in self-play tournaments. The team applied Deep Q-Network (DQN), Neural Fictitious Self-Play (NFSP), and Counterfactual Regret Minimization (CFR) algorithms to the environments and saw similar results against random agents. Although NFSP was stronger than DQN on most environments, both were highly unstable in larger games such as UNO, Mahjong and Dou Dizhu.
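To make the idea of a winning rate against a random agent concrete, here is a minimal Python sketch. It does not use RLCard's API or reproduce the paper's experiments: the one-card betting game, the fixed threshold policy that stands in for a trained agent, and every function name below are assumptions made purely for illustration.

```python
import random

ACTIONS = ("bet", "fold")


def random_agent(card, rng):
    """Baseline agent: ignores its card and picks an action uniformly."""
    return rng.choice(ACTIONS)


def threshold_agent(card, rng):
    """Hypothetical stand-in for a trained agent: bets only on cards 7 and up."""
    return "bet" if card >= 7 else "fold"


def play_game(agent_a, agent_b, rng):
    """Play one hand of a toy hidden-card game; return agent_a's payoff.

    Each player is dealt a distinct card from 1..13 and sees only their own.
    Both choose at the same time: if both fold, nobody scores; a lone folder
    loses 1; if both bet, the higher card wins 2.
    """
    card_a, card_b = rng.sample(range(1, 14), 2)
    act_a = agent_a(card_a, rng)
    act_b = agent_b(card_b, rng)
    if act_a == "fold" and act_b == "fold":
        return 0
    if act_a == "fold":
        return -1
    if act_b == "fold":
        return 1
    return 2 if card_a > card_b else -2


def win_rate(agent_a, agent_b, num_games=10000, seed=0):
    """Fraction of hands in which agent_a ends with a positive payoff."""
    rng = random.Random(seed)
    wins = sum(play_game(agent_a, agent_b, rng) > 0 for _ in range(num_games))
    return wins / num_games


if __name__ == "__main__":
    print("threshold vs random:", win_rate(threshold_agent, random_agent))
    print("random vs random:   ", win_rate(random_agent, random_agent))
```

Comparing the first number against the random-vs-random baseline is the same kind of sanity check the researchers describe, only on a far smaller game and with a hand-written policy instead of a learned one.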
While RLCard is specifically designed to support RL in card games, other RL toolkits are also available, such as OpenAI’s Gym and SC2LE (StarCraft II Learning Environment), introduced by DeepMind and Blizzard.
The first author on the research paper is Daochen Zha, a graduate research assistant at Texas A&M University. Zha told Synced he hopes the toolkit can stimulate research that helps improve RL performance not only in card games but also across other domains with multiple agents, large state and action spaces, and sparse rewards.
The paper RLCard: A Toolkit for Reinforcement Learning in Card Games is on arXiv. The open-source toolkit is available on GitHub.
Journalist: Fangyu Cai | Editor: Michael Sarazen