After a week of hard work, the weekend is a fun time for friends to get together for some drinks, some laughs, and perhaps a party game. A favourite is the cooperative card game Hanabi, winner of the 2013 Spiel des Jahres and the 2013 Fairplay À la Carte Award. The game has now drawn the attention of AI researchers.
With the impressive debut of AlphaStar in the real-time strategy video game StarCraft II, and the earlier triumph of AlphaGo over top professionals in the ancient board game Go, British artificial intelligence company DeepMind has established itself as a leader in AI gameplay. In collaboration with Google Brain, DeepMind has now open-sourced its Hanabi Learning Environment (HLE), a platform for evaluating AI performance in the popular game.
Named after the Japanese word for "fireworks", Hanabi is a cooperative game for two to five players, who must play their cards in a specific order to build a simulated pyrotechnics display. Each player's cards are visible to all other players but not to the player holding them. On their turn, a player can either give information to a teammate, discard a card, or play a card. Most players follow basic conventions, and some have developed advanced strategies such as priority prompts, priority finesses, and bluffs. The game is challenging for AI agents because it involves imperfect information and limited communication, and success depends on reasoning about teammates' beliefs and intentions, a capacity known as theory of mind.
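To make the hidden-information setup concrete, the following is a minimal Python sketch of what each player can observe and what the three turn actions involve. It is a toy model, not the HLE API: the names Card, ActionType, observation, is_playable and hint_positions are hypothetical, and the sketch assumes the standard rules in which each colour's firework is built in ascending rank and a hint names either one colour or one rank.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class ActionType(Enum):
    # The three options available on a turn (hypothetical names).
    HINT = auto()     # give information about a teammate's cards
    DISCARD = auto()  # discard one of your own cards
    PLAY = auto()     # attempt to play one of your own cards


@dataclass(frozen=True)
class Card:
    colour: str
    rank: int


def observation(hands: list[list[Card]], viewer: int) -> dict[int, list[Card] | None]:
    """Return what `viewer` sees: every teammate's hand, but not their own (None)."""
    return {p: (None if p == viewer else list(hand)) for p, hand in enumerate(hands)}


def is_playable(card: Card, fireworks: dict[str, int]) -> bool:
    """A card succeeds only if it is the next rank on its colour's firework."""
    return fireworks.get(card.colour, 0) + 1 == card.rank


def hint_positions(hand: list[Card], colour: str | None = None, rank: int | None = None) -> list[int]:
    """A hint names exactly one colour or one rank and points out every matching card."""
    if (colour is None) == (rank is None):
        raise ValueError("a hint must name exactly one of colour or rank")
    if colour is not None:
        return [i for i, c in enumerate(hand) if c.colour == colour]
    return [i for i, c in enumerate(hand) if c.rank == rank]
```

For example, with `hands = [[Card("red", 1), Card("blue", 2)], [Card("white", 1), Card("red", 3)]]`, `observation(hands, 0)` hides player 0's own hand while showing player 1's, which is exactly why players must rely on hints and on inferring what their teammates intend.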
The HLE researchers trained AI agents using three state-of-the-art reinforcement learning algorithms in two settings: self-play (akin to playing with close friends who share the same conventions) and ad hoc team play (playing with unfamiliar partners).
In the self-play setting, the learning agents Rainbow, Actor-Critic-Hanabi-Agent (ACHA) and Bayesian Action Decoder (BAD) all performed well in two-player experiments. However, both the Rainbow and ACHA agents performed worse as the number of players increased, and were outperformed by rule-based, hand-coded agents in games with more than two players.
The authors also selected ten independently trained ACHA agents and one Rainbow agent to examine their performance in ad hoc team play over 1,000 games. In contrast to the self-play results, all AI agents struggled when paired with unfamiliar teammates in both two-player and four-player games, with some ad hoc teams scoring close to zero points. Unlike humans, who can quickly learn from new partners and adjust their strategies accordingly, the AI agents tended to rigidly follow the conventions and strategies they had learned.
Despite the difficulty of matching or surpassing human performance, HLE offers a promising research framework for further improving AI Hanabi play. One of the HLE paper's authors, DeepMind research scientist Marc G. Bellemare, wrote in his blog, "I look forward to seeing the beautiful cooperation that must emerge from Hanabi research."
For more information about the work, readers can refer to the paper "The Hanabi Challenge: A New Frontier for AI Research" on arXiv. The HLE code is available on GitHub.
Source: Synced China
Localization: Tingting Cao | Editor: Michael Sarazen | Producer: Chain Zhang