In recent years, artificial intelligence systems have successfully challenged human players in the ancient board game Go, the card-based betting game Texas Hold’em, and even complex video game environments such as Dota and StarCraft. Now, a Microsoft Research Asia (MSRA) team has taken on Mahjong, the traditional Chinese tile-based game of chance, bluffing, and strategy.
At the World Artificial Intelligence Conference (WAIC) in Shanghai on August 29, Microsoft Global Executive Vice President Harry Shum officially introduced MSRA’s Suphx (“Super Phoenix”) as “the most powerful Mahjong AI in history.”
Synced has previously reported on AI efforts in Mahjong, an imperfect information game which, in terms of game theory, is very different from perfect information games such as Chess and Go. Players in Mahjong are not able to see everything that could impact the game’s outcome, and must speculate regarding their opponents’ unseen tiles when choosing a move.
Suphx taught itself the intricacies of Mahjong mostly through real games against human players on Tenhou, a popular global online Mahjong platform based in Japan with more than 300,000 members. From March to June this year, Suphx played more than 5,000 games against human opponents and reached a top rank of 10 Dan. (The highest rank, 11 Dan, is open only to human players.) Suphx’s stable rank on Tenhou is around 8.7, higher than the 7.4 average of top human players.
AI’s celebrated video game breakthroughs this year required comprehensive gameplay ability, combining strategy with operational and execution skills. Pure intelligence and strategy games like Mahjong present distinct challenges. As Microsoft Research Asia Deputy Dean Tie-Yan Liu puts it: “Games like Dota are more ‘game’, while games like Mahjong are more ‘AI’.”
A related research paper has yet to be published, but MSRA has revealed some properties of the Suphx model on its blog (in Chinese and Japanese) explaining how they approached Mahjong with deep reinforcement learning:
- Self-Adaptive Decision Making: To cope with Mahjong’s huge state space, Suphx dynamically regulates the diversity of its exploration process, which lets it test different possibilities in the game more efficiently than traditional algorithms can.
- Prior Coach: To address the challenge of imperfect information, Suphx uses a “Prior Coach” technique to strengthen reinforcement learning. The basic idea is to use hidden information, such as opponents’ concealed tiles, to steer the model’s training during the self-play phase, so that its learning path stays closer to the path that would be optimal with perfect information. This pushes the model to study and understand the visible information more deeply, giving it an effective basis for decision making.
- Comprehensive Prediction: To handle Mahjong’s complex reward mechanism, the research team used a comprehensive prediction technique that bridges the gap between individual rounds and the final result of a full match, which typically spans eight rounds. The predictor estimates how much each round contributed to the final result, so the final reward signal can be reasonably distributed back to each round. This guides self-play more directly and effectively and lets Suphx learn advanced techniques from a big-picture perspective.
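MSRA’s blog does not say how Suphx regulates the diversity of its exploration. One common way to do this in policy-gradient reinforcement learning is to add an entropy bonus to the training loss and adjust its weight so that the policy’s average entropy tracks a target value. The sketch below assumes that approach; it is not confirmed as Suphx’s method, and the names `update_entropy_coef`, `target_entropy`, and `step_size` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def update_entropy_coef(alpha, probs_batch, target_entropy, step_size=0.01):
    """Nudge the entropy-bonus weight `alpha` toward a target entropy.

    Hypothetical scheme, not MSRA's published method: if the policy is less
    random than the target, alpha grows (encouraging more exploration); if it
    is more random, alpha shrinks. alpha is clipped so it never goes negative.
    """
    mean_h = sum(entropy(p) for p in probs_batch) / len(probs_batch)
    alpha += step_size * (target_entropy - mean_h)
    return max(alpha, 0.0)

# A nearly deterministic policy sits below the target entropy, so alpha rises.
alpha = update_entropy_coef(0.1, [[0.97, 0.01, 0.01, 0.01]], target_entropy=1.0)
```

The adjusted `alpha` would then scale an entropy term in the policy loss, so exploration widens automatically whenever the policy becomes too confident.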
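MSRA has not described exactly how the Prior Coach feeds hidden information into training. One plausible scheme, assumed here for illustration only, is to give the model both visible and hidden features early in self-play, then randomly zero out the hidden features with a probability that rises over training, so that the final model relies on visible information alone. The functions `keep_probability` and `build_input` and the linear decay schedule are hypothetical.

```python
import random

def keep_probability(step, total_steps):
    """Chance of keeping each hidden feature; decays linearly from 1 to 0 (assumed schedule)."""
    return max(0.0, 1.0 - step / total_steps)

def build_input(visible, hidden, step, total_steps, rng=random):
    """Concatenate visible features with a randomly masked copy of hidden ones.

    Early in training nearly all hidden features (e.g. opponents' concealed
    tiles) pass through; by the end every one is zeroed, so the policy must
    decide from visible information only.
    """
    p = keep_probability(step, total_steps)
    masked = [h if rng.random() < p else 0.0 for h in hidden]
    return list(visible) + masked

# Halfway through training, each hidden feature survives with probability 0.5.
sample = build_input([1.0, 2.0], [3.0, 4.0, 5.0], step=50, total_steps=100)
```

Because the hidden inputs fade out gradually rather than disappearing at once, a model trained this way can start from near-perfect-information play and shift toward decisions based only on what a real player can see.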
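The blog does not specify how Suphx’s predictor is built or how it splits the final reward. A simple way to distribute a match-level reward back to each round, given any model that predicts the final result from the rounds played so far, is to credit each round with the change in predicted outcome that it caused. The sketch below assumes such a predictor, passed in as the hypothetical function `predict_final`; the per-round rewards then sum to the prediction for the full match minus the prediction before play began.

```python
from typing import Callable, List, Sequence

def per_round_rewards(
    predict_final: Callable[[Sequence[float]], float],
    round_scores: List[float],
) -> List[float]:
    """Assign each round the change it causes in the predicted final result.

    `predict_final` is a hypothetical stand-in for a learned model that maps
    the per-round scores seen so far to a predicted final match reward.
    Summed over a match, the rewards telescope to
    predict_final(all rounds) - predict_final(no rounds).
    """
    rewards = []
    previous = predict_final([])
    for k in range(1, len(round_scores) + 1):
        current = predict_final(round_scores[:k])
        rewards.append(current - previous)
        previous = current
    return rewards

# Toy predictor: final result = running score total, so each round is credited with its own score.
rewards = per_round_rewards(lambda hist: float(sum(hist)), [10.0, -5.0, 20.0])
```

With a learned, non-linear predictor, a round that barely changes the raw score but sharply changes the match outlook (for example, avoiding last place) would receive a correspondingly large reward.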
Microsoft says it believes the AI algorithms developed in the Suphx project to navigate the “uncertain nature of Mahjong” could also be applied to real-world problems characterized by unknown factors and random events.
Author: Mos Zhang | Editor: Michael Sarazen