Deep neural networks are widely regarded as learning opaque representations that lie beyond human understanding. From both scientific and practical viewpoints, it is therefore intriguing to explore what is actually being learned, and how, in the case of superhuman self-taught agents such as AlphaZero.
In the new paper Acquisition of Chess Knowledge in AlphaZero, DeepMind and Google Brain researchers, together with former World Chess Champion Vladimir Kramnik, explore how and to what extent AlphaZero acquires human chess knowledge, and how chess concepts are represented in its network. They do this via comprehensive concept probing, behavioural analysis, and examination of AlphaZero’s activations.
The team aims its study at an improved understanding of:
- Encoding of human knowledge.
- Acquisition of knowledge during training.
- Reinterpretation of the value function via the encoded chess concepts.
- Comparison of AlphaZero’s evolution to that of human history.
- Evolution of AlphaZero’s candidate move preferences.
- Proof of concept towards unsupervised concept discovery.
The researchers premise their study on the idea that if the representations of strong neural networks like AlphaZero bear no resemblance to human concepts, our ability to understand their decisions through faithful explanations will be restricted, ultimately limiting what we can achieve with neural network interpretability.
The team detects human concepts from network activations on a large dataset of inputs, probing every concept at every block and over many checkpoints during AlphaZero’s chess self-play training process. This enables them to build up a picture of what is learned, when it is learned during training, and where in the network it is computed.
The team examines how chess knowledge is progressively acquired and represented, using a sparse linear probing methodology to identify how AlphaZero represents a wide range of human chess concepts. They visualize this acquisition of conceptual knowledge in “what-when-where plots,” which illustrate which concepts are learned, when in training they emerge, and where in the network they are computed.
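The core of a sparse linear probe can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the activations and concept labels are synthetic stand-ins for per-position activations from one AlphaZero block and a scalar concept score (e.g. material balance) for the same positions, and all names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: 1000 positions, 256-dim activations from one network block.
activations = rng.normal(size=(1000, 256))

# Stand-in concept labels, linearly related to a sparse subset of units.
true_weights = np.zeros(256)
true_weights[:5] = rng.normal(size=5)
concept = activations @ true_weights + 0.1 * rng.normal(size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    activations, concept, random_state=0)

# An L1-regularised (sparse) linear probe: high held-out accuracy suggests
# the concept is linearly decodable from this block at this checkpoint.
probe = Lasso(alpha=0.01).fit(X_train, y_train)
score = probe.score(X_test, y_test)  # R^2 on held-out positions
```

Repeating such a fit for every concept, at every block, over many training checkpoints is what fills in a what-when-where plot.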
Following the study of how internal representations change over time, the team investigates how these changing representations give rise to changing behaviours, both by measuring changes in move probability on a curated set of chess positions and by comparing the evolution of AlphaZero’s self-play policy to the evolution of move choices in top-level human play.
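The behavioural measurement can be pictured as tracking the policy head's output on a fixed position across checkpoints. The sketch below uses invented stand-in probabilities purely for illustration; in the paper these would come from AlphaZero's policy network at successive training steps.

```python
import numpy as np

moves = ["e4", "d4", "c4", "Nf3"]

# Stand-in policy-head outputs on one curated position at three
# (hypothetical) training checkpoints; each distribution sums to 1.
policies = {
    10_000:  np.array([0.40, 0.30, 0.20, 0.10]),
    50_000:  np.array([0.30, 0.35, 0.20, 0.15]),
    200_000: np.array([0.20, 0.40, 0.25, 0.15]),
}

# Change in probability of each candidate move from the first to the
# last checkpoint: how the network's move preferences shift with training.
steps = sorted(policies)
delta = policies[steps[-1]] - policies[steps[0]]
shift = dict(zip(moves, delta.round(2)))
```

Aggregating such shifts over a curated set of positions, and lining them up against how grandmaster move choices in the same openings changed across decades, is what permits the human-history comparison.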
Finally, having established that AlphaZero’s activations can be used to predict human concepts, the team investigates these activations directly, using non-negative matrix factorization (NMF) to decompose AlphaZero’s representations into multiple factors and obtain a complementary view of what the network is computing.
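The NMF step can be sketched as follows, assuming non-negative (e.g. post-ReLU) activations collected over many positions. The data and shapes here are illustrative stand-ins, not the paper's actual setup.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Stand-in data: 500 positions x 256 non-negative activation units.
activations = rng.random((500, 256))

# Decompose into a small number of factors: W gives each position's loading
# on each factor, H gives each factor's pattern over activation units.
model = NMF(n_components=8, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(activations)   # shape (500, 8)
H = model.components_                  # shape (8, 256)

# W @ H approximates the original activations; inspecting which positions
# load heavily on each factor gives an unsupervised view of what the
# network computes, complementary to the supervised concept probes.
reconstruction = W @ H
```

Because NMF requires no concept labels, it can surface structure in the representations that the predefined human concepts might miss.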
The team’s study of the progression of AlphaZero’s neural network from initialization to the end of training yields the following insights: 1) Many human concepts can be found in the AlphaZero network; 2) A detailed picture of knowledge acquisition during training emerges via the “what-when-where plots”; 3) The use of concepts and their relative value evolves over time: AlphaZero initially focuses primarily on material, with more complex and subtle concepts emerging as important predictors of the value function only relatively late in training; 4) Comparison to historical human play reveals notable differences in how human play has developed, but also striking similarities in the evolution of AlphaZero’s self-play policy.
The paper Acquisition of Chess Knowledge in AlphaZero is on arXiv.
Author: Hecate He | Editor: Michael Sarazen