Back in 1993, AI pioneer Jürgen Schmidhuber published the paper A Self-Referential Weight Matrix, which he described as a “thought experiment… intended to make a step towards self-referential machine learning by showing the theoretical possibility of self-referential neural networks whose weight matrices (WMs) can learn to implement and improve their own weight change algorithm.” A lack of subsequent practical studies in this area had, however, left this potentially impactful meta-learning ability unrealized until now.
In the new paper A Modern Self-Referential Weight Matrix That Learns to Modify Itself, a research team from The Swiss AI Lab, IDSIA, University of Lugano (USI) & SUPSI, and King Abdullah University of Science and Technology (KAUST) presents a scalable self-referential WM (SRWM) that leverages outer products and the delta update rule to update and improve itself, achieving both practical applicability and impressive performance in game environments.
The proposed model is built upon fast weight programmers (FWPs), a scalable and effective method dating back to the 1990s in which a slow neural net learns to reprogram a fast neural net. FWPs can learn to memorize past data and compute fast weight changes via programming instructions that are additive outer products of self-invented activation patterns, aka keys and values for self-attention. In light of their connection to linear variants of today’s popular transformer architectures, FWPs are now witnessing a revival. Recent studies have advanced conventional FWPs by improving the elementary programming instructions, or update rules, that the slow net invokes to reprogram the fast net; one such improved rule has been dubbed the “delta update rule.”
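As a rough illustration of how a purely additive outer-product instruction differs from the delta update rule, the NumPy sketch below writes two values under the same key into a fast weight matrix. The function names, the unit-norm key, and the learning rate of 1.0 are assumptions made for this example, not details taken from the papers.

```python
import numpy as np

def additive_update(W, k, v):
    """Purely additive instruction: add the outer product of value and key."""
    return W + np.outer(v, k)

def delta_update(W, k, v, beta):
    """Delta rule: nudge the value currently stored under key k toward v."""
    v_old = W @ k                      # what the fast weights return for k now
    return W + beta * np.outer(v - v_old, k)

rng = np.random.default_rng(0)
d_key, d_val = 4, 3

# Assumption for the demo: a unit-norm key, so that W @ k reads back exactly
# what was written.
k = rng.normal(size=d_key)
k = k / np.linalg.norm(k)
v1 = rng.normal(size=d_val)
v2 = rng.normal(size=d_val)

# Write v1, then v2, under the same key with each rule.
W_add = additive_update(additive_update(np.zeros((d_val, d_key)), k, v1), k, v2)
W_delta = delta_update(delta_update(np.zeros((d_val, d_key)), k, v1, 1.0), k, v2, 1.0)

read_add = W_add @ k      # the two writes accumulate: v1 + v2
read_delta = W_delta @ k  # the second write replaces the first: v2
```

With the additive rule, repeated writes to one key pile up, while the delta rule first retrieves the stored value and then corrects it, which is why it can overwrite stale associations.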
Inspired by these innovations, the proposed SRWM learns to train itself using self-invented key/value pairs and learning rates, invoking sequences of elementary programming instructions based on outer products and the delta update rule. To enable more efficient computation, the team uses multi-head computation as in traditional transformers. The researchers also note that their SRWM can potentially replace any regular WM.
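A single self-modification step might look like the following NumPy sketch. It assumes one head, a single scalar learning rate passed through a sigmoid, and softmax as the activation applied to inputs, keys and queries. The function name srwm_step and the row layout of W are hypothetical simplifications for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def srwm_step(W, x, d_out):
    """One self-referential step: W produces its output and its own update.

    Assumed layout: W has shape (d_out + 2 * d_in + 1, d_in), so one
    matrix-vector product yields the output y, a key, a query and a
    scalar learning rate.
    """
    d_in = x.shape[0]
    out = W @ softmax(x)
    y = out[:d_out]
    k = out[d_out:d_out + d_in]
    q = out[d_out + d_in:d_out + 2 * d_in]
    beta = out[-1]

    phi_k = softmax(k)
    v_old = W @ phi_k          # value W currently associates with the key
    v_new = W @ softmax(q)     # self-invented target value, from the query

    # Delta-rule update of W with a self-generated learning rate.
    W_next = W + sigmoid(beta) * np.outer(v_new - v_old, phi_k)
    return y, W_next

rng = np.random.default_rng(0)
d_in, d_out = 5, 2
W = rng.normal(size=(d_out + 2 * d_in + 1, d_in))

# Feed a short sequence; the matrix rewrites itself after every input.
W_t = W
outputs = []
for _ in range(3):
    x = rng.normal(size=d_in)
    y, W_t = srwm_step(W_t, x, d_out)
    outputs.append(y)
```

Because the key, query, learning rate and both values all come from the same matrix, no separate slow network is needed to issue the update.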
To evaluate SRWM’s ability to generate effective self-modifications, the team compared it to baseline models such as LSTM and Delta Net on supervised learning tasks under few-shot settings, and also tested it on multi-task reinforcement learning (RL) in game environments.
The results show that SRWM outperforms feedforward, LSTM, and Fake SR (SRWM without self-modification) baseline models while achieving competitive performance and demonstrating practical applicability in both supervised few-shot learning and multi-task reinforcement learning settings. The team hopes their encouraging results will inspire further investigations into self-improving neural networks.
Author: Hecate He | Editor: Michael Sarazen