Reliable generalization to out-of-distribution inputs is a crucial capability for building strong machine learning models. Yet how and why neural networks generalize on algorithmic sequence-prediction tasks remains an open question.
In the new paper Neural Networks and the Chomsky Hierarchy, a DeepMind research team conducts an extensive generalization study on neural network architectures that explores whether insights from the theory of computation and the Chomsky hierarchy can predict the practical limits of neural network generalization.

The team summarizes their main contributions as:
- We conduct an extensive generalization study (2200 individual models, 16 tasks) of state-of-the-art neural network architectures (RNN, LSTM, Transformer) and memory-augmented neural networks (Stack-RNN, NDStack-RNN, Tape-RNN) on a battery of sequence-prediction tasks spanning all levels of the Chomsky hierarchy that can be practically tested with finite-time computation.
- We show that increasing amounts of training data do not enable generalization on tasks higher up the hierarchy for some architectures (despite sufficient capacity to perfectly fit the training data), potentially implying hard limitations for scaling laws.
- We demonstrate how architectures augmented with differentiable structured memory (e.g., with a stack or a tape) can solve tasks higher up the hierarchy.

Many previous works have investigated whether conventional neural network architectures can learn formal languages. While those studies typically focused on a single architecture and a limited set of tasks, the DeepMind paper presents an extensive empirical study of a wide range of models with respect to the Chomsky hierarchy.
Named after Noam Chomsky, the influential American linguist and philosopher who developed it, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars (unrestricted, context-sensitive, context-free, and regular) that classifies languages by the type of automaton able to recognize them. By relating different models to the Chomsky hierarchy, it is possible to determine which classes of formal languages they can recognize.
The researchers note that lower-level automata have restrictive memory models and can only solve lower-level problem sets, while Turing machines, atop the hierarchy, have infinite memory and unrestricted memory access and can solve all computable problems, i.e., they are Turing complete.
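As a concrete illustration (ours, not from the paper) of how memory determines which languages an automaton can recognize: a finite automaton with constant memory suffices for a regular language, but a context-free language such as aⁿbⁿ requires a counter or stack.

```python
def is_even_as(s: str) -> bool:
    """Regular language: strings of 'a's of even length.
    A two-state finite automaton (constant memory) suffices."""
    state = 0
    for ch in s:
        if ch != "a":
            return False
        state ^= 1  # toggle between the two states
    return state == 0


def is_anbn(s: str) -> bool:
    """Context-free language: a^n b^n for n >= 0.
    No finite automaton can recognize this; an unbounded
    counter (one stack) is needed to match the a's and b's."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:       # an 'a' after a 'b' is never valid
                return False
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:    # more b's than a's so far
                return False
        else:
            return False
    return count == 0
```

The second recognizer needs memory that grows with the input (the counter), which is exactly the kind of resource distinction the hierarchy formalizes.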

The paper examines a wide range of neural network architectures and memory-augmented neural networks — Transformer, RNN, LSTM, Stack-RNN, NDStack-RNN, and Tape-RNN — covering a total of 2200 models applied to 16 sequence-prediction tasks.
The results show that LSTMs and transformers are not Turing complete, as they cannot solve simple sequence tasks such as duplicating a string when the test sequences are significantly longer than those seen during training. Models interacting with an external memory structure, meanwhile, can climb the Chomsky hierarchy, suggesting this setup is a promising research direction for improving architecture design.
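The length-generalization protocol behind this finding can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the paper's code: the model trains on short instances of the duplication task (input x, target xx) and is evaluated on much longer ones.

```python
import random


def duplication_example(length: int, alphabet: str = "01") -> tuple[str, str]:
    """One instance of the string-duplication task:
    input x, target x repeated twice (xx)."""
    x = "".join(random.choice(alphabet) for _ in range(length))
    return x, x + x


# Length generalization: train on short strings, test on far longer ones.
# A model that merely memorizes length-bounded patterns will fail the
# test split even with unlimited short training data.
train_set = [duplication_example(random.randint(1, 10)) for _ in range(1000)]
test_set = [duplication_example(random.randint(50, 100)) for _ in range(100)]
```

Under this protocol, adding more short training examples does not help an architecture whose memory model cannot represent the task, which is the sense in which the paper's results hint at hard limits for data scaling alone.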
The code is publicly available on the project’s GitHub. The paper Neural Networks and the Chomsky Hierarchy is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
