Infinite Memory Transformer: Attending to Arbitrarily Long Contexts Without Increasing Computation Burden

When reading a novel, humans naturally remember relevant plot information even if it was presented many chapters earlier. Although today’s transformer-based language models have made impressive progress in natural language processing, they struggle in this regard: the compute required for modelling long-term memories grows quadratically with the length of the text, so a sufficiently long context will eventually exceed the model’s finite memory capacity.

To overcome this limitation, a research team from Instituto de Telecomunicações, DeepMind, Institute of Systems and Robotics, Instituto Superior Técnico and Unbabel has proposed “∞-former” (infinite former) — a transformer model equipped with unbounded long-term memory (LTM) that enables it to attend to arbitrarily long contexts.

The team summarizes their study’s contributions as:

  1. Propose the ∞-former, in which we extend the transformer model with a continuous long-term memory. As the attention computational complexity is independent of the context length, the ∞-former is able to model long contexts.
  2. Propose a procedure that allows the model to keep unbounded context in memory.
  3. Introduce sticky memories, a procedure that enforces the persistence of important information in the LTM.
  4. Perform empirical comparisons in a synthetic task, which considers increasingly long sequences, and in language modelling, by training a model from scratch and by fine-tuning a pretrained language model. These experiments show the benefits of using an unbounded memory.

The team extends the vanilla transformer with a continuous LTM to enable their proposed ∞-former to access long-range context. The approach employs a continuous-space attention framework to attend over the LTM signal, in which the size of the key matrix depends on the number of basis functions rather than on the length of the context being attended to. The model’s computational complexity is thus independent of context length, enabling it to attend to arbitrarily long contexts without increasing memory requirements or computation burden.
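To make the point about key matrix size concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the memory is represented as coefficients over Gaussian radial basis functions fitted by ridge regression; the function names (`rbf_basis`, `compress_memory`, `key_matrix`), the basis width and the regularization strength are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_basis(t, num_basis, width=0.05):
    """Evaluate num_basis Gaussian bumps (an assumed basis) at positions t in [0, 1]."""
    centers = np.linspace(0.0, 1.0, num_basis)
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def compress_memory(X, num_basis, ridge=1e-3):
    """Fit coefficients B so that rbf_basis(t) @ B approximates the L x d memory X.

    Assumption: a plain ridge-regression fit, B = (F^T F + ridge * I)^-1 F^T X.
    """
    L = X.shape[0]
    t = np.linspace(0.0, 1.0, L)   # map the L token positions onto [0, 1]
    F = rbf_basis(t, num_basis)    # shape (L, num_basis)
    gram = F.T @ F + ridge * np.eye(num_basis)
    return np.linalg.solve(gram, F.T @ X)  # shape (num_basis, d)

def key_matrix(B, W_k):
    """Project the basis coefficients to keys; no dimension here depends on L."""
    return B @ W_k                 # shape (num_basis, d_k)

rng = np.random.default_rng(0)
W_k = rng.normal(size=(16, 8))     # hypothetical key projection, d=16 -> d_k=8
for L in (100, 10_000):
    X = rng.normal(size=(L, 16))   # stand-in for L past hidden states
    K = key_matrix(compress_memory(X, num_basis=32), W_k)
    print(L, K.shape)              # K.shape is (32, 8) for both context lengths
```

In this sketch the context length only affects the one-off fit of the coefficients; whatever attends over the memory afterwards sees a key matrix with one row per basis function, so its cost is set by the number of basis functions rather than by the number of tokens.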

To evaluate their proposed method, the researchers performed extensive experiments on a synthetic task and on language modelling tasks, using Transformer-XL and the compressive transformer as baselines.

In the synthetic task experiments, Transformer-XL achieved slightly better performance than the compressive transformer and the ∞-former for short memory lengths, but its accuracy degraded rapidly as the sequence length increased. The accuracies of the compressive transformer and the ∞-former, meanwhile, remained relatively stable. In the language modelling experiments, the ∞-former slightly outperformed the compressive transformer.

The researchers also note the ∞-former’s ability to reduce the perplexity of a pretrained model such as GPT-2 by helping the model focus on relevant memories.

Overall, the study shows the proposed ∞-former can scale up to long sequences while maintaining high accuracy, and demonstrates the versatility and benefits of unbounded long-term memory, both in model training from scratch and in the fine-tuning of pretrained language models.

The paper ∞-former: Infinite Memory Transformer is on arXiv.

Author: Hecate He | Editor: Michael Sarazen, Chain Zhang
