AI Machine Learning & Data Science Research

Infinite Memory Transformer: Attending to Arbitrarily Long Contexts Without Increasing Computation Burden

Researchers from Instituto de Telecomunicações, DeepMind, Institute of Systems and Robotics, Instituto Superior Técnico and Unbabel propose "∞-former" — a transformer model with unbounded long-term memory (LTM) that can attend to arbitrarily long contexts.

When reading a novel, humans naturally remember relevant plot information even if it was presented many chapters earlier. Although today’s transformer-based language models have made impressive progress in natural language processing, they struggle in this regard: the compute required by attention grows quadratically with the length of the text, and long contexts will eventually exceed the model’s finite memory capacity.

To overcome this limitation, a research team from Instituto de Telecomunicações, DeepMind, Institute of Systems and Robotics, Instituto Superior Técnico and Unbabel has proposed “∞-former” (infinite former) — a transformer model equipped with unbounded long-term memory (LTM) that enables it to attend to arbitrarily long contexts.

The team summarizes their study’s contributions as:

  1. We propose the ∞-former, which extends the transformer model with a continuous long-term memory. Because the attention complexity is independent of the context length, the ∞-former is able to model long contexts.
  2. We propose a procedure that allows the model to keep unbounded context in memory.
  3. We introduce sticky memories, a procedure that enforces the persistence of important information in the LTM.
  4. We perform empirical comparisons on a synthetic task involving increasingly long sequences and on language modelling, both by training a model from scratch and by fine-tuning a pretrained language model. These experiments demonstrate the benefits of using an unbounded memory.
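The sticky-memories idea can be illustrated with a small sketch. The premise, assuming a discretized histogram of past attention over memory positions (our own toy illustration, not the authors' implementation): when the LTM is re-compressed, new sample positions are drawn in proportion to past attention mass, so heavily attended regions retain finer resolution and important information "sticks" in memory.

```python
import numpy as np

def sticky_sample_positions(attn_density, n_samples=512, rng=0):
    """Toy sketch of sticky memories: draw new sample positions on [0, 1]
    in proportion to how much attention each region of the memory signal
    received, so frequently attended regions keep finer resolution."""
    rng = np.random.default_rng(rng)
    t_grid = np.linspace(0.0, 1.0, len(attn_density))
    p = attn_density / attn_density.sum()          # normalize to a distribution
    # oversample positions where past attention mass was concentrated
    return np.sort(rng.choice(t_grid, size=n_samples, p=p))
```

Regions the model never attends to are sampled rarely and thus gradually blur away, which is the intended behaviour: memory capacity stays fixed while important content persists.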

The team extends the vanilla transformer with a continuous LTM that gives the proposed ∞-former access to long-range context. The approach employs a continuous-space attention framework to attend over the LTM signal, in which the size of the key matrix depends on the number of basis functions rather than on the length of the context being attended to. The model’s computational complexity is thus independent of context length, enabling it to attend to arbitrarily long contexts without increasing memory requirements or computation burden.
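The core mechanism can be sketched as follows. In this toy sketch (our own illustration under stated assumptions, not the authors' code), past hidden states are compressed into coefficients over a fixed number of radial basis functions via ridge regression, and a query then attends over the resulting continuous signal with a Gaussian density. The basis choice, widths, and the map from query to attention mean are illustrative assumptions; the key property shown is that the memory's size depends only on the number of basis functions, not on the sequence length.

```python
import numpy as np

def rbf_basis(t, centers, width):
    # psi_j(t): Gaussian radial basis functions evaluated on [0, 1]
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def compress_to_memory(X, n_basis=32, width=0.05, ridge=1e-4):
    """Fit coefficients B so that X[i] ~ B^T psi(t_i): the (L, d) context
    is compressed into an (n_basis, d) matrix, independent of L."""
    L, _ = X.shape
    t = np.linspace(0.0, 1.0, L)
    centers = np.linspace(0.0, 1.0, n_basis)
    F = rbf_basis(t, centers, width)                       # (L, n_basis)
    # ridge regression: B = (F^T F + lambda*I)^{-1} F^T X  -> (n_basis, d)
    B = np.linalg.solve(F.T @ F + ridge * np.eye(n_basis), F.T @ X)
    return B, centers, width

def continuous_attention(q, B, centers, width, n_samples=100):
    """Attend over the continuous memory signal xbar(t) = B^T psi(t)
    with a Gaussian density whose mean is predicted from the query
    (a toy map here, for illustration only)."""
    mu = 1.0 / (1.0 + np.exp(-q.mean()))                   # mean in (0, 1)
    sigma = 0.1
    t = np.linspace(0.0, 1.0, n_samples)
    p = np.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
    p /= p.sum()                                           # discretized density p(t)
    xbar = rbf_basis(t, centers, width) @ B                # signal samples (n_samples, d)
    return p @ xbar                                        # context vector of size d
```

Whether the past context holds 1,000 or 100,000 tokens, `compress_to_memory` produces the same `(n_basis, d)` memory, and attention cost is governed by the number of basis-function samples, which is the source of the method's length-independent complexity.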

To evaluate their proposed method, the researchers performed extensive experiments on a synthetic task and on language modelling, using Transformer-XL and the compressive transformer as their baselines.

In the synthetic task experiments, Transformer-XL achieved slightly better performance than the compressive transformer and the ∞-former at short memory lengths, but its accuracy degraded rapidly as sequence length increased, while the accuracies of the compressive transformer and the ∞-former remained relatively stable. In the language modelling experiments, the ∞-former slightly outperformed the compressive transformer.

The researchers also note the ∞-former’s ability to reduce perplexity in a pretrained model such as GPT-2 by helping the model focus on relevant memories.

Overall, the study shows the proposed ∞-former can scale up to long sequences while maintaining high accuracy, and demonstrates the versatility and benefits of unbounded long-term memory, both in model training from scratch and in the fine-tuning of pretrained language models.

The paper ∞-former: Infinite Memory Transformer is on arXiv.


Author: Hecate He | Editor: Michael Sarazen, Chain Zhang


