Transformer-based models debuted in 2017 and have come to dominate natural language processing (NLP). Transformers convert their text inputs into tokens representing words, subwords, punctuation, and so on. However, because their attention mechanism scales quadratically and may attend to every input token, transformers' context windows are limited to sizes that cannot accommodate long-form tasks such as book summarization, where inputs can reach hundreds of thousands of tokens.
In the new paper Unlimiformer: Long-Range Transformers With Unlimited Length Input, a Carnegie Mellon University research team presents a general approach for improving model performance by augmenting pretrained encoder-decoder transformers with an external datastore to enable inputs of unbounded length.
Unlimiformer is a retrieval-based method that can be injected into any existing encoder-decoder transformer to enable it to accept inputs of unbounded length. Given a long input sequence, Unlimiformer first encodes overlapping input chunks, retaining only the middle half of the outputs from each chunk so that every retained encoding has sufficient context on both sides. It then constructs a datastore over the hidden states of all input tokens; the decoder's cross-attention queries this datastore and attends only to the top-k input tokens. This allows keys to be retrieved from the entire input sequence, rather than truncating the input, while requiring less computation than attending to all input tokens.
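The mechanism above can be illustrated with a minimal sketch. The `encoder` below is a hypothetical stand-in (a real system would run BART's or Longformer's encoder), the chunk length and hidden size are illustrative, and the retrieval step uses a plain dot-product top-k rather than the paper's index-based search — a simplified approximation, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden size (illustrative)

def encoder(chunk):
    # Hypothetical stand-in: one hidden state per token.
    # A real model would run a pretrained transformer encoder here.
    return rng.standard_normal((len(chunk), d))

def build_datastore(tokens, chunk_len=8):
    """Encode overlapping chunks; keep only the middle half of each
    chunk's hidden states so each kept encoding has context on both sides."""
    stride = chunk_len // 2  # 50% overlap between consecutive chunks
    states = []
    for start in range(0, len(tokens), stride):
        chunk = tokens[start:start + chunk_len]
        h = encoder(chunk)
        lo, hi = len(chunk) // 4, (3 * len(chunk)) // 4  # middle half
        states.append(h[lo:hi])
    return np.concatenate(states)  # datastore keys over the whole input

def knn_cross_attention(query, datastore, k=4):
    """Decoder cross-attention restricted to the top-k retrieved keys."""
    scores = datastore @ query           # dot-product similarity to every key
    topk = np.argsort(scores)[-k:]       # indices of the k best-matching keys
    w = np.exp(scores[topk] - scores[topk].max())
    w /= w.sum()                         # softmax over the k retrieved keys only
    return w @ datastore[topk]           # attend to k states, not all tokens

tokens = list(range(100))                # a "long" input of 100 token ids
ds = build_datastore(tokens)
out = knn_cross_attention(rng.standard_normal(d), ds)
print(ds.shape, out.shape)
```

In practice the datastore would hold millions of encoded states and be queried with an approximate nearest-neighbor index, which is what keeps the cost sublinear in the input length.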
In their empirical study, the team applied Unlimiformer to long-document and multi-document summarization tasks, where it summarized inputs of up to 350k tokens without any truncation at test time. Unlimiformer was also used to fine-tune existing pretrained models such as BART and Longformer, enabling them to attend to unbounded inputs without any additional learned weights or modifications to their code.
The team hopes Unlimiformer’s promising results on downstream sequence-to-sequence generation tasks will lead to further performance improvements in retrieval-augmented large language models, potentially by incorporating structure into the datastore or by retrieving embeddings in chunks.
Author: Hecate He | Editor: Michael Sarazen