Large-scale pretrained language models have shown a remarkable ability to learn new downstream tasks given only a few relevant examples. This practical and desirable few-shot learning competency is achieved by storing a large amount of information during training and hence tends to require huge model parameter counts. While retrieval-augmented models are lighter and generally better at knowledge-intensive tasks, the question of whether they are suitable for few-shot learning remains relatively unexplored.
In the new paper Few-shot Learning With Retrieval Augmented Language Models, a research team from Meta AI, PSL University, Inria, and University College London presents Atlas, a carefully crafted pretrained retrieval augmented language model that effectively learns new knowledge-intensive tasks under few-shot settings. An 11B parameter Atlas model outperforms the 540B parameter PaLM model to set a new state-of-the-art on QA tasks.
The team summarizes their main contributions as follows:
- A thorough study on how to design and train retrieval-augmented language models, with a focus on downstream few-shot learning and sample efficiency.
- A retrieval-augmented language model, called Atlas, built on the findings of this study, that exhibits few-shot abilities emerging at a lower scale than standard LLMs.
- An exploration of fine-tuning strategies to efficiently adapt both the retriever and the language model to the task at hand.
- Thorough downstream experiments in few-shot settings, demonstrating state-of-the-art results on few-shot Natural Questions (+2.8%), TriviaQA (+3.3%), FEVER (+5.1%), and results on par or stronger than models with 15× more parameters on MMLU.
- Experiments investigating full-dataset finetuning, setting new state-of-the-art results in Natural Questions (+8.1%), TriviaQA (+9.3%) and 5 KILT Tasks.
- Experiments demonstrating the updatability and interpretability characteristics of Atlas.
Atlas comprises two sub-models: a retriever and a language model. The retriever is built upon the Contriever (Izacard et al., 2022) dual-encoder architecture, which retrieves documents based on continuous dense embeddings. This dense-retriever design lets Atlas train both the query and document encoders without expensive document relevance annotations, while keeping memory requirements modest.
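To make the dual-encoder idea concrete, here is a minimal, illustrative sketch of dense retrieval: both queries and documents are mapped to vectors by an encoder, and documents are ranked by dot-product similarity. The bag-of-words `embed` function below is a toy stand-in for Contriever's transformer encoder, and all function names are illustrative, not the paper's actual API.

```python
import numpy as np

def embed(text, vocab):
    """Toy embedding: L2-normalized bag-of-words counts.
    A real dual encoder (e.g. Contriever) would instead run a
    transformer and mean-pool its token representations."""
    vec = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve(query, documents, k=2):
    """Rank documents by dot-product similarity to the query,
    the scoring rule used by dual-encoder dense retrievers."""
    vocab = {t: i for i, t in enumerate(
        sorted({w for d in documents + [query] for w in d.lower().split()}))}
    q = embed(query, vocab)
    scores = [float(q @ embed(d, vocab)) for d in documents]
    top = sorted(range(len(documents)), key=lambda i: -scores[i])[:k]
    return [documents[i] for i in top]

docs = ["atlas is a retrieval augmented model",
        "the eiffel tower is in paris",
        "dense retrieval uses dual encoders"]
print(retrieve("dense retrieval model", docs, k=1))
```

Because both encoders output vectors in the same space, document embeddings can be precomputed and indexed once, and only the query needs encoding at inference time.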
The language model is built upon the T5 sequence-to-sequence architecture (Raffel et al., 2019). The team adopts the Fusion-in-Decoder modification of sequence-to-sequence models, which concatenates the query to each retrieved document and processes each resulting pair independently in the encoder. Compared with conventional transformers, where concatenating the query with all documents into a single sequence makes self-attention cost grow quadratically with the number of documents, Atlas scales much more efficiently as more documents are retrieved.
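The scaling difference can be sketched with a back-of-the-envelope cost model, taking self-attention cost as proportional to the square of sequence length (a simplification that ignores constant factors and the decoder). The token counts below are illustrative assumptions, not figures from the paper.

```python
def concat_cost(q, d, n):
    """Conventional encoder: query + all n documents in one sequence,
    so self-attention cost grows with the square of total length."""
    total = q + n * d
    return total * total

def fid_cost(q, d, n):
    """Fusion-in-Decoder: each (query, document) pair is encoded
    independently, so encoder cost grows linearly in n."""
    pair = q + d
    return n * pair * pair

# Illustrative sizes: 32 query tokens, 256 tokens per document.
q, d = 32, 256
for n in (10, 20, 40):
    print(f"n={n}: concat={concat_cost(q, d, n)}, fid={fid_cost(q, d, n)}")
```

Doubling the number of retrieved documents roughly quadruples the single-sequence cost but only doubles the Fusion-in-Decoder cost, which is why the approach scales gracefully to many retrieved passages.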
The team conducted empirical evaluations for Atlas on different knowledge-intensive natural language understanding tasks in a few-shot setting.
In the experiments, Atlas achieved 42 percent accuracy on Natural Questions and 84.7 percent on TriviaQA with only 64 training examples, a three-point improvement over the 540B-parameter PaLM model (Chowdhery et al., 2022), despite PaLM's roughly 50× higher pretraining compute cost. Atlas also set a new state-of-the-art on Natural Questions, TriviaQA, FEVER, and five KILT tasks.
Overall, this work demonstrates that Atlas offers updatability, interpretability, and controllability, and provides an efficient approach that enables language models to tackle knowledge-intensive tasks without huge parameter counts.
The paper Few-shot Learning With Retrieval Augmented Language Models is on arXiv.
Author: Hecate He | Editor: Michael Sarazen