Although today’s large pretrained language models (LMs) have demonstrated impressive zero-shot capabilities across a wide range of tasks, the performance of “frozen” LMs — whose weights remain unchanged — still trails that of LMs whose weights have been fine-tuned for specific downstream tasks. Because fine-tuning is a compute-heavy process that also degrades model versatility, techniques such as prompt tuning have been proposed to better leverage frozen LMs, aiming at a single, versatile model with a wide range of functionalities across disparate applications.
A team from AI21 Labs advances this research in their new paper Standing on the Shoulders of Giant Frozen Language Models, proposing three novel methods — input-dependent prompt tuning, frozen readers, and recursive LMs — for learning small neural modules that can specialize a frozen language model to different tasks. This compute-saving approach outperforms conventional frozen-model methods and challenges fine-tuned performance without sacrificing model versatility.
The researchers see their work as a step toward the design of task-specific neural middleware that can take the place of traditional fine-tuning and recent massively multi-tasked tuning approaches, where the key engineering challenge will instead be “finding the best way to stand on the shoulders of giant frozen language models.”

The team summarizes the key advantages of their approach over multi-task fine-tuning methods:
- Non-forgetfulness: Once the original LM is fine-tuned on any multi-task suite, it can suffer from catastrophic forgetting of capabilities far enough from these tasks (manifesting, for example, in perplexity degradation). A frozen LM will never suffer such forgetting since it remains unchanged.
- Extensibility: When attempting to add a new task to a fine-tuned LM, there is no guarantee that performance on the original task suite will be retained, so the model must be retrained on all tasks together. Given the cost of training such models—in some cases, millions of dollars (Sharir et al., 2020)—it is clearly infeasible to do so repeatedly. In contrast, when adding new capabilities as new external components over a frozen backbone, there is no cross-interference between capabilities.
Current methods for leveraging frozen models generally involve training a small number of parameters to optimize performance on specific tasks, and they have achieved results competitive with fine-tuning on some tasks. The AI21 researchers argue that because existing frozen LM methods are so compact, there is room to expand them significantly by designing “more ambitious external scaffolding” that improves performance at a negligible cost relative to a single pass through the huge LM.
The team compared their approach with conventional fine-tuning methods in two challenging settings: 1) Massive multi-tasking, in which a single general-purpose model is asked to simultaneously perform various NLP tasks; and 2) Open-book and closed-book variants of open-domain question answering.

The researchers first introduce an input-dependent prompt tuning (ID-PT) approach for massively multi-tasking frozen LMs. This is a simple yet effective prompt-tuning method for externally tuning a frozen model that enables the prompt to vary substantially across tasks while requiring relatively few trained parameters.
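To give a rough sense of the general idea, the sketch below shows one way an input-dependent soft prompt could be generated by a small trainable module and prepended to a frozen LM’s input embeddings. The module names, dimensions, and mean-pooling choice are illustrative assumptions on our part, not AI21’s implementation.

```python
# Minimal PyTorch sketch of the general idea behind input-dependent prompt
# tuning: a small trainable "prompt generator" maps each input to a soft
# prompt that is prepended to the frozen LM's input embeddings.
import torch
import torch.nn as nn

class PromptGenerator(nn.Module):
    def __init__(self, d_model: int, prompt_len: int):
        super().__init__()
        self.prompt_len = prompt_len
        # Only this tiny network is trained; the giant LM stays frozen.
        self.proj = nn.Sequential(
            nn.Linear(d_model, d_model), nn.Tanh(),
            nn.Linear(d_model, prompt_len * d_model),
        )

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, d_model) from the frozen LM's embedding layer
        pooled = input_embeds.mean(dim=1)            # (batch, d_model)
        prompt = self.proj(pooled)                   # (batch, prompt_len * d_model)
        return prompt.view(-1, self.prompt_len, input_embeds.size(-1))

def forward_with_id_prompt(frozen_lm, prompt_gen, input_ids):
    # frozen_lm parameters are assumed to have requires_grad=False
    embeds = frozen_lm.get_input_embeddings()(input_ids)   # frozen embeddings
    soft_prompt = prompt_gen(embeds)                        # input-dependent, trainable
    return frozen_lm(inputs_embeds=torch.cat([soft_prompt, embeds], dim=1))
```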

The team also introduces frozen LMs into their pipeline as readers for the open-domain question answering task. They design an external re-ranking module that can exploit the world knowledge and deduction capabilities of large LMs and condense relevant information from 100+ retrieved documents into the input sequence length of the frozen LM reader. They demonstrate that the frozen LMs can reach or surpass leading fine-tuning approaches on open-domain question answering benchmarks.
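A hedged sketch of this re-rank-then-read idea follows: a small scoring module orders the retrieved passages, and only those that fit within the frozen reader’s context window are packed into its prompt. The function names and the greedy packing strategy are our own simplifications, not the paper’s re-ranker.

```python
# Illustrative re-rank-and-condense step before a frozen LM reader:
# score_fn is a small trained module; the frozen reader only sees the
# passages that fit in its input sequence length (max_tokens).
from typing import Callable, List

def build_reader_input(question: str,
                       passages: List[str],
                       score_fn: Callable[[str, str], float],
                       token_len: Callable[[str], int],
                       max_tokens: int = 2048) -> str:
    # Re-rank the 100+ retrieved passages, highest-scoring first.
    ranked = sorted(passages, key=lambda p: score_fn(question, p), reverse=True)

    # Greedily pack passages until the frozen reader's context is full.
    prompt = f"Question: {question}\n"
    used = token_len(prompt)
    for p in ranked:
        cost = token_len(p) + 1
        if used + cost > max_tokens:
            break
        prompt += p + "\n"
        used += cost
    return prompt + "Answer:"
```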

The researchers note that although LMs are a powerful resource, existing approaches use them only once per input query, thus overlooking potentially useful information. The proposed method is able to extract and exploit such information by making two consecutive passes through a single frozen LM via textual and neural recursion. In this way, rather than pretraining an enormous model for all inputs, it is possible to improve performance by varying the number of passes through a single frozen model based on an assessment of the input’s difficulty.
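The textual-recursion variant of this idea can be sketched as below: the frozen LM’s first-pass output is folded back into the prompt for a second pass, and a lightweight difficulty check decides whether the extra pass is worth the compute. The prompt wording and helper names here are hypothetical, chosen only to illustrate the flow.

```python
# Illustrative sketch of textual recursion through a single frozen LM:
# generate() wraps one forward pass of the frozen model; is_hard() is a
# stand-in for some assessment of the input's difficulty.
def answer_with_recursion(generate, is_hard, question: str) -> str:
    first = generate(f"Question: {question}\nAnswer:")
    if not is_hard(question, first):
        return first  # easy input: a single pass is enough

    # Second pass: condition the same frozen LM on its own draft answer.
    second_prompt = (f"Question: {question}\n"
                     f"Draft answer: {first}\n"
                     f"Improved answer:")
    return generate(second_prompt)
```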


In their empirical study, the team compared their model with state-of-the-art fine-tuned LMs such as RAG and FiD-Distill. The results show that the proposed approach is cost-efficient yet retains LMs’ versatility, non-forgetfulness, and extensibility and can match or surpass the performance of these models in challenging domains. Overall, the study validates the potential of freezing large LMs and learning smaller neural modules that specialize them to different tasks to build more complex model architectures and achieve better performance.
The paper Standing on the Shoulders of Giant Frozen Language Models is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
