Fine-tuning large pretrained transformers adapts them to downstream tasks and improves their performance. While such fine-tuning is crucial for countless real-world applications, updating all model parameters becomes increasingly impractical as models grow. This has motivated parameter-efficient transfer learning (PETL) techniques, which update only a small subset of the parameters or small task-specific modules attached to the backbone, resulting in faster training and improved efficiency.
Existing PETL approaches, however, can introduce inference latency, and because gradients must still be backpropagated through the large backbone, their selective fine-tuning continues to demand significant energy and computational resources.
In the new paper READ: Recurrent Adaptation of Large Transformers, a Meta AI research team proposes REcurrent ADaptation (READ), a memory-efficient approach that adds a lightweight side network to a pretrained model to achieve a 56 percent reduction in training memory consumption and an 84 percent reduction in GPU energy usage relative to full fine-tuning.

The team summarizes their main contributions as follows:
- We overcome the limitations of PETL and side-tuning methods by proposing REcurrent ADaptation (READ), a simple yet effective side-tuning design that requires no pretraining of the side network — a prerequisite of prior side-tuning techniques.
- We conduct thorough experiments on various NLP benchmarks, showcasing the strong performance and high efficiency of READ.
- We demonstrate that READ is a highly scalable solution to fine-tune large transformers and is independent of the backbone model size.
- We provide a theoretical justification for how READ utilizes the backbone hidden state to perform side-tuning.

To overcome PETL's limitations, READ attaches a small recurrent neural network (RNN) alongside the backbone model, together with a "joiner" network that merges multiple sources of information into inputs for the RNN. The approach first runs a forward pass through the transformer backbone, independently of READ, caching the necessary intermediate results at each transformer layer. The RNN hidden states at the encoder and decoder are then computed iteratively over these cached states. Finally, the RNN and backbone outputs are combined to produce the new final state.
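To make the flow concrete, here is a minimal PyTorch sketch of such a loop. It is our own illustration, not the paper's implementation: the class name, the choice of a GRU cell, the linear-plus-ReLU joiner, the pooled per-layer states, and the additive combination with the backbone output are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class READSideNetwork(nn.Module):
    """Hypothetical sketch of a READ-style side network (not the authors' code).
    A shared RNN cell and a small 'joiner' FFN iterate over the backbone's
    cached per-layer hidden states; the backbone itself stays frozen."""

    def __init__(self, backbone_dim: int, side_dim: int):
        super().__init__()
        # Joiner: merges the backbone's layer-i state with the previous
        # side state to produce the RNN input.
        self.joiner = nn.Sequential(
            nn.Linear(backbone_dim + side_dim, side_dim),
            nn.ReLU(),
        )
        # One RNN cell reused at every backbone layer, so the trainable
        # parameter count does not depend on backbone depth.
        self.rnn_cell = nn.GRUCell(side_dim, side_dim)
        # Maps the final side state back to the backbone's output space.
        self.out_proj = nn.Linear(side_dim, backbone_dim)

    def forward(self, cached_layer_states, backbone_output):
        # cached_layer_states: list of [batch, backbone_dim] tensors saved
        # during the backbone's forward pass (pooled per layer for simplicity).
        # backbone_output:     [batch, backbone_dim] frozen backbone output.
        h = backbone_output.new_zeros(
            backbone_output.size(0), self.rnn_cell.hidden_size
        )
        for layer_state in cached_layer_states:
            rnn_in = self.joiner(torch.cat([layer_state, h], dim=-1))
            h = self.rnn_cell(rnn_in, h)
        # Combine the side network's correction with the frozen backbone output.
        return backbone_output + self.out_proj(h)
```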
The proposed fine-tuning process thus requires no attention mechanism, relying instead only on RNNs and feed-forward networks (FFNs). This improves usability and training efficiency, since neither pretraining nor pruning of the side network is needed. Moreover, READ's recurrent design means the number of trainable parameters does not grow with the number of backbone layers, further reducing computational cost.
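Because the same cell and joiner are reused at every backbone layer, the side network's trainable parameter count depends only on its own widths, not on backbone depth. Continuing the hypothetical sketch above:

```python
# The same side network serves backbones of different depths,
# and its trainable parameter count stays fixed.
side = READSideNetwork(backbone_dim=768, side_dim=128)
n_trainable = sum(p.numel() for p in side.parameters() if p.requires_grad)

states_12 = [torch.randn(4, 768) for _ in range(12)]  # e.g. a 12-layer backbone
states_24 = [torch.randn(4, 768) for _ in range(24)]  # e.g. a 24-layer backbone
out_small = side(states_12, torch.randn(4, 768))
out_large = side(states_24, torch.randn(4, 768))      # same parameters, deeper backbone
```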


In their empirical study, the team compared READ against full fine-tuning and baseline PETL methods such as BitFit, Prompt-tuning, and LoRA on GLUE and other benchmarks. In these experiments, READ achieved accuracy competitive with full fine-tuning while reducing GPU energy consumption by 84 percent and model training memory cost by 56 percent.
This work demonstrates READ’s ability to significantly improve transformers’ fine-tuning efficiency. The team hopes READ can make fine-tuning today’s large models more accessible for researchers and enable easier model adaptation to downstream tasks and applications.
The paper READ: Recurrent Adaptation of Large Transformers is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.