Large pretrained language models (LLMs) have emerged as the state-of-the-art deep learning architecture across a wide range of applications, demonstrating impressive few-shot learning capabilities when transferred to new tasks. These models, however, generally require a fine-tuning process that entails costly additional training on specialized data.
In the new paper Fine-Tuning Language Models via Epistemic Neural Networks, a DeepMind research team modifies LLMs to create Epistemic Neural Networks (ENNs). The novel approach matches the performance of traditional fine-tuning while requiring 50 percent less data.
The proposed ENN is built by adding an epinet (Osband et al., 2021) to the LLM. An epinet is a small network architecture introduced by DeepMind that can be trained with modest incremental computation to estimate uncertainty.
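The core idea can be sketched in a few lines: an ENN's output combines the base network's prediction with a small epinet correction that also depends on a random "epistemic index," so different index values produce different predictions and their disagreement signals uncertainty. The sketch below is a minimal toy illustration under assumed shapes and linear stand-in networks, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not from the paper).
FEATURE_DIM, NUM_CLASSES, INDEX_DIM = 8, 3, 4

# Stand-in "base network": a fixed linear map from features to logits.
W_base = rng.normal(size=(FEATURE_DIM, NUM_CLASSES))

# Small epinet: consumes the features plus an epistemic index z, so
# different z values yield different corrections to the base logits.
W_epi = rng.normal(size=(FEATURE_DIM + INDEX_DIM, NUM_CLASSES)) * 0.1

def enn_logits(features, z):
    """ENN output = base-network logits + epinet correction for index z."""
    base = features @ W_base
    epi = np.concatenate([features, z]) @ W_epi
    return base + epi

x = rng.normal(size=FEATURE_DIM)
z1, z2 = rng.normal(size=INDEX_DIM), rng.normal(size=INDEX_DIM)
# Two indices give two different "opinions"; their spread reflects
# what the model does not yet know about this input.
print(enn_logits(x, z1) - enn_logits(x, z2))
```

Because the epinet is small relative to the base LLM, this extra computation per index sample is modest.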
The team summarizes their main contributions as follows:
- We show that LLMs can be augmented with an epinet so that they know what they don’t know.
- We show that prioritizing based on model uncertainty performs better than other heuristic approaches to active learning.
- We show that the epinet is at least as effective as existing approaches to Bayesian deep learning.
An issue with LLMs is that they cannot distinguish reducible (epistemic) uncertainty from irreducible uncertainty over the next token. While current approaches for addressing this typically involve adding more training data, the team takes a different tack: leveraging the epinet’s uncertainty estimates to help models “know what they don’t know” and improve their data efficiency.
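One way to see this distinction: a conventional model emits a single probability distribution, which looks the same whether the next token is genuinely random or the model is simply under-trained. With an index-conditioned model, disagreement across sampled indices isolates the part the model could still learn. The sketch below uses a toy index-conditioned classifier; all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Toy stand-in for an index-conditioned network: logits depend on an
# epistemic index z (illustrative, not the paper's architecture).
def enn_probs(x, z, W):
    return softmax(x @ W + 0.5 * z)

NUM_CLASSES, DIM, NUM_INDICES = 3, 4, 50
W = rng.normal(size=(DIM, NUM_CLASSES))
x = rng.normal(size=DIM)

# Sample many epistemic indices and measure how much predictions disagree.
zs = rng.normal(size=(NUM_INDICES, NUM_CLASSES))
probs = np.stack([enn_probs(x, z, W) for z in zs])

mean_probs = probs.mean(axis=0)      # the overall prediction
epistemic = probs.var(axis=0).sum()  # disagreement across indices
print(mean_probs, epistemic)
```

A plain model corresponds to the degenerate case where every index yields the same output, so this disagreement term is always zero, even when the model is badly wrong.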
The team also considers active learning schemes that use a priority function to decide which training labels to acquire. Specifically, the priority function maps a candidate input, together with the ENN parameters (covering both the base network and the epinet), to a score, and training is then prioritized toward the inputs the model is most uncertain about.
In their empirical study, the team compared their approach against baselines on GLUE (General Language Understanding Evaluation) tasks. The results show that, on average, learning with an epinet on a BERT (Bidirectional Encoder Representations from Transformers) model achieves performance comparable to the baselines on the MNLI matched dataset (Williams et al., 2017) while requiring 50 percent less data.
Overall, this work introduces a promising solution to the problem of active learning in fine-tuning language models by leveraging Epistemic Neural Networks. The team hopes this research direction can lay the foundation for further improvements in data efficiency.
The paper Fine-Tuning Language Models via Epistemic Neural Networks is on arXiv.
Author: Hecate He | Editor: Michael Sarazen