Fine-tuning pretrained language models (LMs) has produced remarkable results on classification tasks, especially when a task is recast as direct language generation, with custom prompts selected to elicit knowledge from the model. Because a prompt converts a task into the language-modelling format, it offers a lower bound on what the model "knows," and is thus particularly effective in low-data regimes.
In the paper How Many Data Points is a Prompt Worth?, a Hugging Face research team shows that prompting is indeed beneficial for fine-tuning pretrained language models, and that this benefit can be quantified as some hundreds of data points on average across classification tasks.
There are two common transfer-learning settings for fine-tuning pretrained models on classification: head-based and prompt-based. This paper focuses on the prompt-based setting, with the researchers decomposing a prompt into a pattern and a verbalizer. The pattern turns the input text into a cloze task (i.e., a fill-in-the-blank problem with one or more masked tokens to be filled), and a verbalizer then maps the masked-token prediction to a class label, for example mapping "yes" to true and "no" to false.
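The pattern/verbalizer decomposition can be illustrated with a minimal sketch. The template and token choices below are hypothetical, not the paper's exact prompts, and the model's prediction is stubbed in by hand:

```python
# A pattern wraps the raw input into a cloze with a masked slot; a verbalizer
# maps the token predicted at that slot back to a class label.
# (Hypothetical BoolQ-style template, not taken from the paper.)

def pattern(passage: str, question: str) -> str:
    """Turn a passage/question pair into a fill-in-the-blank prompt."""
    return f"{passage} Question: {question}? Answer: [MASK]."

# The verbalizer maps candidate tokens at the masked position to class labels.
VERBALIZER = {"yes": True, "no": False}

prompt = pattern("The sky appears blue on clear days.", "is the sky blue")
# A masked language model would fill [MASK]; suppose it predicts "yes":
predicted_token = "yes"
label = VERBALIZER[predicted_token]
```

The point is that no new classification head is needed: the pretrained model's existing masked-token prediction, filtered through the verbalizer, acts as the classifier.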
The researchers limit themselves to human-written prompts, and fine-tuning trains the model to produce the correct verbalization. The loss is the cross-entropy between the correct answer and the probability distribution over the verbalizer's tokens.
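This loss can be sketched in a few lines, assuming (hypothetically) that we already have the model's logits over the vocabulary at the masked position: keep only the verbalizer tokens' logits, normalize them with a softmax, and take the negative log-probability of the correct verbalization. The token ids and logit values below are made up for illustration:

```python
import math

def prompt_loss(vocab_logits, verbalizer_ids, gold_id):
    """Cross-entropy restricted to the verbalizer tokens at the masked slot."""
    logits = [vocab_logits[i] for i in verbalizer_ids]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]      # numerically stable softmax
    z = sum(exps)
    probs = {tid: e / z for tid, e in zip(verbalizer_ids, exps)}
    return -math.log(probs[gold_id])              # cross-entropy with gold label

# Toy vocabulary of 5 tokens; say "yes" has id 2 and "no" has id 4.
vocab_logits = [0.1, -1.2, 2.0, 0.3, 0.5]
loss = prompt_loss(vocab_logits, verbalizer_ids=[2, 4], gold_id=2)
```

Normalizing over the verbalizer tokens only, rather than the whole vocabulary, matches the description above: the model is scored on the distribution amongst the verbalizer's tokens.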
The team conducted experiments to determine whether prompting itself adds information to the supervised task, and thereby how many data points a prompt is worth. The experiments were performed on the SuperGLUE and MNLI benchmarks, which comprise a variety of tasks including entailment, multiple-choice question answering and commonsense reasoning.
The researchers compared head- and prompt-based fine-tuning with the best performing pattern on each task. They compared the models across a scale starting from 10 data points and increasing exponentially to the full dataset. Plotting the prompting advantage enabled the researchers to quantify how many data points a prompt is worth. The results show that prompting achieves a substantial advantage in terms of data efficiency on most tasks.
The team also tested factors that might affect data efficiency, including the verbalizer, the choice of prompt, and the evaluation metric. They found that prompting retains much of its data-efficiency advantage even with a verbalizer that is not semantically related to the task, that prompt choice is not a dominant hyperparameter, and that the results are not sensitive to the choice of metric.
The researchers provide practical evidence that prompting consistently yields improvements across various tasks, is mostly robust to pattern choice, and can even learn without an informative verbalizer.
The paper How Many Data Points is a Prompt Worth? is on arXiv.
Author: Hecate He | Editor: Michael Sarazen