Studies have shown that large-scale pretrained language models (LMs) can perform surprisingly well in zero-shot and few-shot learning scenarios. But in a recent paper, researchers from New York University and Facebook AI, together with a CIFAR Fellow in Learning in Machines & Brains, show that held-out examples are often used to tune various aspects of these models’ learning, including hyperparameters, training objectives and natural language templates. What if such held-out examples were unavailable?
The team takes a fresh look at the few-shot learning concept, evaluating LMs’ few-shot abilities when only the few provided examples are available for model selection, a setup they call “true few-shot learning.”
Typically, researchers improve the performance of deep learning models by feeding them as much data as possible. Few-shot learning, however, aims to build high-performing machine learning models even when training data is scarce. Two main approaches are commonly used to achieve this.
The first is a data-level approach: if there is not enough data to fit the model’s parameters, data from external sources is added. For instance, to detect a specific species of dog in images, researchers could add images of other dog species to the training dataset.
The second way to counter data scarcity is a parameter-level approach. Because a handful of training samples cannot adequately constrain a model’s high-dimensional parameter space, the model is trained either to generalize from the limited training samples or to search that large parameter space more effectively. In other words, the algorithm learns which route through the parameter space yields the best results, a process also known as meta-learning.
In a few-shot learning setting, a teacher model can be trained on a large quantity of data to capture the structure of the parameter space. The teacher then guides the student classifier through this large parameter space toward good results. This process can also be regarded as a form of model selection.
Recent studies have shown that large-scale LMs are good few-shot learners, as they can learn new tasks from only a small number of examples. However, these results have relied on large “held-out” example sets to choose prompts and hyperparameters. In effect, the held-out examples act as a validation set used to tune the learning algorithm.
With these issues in mind, the team evaluated how well few-shot learning methods can be tuned when no large validation set is available. In their true few-shot setting, the goal is to learn a model with low expected loss using only a small training set drawn from a single distribution. They used cross-validation (CV) and minimum description length (MDL) as model selection criteria, applying both to the few training examples alone.
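To make the two criteria concrete, here is a minimal Python sketch of how CV and MDL could rank candidate prompts using only the small training set. It assumes a hypothetical `evaluate(prompt, train, held)` callback that returns one loss per held-out example (with a real LM this would typically be the negative log-likelihood of the correct answer, with `train` supplied as in-context examples); the fold splitting, function names, and toy data are illustrative assumptions, not the authors’ code. CV averages the loss on each fold when the other folds serve as training data, while MDL, in its online-coding form, scores each block using only the blocks that precede it.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical toy format: (input text, label).
Example = Tuple[str, str]

# Hypothetical interface: evaluate(prompt, train, held) returns one loss per
# example in `held`, with `train` serving as the in-context or fine-tuning data.
Evaluator = Callable[[Any, Sequence[Example], Sequence[Example]], List[float]]


def split_into_folds(examples: Sequence[Example], k: int) -> List[List[Example]]:
    """Split the examples into k contiguous folds of nearly equal size."""
    n = len(examples)
    bounds = [(i * n) // k for i in range(k + 1)]
    return [list(examples[bounds[i]:bounds[i + 1]]) for i in range(k)]


def cv_loss(prompt: Any, examples: Sequence[Example], k: int, evaluate: Evaluator) -> float:
    """K-fold CV: mean loss on each fold when the remaining folds act as training data."""
    folds = split_into_folds(examples, k)
    losses: List[float] = []
    for i, held in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        losses.extend(evaluate(prompt, train, held))
    return sum(losses) / len(losses)


def mdl_loss(prompt: Any, examples: Sequence[Example], k: int, evaluate: Evaluator) -> float:
    """Online-coding MDL: total loss of each block given only the blocks before it.

    Assumption: the first block is scored with no preceding examples.
    """
    blocks = split_into_folds(examples, k)
    total = 0.0
    for i, block in enumerate(blocks):
        seen = [ex for earlier in blocks[:i] for ex in earlier]
        total += sum(evaluate(prompt, seen, block))
    return total


def select_prompt(prompts: Sequence[Any], examples: Sequence[Example], k: int,
                  evaluate: Evaluator, criterion: Callable[..., float]) -> Any:
    """Pick the prompt with the lowest CV or MDL score, using only `examples`."""
    return min(prompts, key=lambda p: criterion(p, examples, k, evaluate))


# Toy usage: two hypothetical "prompts" modeled as fixed labeling rules.
def keyword_rule(x: str) -> str:
    return "pos" if "great" in x else "neg"


def always_pos(x: str) -> str:
    return "pos"


def toy_evaluate(prompt, train, held):
    # 0/1 loss; this toy rule ignores `train`, whereas an LM would condition on it.
    return [0.0 if prompt(x) == y else 1.0 for x, y in held]


data = [("great movie", "pos"), ("awful plot", "neg"), ("great cast", "pos"),
        ("awful acting", "neg"), ("great fun", "pos"), ("dull story", "neg")]

print(cv_loss(always_pos, data, 3, toy_evaluate))   # 0.5
print(mdl_loss(always_pos, data, 3, toy_evaluate))  # 3.0
print(select_prompt([always_pos, keyword_rule], data, 3, toy_evaluate, cv_loss).__name__)  # keyword_rule
```

The toy rules stand in for prompts and use 0/1 loss only for simplicity. The key property of the true few-shot setting is visible in `select_prompt`: both criteria score candidates using nothing beyond the few training examples.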
The team evaluated LMs under the proposed true few-shot setting on the LAMA benchmark. They compared the test accuracy of prompts chosen by CV and MDL against three reference points: the best prompt, chosen using held-out accuracy (as in prior work); the worst prompt, as a lower bound; and a random prompt, represented by the mean accuracy over all prompts.
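The comparison described above can be summarized with a small helper. This is a hedged sketch that assumes per-prompt test accuracies have already been computed; the function name, template names, and accuracy values are hypothetical and are not results from the paper. It also assumes that selecting by a large held-out set recovers the prompt with the highest test accuracy, which only approximates the prior-work baseline.

```python
from statistics import mean
from typing import Dict


def summarize_prompt_selection(test_acc: Dict[str, float], chosen: str) -> Dict[str, float]:
    """Compare a selected prompt against best, worst, and random-choice reference points."""
    return {
        "chosen": test_acc[chosen],          # prompt picked by CV or MDL
        "best": max(test_acc.values()),      # proxy for held-out selection (assumption)
        "worst": min(test_acc.values()),     # lower bound
        "random": mean(test_acc.values()),   # expected accuracy of a randomly picked prompt
    }


# Hypothetical accuracies for three made-up templates.
accs = {"template_a": 0.25, "template_b": 0.75, "template_c": 0.5}
print(summarize_prompt_selection(accs, chosen="template_c"))
# {'chosen': 0.5, 'best': 0.75, 'worst': 0.25, 'random': 0.5}
```

In this made-up example the chosen prompt only matches the random-prompt baseline, which illustrates the kind of gap the comparison is designed to expose.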
The results show that prompts chosen by CV and MDL underperform the best prompt (selected by held-out performance), and that CV and MDL struggle to select the prompt with the highest test accuracy. Also, as models grow bigger and generalize better, their ability to reliably choose good prompts degrades.
The team concludes that prompt selection performs poorly in a true few-shot setting and that prior research in this area has significantly overestimated the few-shot ability of LMs, given the difficulty of few-shot model selection.
The paper True Few-Shot Learning with Language Models is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang