Large-scale pretrained language models have achieved state-of-the-art results on many natural language processing (NLP) benchmarks, but these data-hungry models tend to struggle in few-shot learning settings, where only limited training data is available.
To address this issue, a team from the University of Massachusetts Amherst and Google Research has proposed Self-Training with Task Augmentation (STraTA), a novel approach that combines task augmentation and self-training to leverage unlabelled data and improve sample efficiency and model performance on NLP tasks.
The team summarises their work’s main contributions as:
- We propose task augmentation, a novel data augmentation-based fine-tuning method, and show its effectiveness in comparison to other competing fine-tuning approaches.
- We propose a simple yet effective self-training algorithm and highlight important ingredients for successful self-training, which we hope will enable the wider adoption of self-training in NLP.
- With STraTA, we demonstrate the effectiveness of combining task augmentation and self-training in improving sample efficiency across NLP benchmarks.
The team first introduces a framework for task augmentation, where the basic idea is to fine-tune a pretrained language model on an auxiliary task before applying it to the target task. Previous task augmentation approaches have often been hampered by mismatches between the auxiliary and target tasks. The proposed method addresses this limitation by fine-tuning a pretrained generative language model and using it to synthesize in-domain training data for the auxiliary task to boost model performance on the target task.
The researchers use natural language inference (NLI) as their auxiliary task, fine-tuning a pretrained Google T5-3B model on the MNLI dataset to obtain an NLI data generator that produces augmented examples for all target datasets. The advantages of this approach are that the training labels come for free, and that, via overgeneration, a large amount of in-domain NLI training data can be produced even for target tasks with small datasets.
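The overgeneration step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `generate_hypothesis` is a hypothetical stand-in for the fine-tuned T5-3B generator, which in the real system would sample many diverse decodes for each (sentence, label) pair.

```python
# Sketch of NLI task augmentation via overgeneration (illustrative only).
# `generate_hypothesis` stands in for a T5-style generator fine-tuned on
# MNLI; a real system would sample diverse hypotheses per input.
def generate_hypothesis(premise: str, label: str) -> str:
    templates = {
        "entailment": f"{premise} (restated)",
        "contradiction": f"It is not the case that {premise.lower()}",
        "neutral": f"{premise} Something unrelated may also hold.",
    }
    return templates[label]

def overgenerate_nli(premises, samples_per_label=2):
    """Produce synthetic (premise, hypothesis, label) triples for every
    in-domain sentence -- the labels come for free from the prompt."""
    synthetic = []
    for premise in premises:
        for label in ("entailment", "contradiction", "neutral"):
            for _ in range(samples_per_label):
                synthetic.append(
                    {"premise": premise,
                     "hypothesis": generate_hypothesis(premise, label),
                     "label": label})
    return synthetic

examples = overgenerate_nli(["The movie was thrilling."])
print(len(examples))  # 1 premise x 3 labels x 2 samples = 6
```

Because the label is part of the generation prompt, every synthetic pair is labelled by construction, which is what makes the augmented data essentially free.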
While task augmentation uses unlabelled texts to produce synthetic data for an intermediate task, self-training is a complementary approach that improves a model by training it directly on the target task with pseudo-labelled examples. The researchers leverage a strong base model and let it learn from all available pseudo-labelled examples at every iteration. To address the overconfidence and poor calibration of state-of-the-art language models, they experiment with calibration methods such as temperature scaling (Guo et al., 2017), label smoothing (Müller et al., 2019), and confidence penalties.
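The overall loop can be sketched as below. This is a simplified sketch in the spirit of the paper, not the authors' implementation: `train_fn` and `predict_logits_fn` are hypothetical stubs for the base model, and only temperature scaling is shown among the calibration options.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature scaling (Guo et al., 2017): dividing logits by a
    temperature T > 1 softens overconfident probability estimates."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def self_train(labelled, unlabelled, train_fn, predict_logits_fn,
               iterations=3, temperature=2.0):
    """Simplified self-training loop: at every iteration, the current model
    pseudo-labels ALL unlabelled examples (no confidence threshold), and the
    next model is trained on labelled + pseudo-labelled data."""
    model = train_fn(labelled)
    for _ in range(iterations):
        pseudo = []
        for x in unlabelled:
            probs = softmax(predict_logits_fn(model, x), temperature)
            label = max(range(len(probs)), key=probs.__getitem__)
            pseudo.append((x, label))
        model = train_fn(labelled + pseudo)
    return model

# Toy demo with stand-in stubs: the "model" is just the majority label.
def train_fn(data):
    labels = [y for _, y in data]
    return max(set(labels), key=labels.count)

def predict_logits_fn(model, x):
    return [5.0 if c == model else 0.0 for c in (0, 1)]

final = self_train([("a", 1), ("b", 1), ("c", 0)], ["d", "e"],
                   train_fn, predict_logits_fn)
```

Learning from all pseudo-labelled examples at every iteration, rather than only the most confident ones, is what makes calibration matter: with a well-calibrated model, softer probabilities prevent early mistakes from being reinforced.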
The researchers conducted experiments across 12 NLP datasets and three data regimes (including few-shot settings) to compare STraTA against common fine-tuning baselines such as LMFT and ITFT-MNLI.
The evaluation results show that task augmentation significantly improves performance on downstream tasks, that adding self-training further boosts performance when task-specific unlabelled examples are available, and that a stronger base model leads to better self-training results.
Overall, the study demonstrates that the proposed STraTA approach can substantially improve sample efficiency across NLP benchmark datasets, indicating that its combination of task augmentation and self-training is effective for boosting few-shot learning performance.
The paper STraTA: Self-Training with Task Augmentation for Better Few-Shot Learning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang