Instruction fine-tuning, in which large language models are fine-tuned on tasks described via natural language instructions, has shown promising results in improving zero- and few-shot performance. These gains arise from many design choices: fine-tuning objectives, task sampling strategies, instruction-tuning benchmarks, training datasets, and so on. Yet there is limited understanding of the performance trade-offs these choices entail.
In the new paper OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization, a Meta AI research team presents OPT-IML Bench, a large benchmark for Instruction Meta Learning (IML). OPT-IML Bench comprises some 2,000 natural language processing (NLP) tasks drawn from existing benchmarks, along with an evaluation framework for measuring model generalization, and is designed to identify and characterize the trade-offs involved in different instruction-tuning approaches for large language models (LLMs).
The team first notes that the success of instruction-tuning approaches owes much to combining multiple NLP meta-datasets and scaling up the number of tasks. Following this strategy, they consolidate eight meta-datasets into a collection of 1,991 NLP tasks with prompts and classify them into roughly 100 categories (e.g. Question Answering, Sentiment Analysis) to create their OPT-IML Benchmark. The benchmark targets massive instruction fine-tuning and evaluation across diverse task categories, enabling the team to characterize the effects of different approaches, and of extreme task scaling, on instruction tuning.
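To make the category-based setup concrete, here is a minimal sketch of how tasks might be grouped by category and whole categories held out so that evaluation measures generalization to unseen task types. The task names and categories below are illustrative placeholders, not the benchmark's actual contents.

```python
# Hypothetical sketch: group tasks by category and hold out entire
# categories for generalization evaluation. Task/category names are
# illustrative only.
from collections import defaultdict

def build_category_splits(tasks, held_out_categories):
    """Group (task_name, category) pairs by category, then hold out
    whole categories so no task of a held-out type is seen in training."""
    by_category = defaultdict(list)
    for name, category in tasks:
        by_category[category].append(name)

    train, held_out = {}, {}
    for category, names in by_category.items():
        bucket = held_out if category in held_out_categories else train
        bucket[category] = names
    return train, held_out

tasks = [
    ("boolq", "Question Answering"),
    ("squad_v2", "Question Answering"),
    ("sst2", "Sentiment Analysis"),
    ("rte", "Textual Entailment"),
]
train, held_out = build_category_splits(tasks, {"Textual Entailment"})
# "rte" lands in the held-out split; both QA tasks stay in training
```

Holding out at the category level, rather than the task level, is what distinguishes testing generalization to new task types from testing generalization to new instances of familiar tasks.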
The researchers then use their OPT-IML Benchmark to fine-tune Open Pretrained Transformer (OPT) models (Zhang et al., 2022a) with the same objective as pretraining, i.e. predicting the next token conditioned on all previous tokens as context. Their approach separates each training sequence into a source (context) sequence and a target sequence, and employs document attention masking so that packed sequences can utilize the maximum sequence length for computational efficiency.
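The source/target separation and document masking described above can be sketched as follows. This is a simplified illustration, assuming the common convention of marking ignored label positions with -100 (as in Hugging Face-style training loops); the token ids and helper names are hypothetical, not the paper's actual implementation.

```python
# Sketch of source/target loss masking plus a block-diagonal attention
# mask for packed examples. IGNORE_INDEX follows the common -100
# convention for positions excluded from the loss (an assumption, not
# taken from the paper).
IGNORE_INDEX = -100

def build_example(source_ids, target_ids):
    """Concatenate source and target into one training sequence and
    build next-token labels, masking the source positions so the loss
    is computed only on the target tokens."""
    input_ids = list(source_ids) + list(target_ids)
    labels = [IGNORE_INDEX] * len(source_ids) + list(target_ids)
    return input_ids, labels

def packed_attention_mask(lengths):
    """Causal, block-diagonal mask for several examples packed into one
    sequence: token i may attend to token j only if j <= i and both
    tokens belong to the same packed document."""
    doc = []
    for d, n in enumerate(lengths):
        doc += [d] * n
    total = sum(lengths)
    return [[doc[i] == doc[j] and j <= i for j in range(total)]
            for i in range(total)]

inp, lab = build_example([11, 12, 13], [21, 22])
# the model conditions on the full context, but only the two target
# positions contribute to the loss
mask = packed_attention_mask([3, 2])
# tokens of the second example cannot attend to the first example
```

Packing examples this way fills every sequence to its maximum length without letting unrelated examples attend to one another, which is where the computational-efficiency gain comes from.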
The team evaluated 30B- and 175B-parameter OPT-IML models on 14 standard NLP tasks in their empirical study. The results show that OPT-IML improves on OPT's performance by approximately 7 percent in the zero-shot setting and significantly improves 32-shot accuracy on tasks such as RTE (Recognizing Textual Entailment), WSC (Winograd Schema Challenge), and BoolQ (question answering).
The paper OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.