Recent studies have shown that large language models (LMs) can learn new tasks through inference alone, without any parameter updates. While this in-context learning ability holds great promise, the approach remains nascent: its performance lags well behind supervised fine-tuning, and its results can be volatile.
In the new paper MetaICL: Learning to Learn In Context, a research team from the University of Washington, Facebook AI Research and the Allen Institute for AI proposes Meta-training for In-Context Learning (MetaICL), a new meta-training framework for few-shot learning where an LM is meta-trained to learn in-context, conditioning on training examples to recover the task and make predictions.
In the in-context learning paradigm, an LM learns a new task by simply conditioning on a few training examples and predicting which tokens best complete a test input. However, performance suffers when the target task is very different from language modelling or when the LM is not large enough. The approach can also suffer from high variance and poor worst-case accuracy.
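To make the conditioning step concrete, here is a minimal sketch (not the authors' code) of in-context prediction for a task with a fixed set of answer options: the k training examples are concatenated with the test input, and the option the LM scores highest as a continuation is returned. The prompt layout and the names `build_prompt`, `predict_in_context` and `toy_score` are illustrative assumptions; `toy_score` is a hypothetical stand-in for a real LM's likelihood, so the example runs without a model.

```python
from typing import Callable, List, Tuple

# A scorer returns a likelihood-like score for `continuation` given `context`.
# In practice this would come from a language model; here it is a stand-in.
Scorer = Callable[[str, str], float]


def build_prompt(demos: List[Tuple[str, str]], test_input: str) -> str:
    """Concatenate k (input, output) demonstrations, then the test input."""
    parts = [f"{x}\n{y}" for x, y in demos]
    parts.append(test_input)
    return "\n\n".join(parts) + "\n"


def predict_in_context(score: Scorer, demos: List[Tuple[str, str]],
                       test_input: str, options: List[str]) -> str:
    """Pick the option scored highest as a continuation of the prompt.

    No parameters are updated: the model 'learns' only through the prompt.
    """
    prompt = build_prompt(demos, test_input)
    return max(options, key=lambda option: score(prompt, option))


def toy_score(context: str, continuation: str) -> float:
    """Hypothetical LM stand-in: favours labels that appear often in the context."""
    return float(context.count(continuation))


demos = [("great movie", "positive"), ("loved it", "positive"), ("boring plot", "negative")]
print(predict_in_context(toy_score, demos, "fun film", ["positive", "negative"]))  # positive
```

Swapping `toy_score` for a real LM's log-probability of each option is the only change needed to turn this into an actual in-context classifier; the sketch keeps the scorer pluggable for that reason.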
MetaICL is a meta-training method, inspired by recent work on meta-learning and multi-task learning, that aims to improve in-context learning performance in few-shot settings. The key idea is to train the model in a multi-task learning scheme over a large collection of meta-training tasks, so that it learns how to condition on a small set of training examples and predict the corresponding output.
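One way to picture how a single meta-training instance might be assembled is sketched below. It assumes that each step samples one task from the collection, draws k + 1 of its examples, uses the first k as demonstrations followed by the last input, and makes the last output the prediction target. The function name `make_meta_training_instance` and the prompt layout are hypothetical, and the LM training loss itself is omitted.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (input text, output text)


def make_meta_training_instance(
    tasks: Dict[str, List[Example]], k: int, rng: random.Random
) -> Tuple[str, str]:
    """Build one (context, target) pair for meta-training.

    Assumed scheme: pick a task at random, sample k + 1 of its examples,
    use the first k as in-context demonstrations followed by the last input,
    and make the last example's output the prediction target.
    """
    task_name = rng.choice(sorted(tasks))
    examples = rng.sample(tasks[task_name], k + 1)
    demos, (query_x, query_y) = examples[:k], examples[k]
    context = "\n\n".join(f"{x}\n{y}" for x, y in demos) + f"\n\n{query_x}\n"
    # An LM would be trained with a loss on `query_y` only, given `context`.
    return context, query_y


tasks = {
    "sentiment": [("great movie", "positive"), ("boring plot", "negative"), ("loved it", "positive")],
    "nli": [("A dog runs. / An animal moves.", "entailment"),
            ("A man sleeps. / A man runs.", "contradiction"),
            ("A kid eats. / A kid is hungry.", "neutral")],
}
context, target = make_meta_training_instance(tasks, k=2, rng=random.Random(0))
```

Because the task changes from one instance to the next, the model cannot memorise a single task; it is pushed to infer the task from the demonstrations, which is the behaviour in-context learning relies on at test time.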
To evaluate MetaICL, the team used a collection of tasks from CROSSFIT and UNIFIEDQA, including text classification, question answering (QA), natural language inference (NLI) and paraphrase detection. They compared MetaICL against baselines including PMI 0-shot, PMI In-context, Channel Multi-task 0-shot and Oracle, and used Macro-F1 and accuracy as the evaluation metrics for classification and non-classification tasks, respectively.
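The two metrics can diverge sharply on imbalanced label sets, which is a common reason to prefer Macro-F1 for classification. The sketch below is a plain-Python illustration of both metrics, not the paper's evaluation code; it assumes the class set is the union of labels seen in the gold and predicted lists, and conventions for this vary between libraries.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over classes seen in gold or pred.

    Every class counts equally, so a model that ignores a rare class is
    penalised more than it would be under plain accuracy.
    """
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


gold = ["pos", "pos", "pos", "neg"]
pred = ["pos", "pos", "pos", "pos"]  # always predicts the majority class
print(accuracy(gold, pred))  # 0.75
print(macro_f1(gold, pred))  # about 0.4286 (the "neg" class scores F1 = 0)
```

A classifier that always outputs the majority label looks reasonable under accuracy but is exposed by Macro-F1.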
In the experiments, MetaICL outperformed strong baselines, including in-context learning without meta-training, and multi-task learning followed by zero-shot transfer.
The researchers note that MetaICL's gains are particularly significant for target tasks that have domain shifts from the meta-training tasks. They suggest future work could explore identifying which meta-training tasks are helpful for a given target task, and how to better combine human-written instructions with MetaICL.
Author: Hecate He | Editor: Michael Sarazen