Today’s transformer-based large language models (LLMs) have proven a game-changer in natural language processing, achieving state-of-the-art performance on reading comprehension, question answering, and common-sense reasoning benchmarks. Given a prompt, LLMs can also generate coherent and sensible completions — but they struggle with infilling, where they are tasked with generating text at a specific location conditioned on both a prefix and a suffix.
In the new paper Efficient Training of Language Models to Fill in the Middle, an OpenAI research team shows that causal decoder-based autoregressive (AR) LLMs can learn to infill text via a simple transformation applied to the training data, without any architectural modifications.
The researchers’ goal was to equip causal decoder-based language models with a “fill-in-the-middle” (FIM) capability and enable them to learn effective text infilling without harming their normal left-to-right generative performance.
The team summarizes their main contributions as follows:
- FIM-for-free property: We perform an extensive scaling study by training a suite of 8 models, with and without FIM, and show that FIM can be learned without compromising the left-to-right capability in pretraining. We examine this claim in both code and language, using both perplexity and sampling-based benchmarks.
- Best practices for FIM in pretraining: We clarify the effects of many hyperparameters related to training FIM models using comprehensive ablations. In particular, we study the FIM rate (the probability at which FIM transformation is applied to the data), different variants of FIM transformation, and the choice of middle span.
- Finetuning inefficiency: An alternative to training FIM models from scratch is to learn this capability by finetuning existing language models. We find, however, that finetuning for FIM is surprisingly compute-inefficient compared with learning it during pretraining.
- New infilling benchmarks: We create two new benchmarks called random span infilling and random span infilling light.
- Need for sampling evaluations: We find that changing various hyperparameters in FIM training often leads to negligible differences in FIM test losses but large differences in sampling-based benchmarks. Not only are sampling benchmarks closer to real use cases, but they also appear to be able to tease apart gains that can be missed using test losses.
The team’s training data transformation strategy is remarkably simple: a random span of text is moved from the middle of a document to its end, i.e. a document split as (prefix, middle, suffix) is rearranged into (prefix, suffix, middle).
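The transformation can be sketched in a few lines of Python. Note this is an illustrative sketch, not the authors’ implementation: the sentinel strings, function name, and the choice to sample the FIM rate per document are assumptions for clarity (the paper operates with special tokenizer tokens rather than plain strings).

```python
import random

# Illustrative sentinel strings marking the three parts; the paper uses
# dedicated special tokens for this purpose (names here are assumptions).
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(document: str, fim_rate: float = 0.5, rng=random) -> str:
    """With probability `fim_rate`, move a random middle span of `document`
    to the end: (prefix, middle, suffix) -> (prefix, suffix, middle)."""
    if rng.random() >= fim_rate:
        # Leave the document as ordinary left-to-right training text.
        return document
    # Pick two distinct cut points to split the document into three spans.
    i, j = sorted(rng.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Reassemble with sentinels so the model can tell the parts apart;
    # at inference time, generation after MID fills in the missing span.
    return PRE + prefix + SUF + suffix + MID + middle

doc = "def add(a, b):\n    return a + b\n"
print(fim_transform(doc, fim_rate=1.0, rng=random.Random(0)))
```

Because the transformed document still ends with the “middle” span, an ordinary next-token prediction objective teaches the model to generate the missing text conditioned on both sides, which is why no architectural change is needed.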
The team demonstrates that by jointly training models on a mixture of FIM-transformed data and traditional left-to-right data across multiple objectives and datasets, a causal AR LLM can learn to fill in the middle of a document and handle related tasks such as inferring imports, writing docstrings and completing functions.
The team compared FIM models with conventional approaches in their empirical study. The results show that FIM models can achieve the same test loss as left-to-right models with less computation and that the proposed FIM model pretraining approach is more efficient than conventional finetuning.
Overall, this work demonstrates that FIM models can maintain the same left-to-right text capability as regular AR models while learning how to more effectively fill in the middle — an “FIM-for-free” benefit of the proposed training data transformation strategy. The researchers suggest AR LLMs could be trained with FIM by default in the future.
The paper Efficient Training of Language Models to Fill in the Middle is on arXiv.
Author: Hecate He | Editor: Michael Sarazen