
Google’s Universal Pretraining Framework Unifies Language Learning Paradigms

In the new paper Unifying Language Learning Paradigms, a Google Research/Brain team proposes a framework for pretraining universal language models that are effective across many different tasks. Their 20B-parameter model surpasses the 175B-parameter GPT-3 on the zero-shot SuperGLUE benchmark and triples the performance of T5-XXL on one-shot summarization tasks.

Generalization is one of the primary goals in contemporary machine learning research and is regarded as a pathway to artificial general intelligence. Although today’s pretrained large language models (LMs) continue to push the state-of-the-art in natural language processing (NLP), most such models target specific problem classes and suffer significant performance drops when applied to new tasks. Is it possible to pretrain language models that will work well across many diverse tasks?

A Google Research/Brain team addresses this question in the new paper Unifying Language Learning Paradigms, proposing UL2, a framework for pretraining universal language models that are effective across many different tasks. Their 20B-parameter model surpasses the state-of-the-art 175B-parameter GPT-3 on the zero-shot SuperGLUE benchmark and triples the performance of T5-XXL on one-shot summarization tasks.

The UL2 framework aims to build a universally applicable language model that is consistently effective across various types of datasets, tasks, and setups. UL2 is driven by Mixture-of-Denoisers (MoD), a novel pretraining objective that integrates diverse pretraining paradigms, enabling a single model to maintain strong performance across different tasks.

MoD mixes three denoising paradigms during pretraining: R-Denoiser, a regular span-corruption denoiser that is better suited to acquiring knowledge than to learning to generate fluent text; S-Denoiser, a sequential denoiser for cases where a strict left-to-right order can be observed when framing input-to-target tasks, akin to prefix language modelling; and X-Denoiser, an extreme denoiser used when the model must recover a large portion of the input from only a small-to-moderate part of it. A novel mode-switching feature associates each paradigm with a discrete prompt token, such that the model can switch between the R, S, and X denoisers on demand when learning downstream tasks.
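The mixture can be pictured as sampling one denoiser configuration per training example, corrupting the input accordingly, and prepending the matching paradigm token. The sketch below is illustrative only, not the official UL2 code: the span lengths, corruption rates, token names, and the `corrupt` helper are assumptions loosely based on the settings described in the paper (short spans and a low rate for R, long spans or a high rate for X, and a prefix-to-suffix split for S).

```python
import random

# Hypothetical denoiser settings (illustrative, not the paper's exact values).
DENOISERS = {
    "[R]": {"mean_span": 3,  "corrupt_rate": 0.15},   # regular denoising
    "[X]": {"mean_span": 32, "corrupt_rate": 0.50},   # extreme denoising
    "[S]": {"mean_span": None, "corrupt_rate": 0.25}, # sequential (prefix-LM)
}

def corrupt(tokens, mode):
    """Turn a token list into an (inputs, targets) denoising example."""
    cfg = DENOISERS[mode]
    if mode == "[S]":
        # S-Denoiser: a prefix is the input; the suffix is the target.
        split = int(len(tokens) * (1 - cfg["corrupt_rate"]))
        return [mode] + tokens[:split] + ["<extra_id_0>"], tokens[split:]
    # R/X-Denoisers: mask random spans; the model reconstructs them.
    inputs, targets, i, sid = [mode], [], 0, 0
    while i < len(tokens):
        if random.random() < cfg["corrupt_rate"] / cfg["mean_span"]:
            span = tokens[i:i + cfg["mean_span"]]
            inputs.append(f"<extra_id_{sid}>")        # sentinel in the input
            targets += [f"<extra_id_{sid}>"] + span   # span moves to the target
            i += len(span)
            sid += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
mode = random.choice(list(DENOISERS))  # MoD: sample a denoiser per example
inputs, targets = corrupt(tokens, mode)
```

At fine-tuning time, the same paradigm tokens would be prepended to downstream inputs to select the behaviour best matched to the task, which is the mode-switching idea described above.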

In their empirical study, the team conducted extensive experiments on diverse tasks ranging from supervised fine-tuning to prompt-based in-context few-shot learning. In these evaluations, the proposed UL2 outperformed a T5 baseline by 43.6 percent and a GPT-like baseline by 76.1 percent. The team also scaled UL2 to 20B parameters and evaluated it on more than 50 NLP tasks, where it achieved state-of-the-art performance on the vast majority of tasks and setups. In zero-/few-shot experiments, UL2 surpassed GPT-3 175B on the zero-shot SuperGLUE benchmark.

Flax-based T5X model checkpoints for the 20B UL2 are available on the project’s GitHub. The paper Unifying Language Learning Paradigms is on arXiv.


Author: Hecate He | Editor: Michael Sarazen

