In April 2022, Google unveiled its 540 billion parameter Pathways Language Model (PaLM), developed using the novel Pathways system (Barham et al., 2022), which enables efficient model training across multiple TPU v4 Pods (in PaLM’s case, 6144 TPU v4 chips). With large language models (LLMs) now receiving unprecedented public attention and finding countless real-world applications, Google introduced its next-generation PaLM 2 model family at its I/O developer conference earlier this month.
The paper PaLM 2 Technical Report details the modelling advances, data improvements, and scaling insights that have enabled PaLM 2 to achieve state-of-the-art performance. The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute, and it substantially improves performance on natural language generation, translation, and reasoning tasks.
The paper summarizes PaLM 2’s research advances as follows:
- Compute-optimal scaling. The Google team proposes that data size is at least as important as model size and that the two should be scaled roughly 1:1 to achieve the best performance for a given amount of training compute (unlike common approaches that scale models up to 3x faster than datasets).
- Improved dataset mixtures. The team designs a more multilingual and diverse pretraining mixture that extends across hundreds of languages and domains, demonstrates that larger models can handle disparate non-English datasets without sacrificing English-language understanding performance, and applies deduplication to reduce memorization issues.
- Architectural and objective improvements. The team employs a novel tuned mixture of different pretraining objectives to train the model to understand different aspects of language (unlike common approaches that use a single causal or masked language modelling objective).
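The 1:1 scaling rule in the first item can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes the widely used approximation that training compute is about 6 × parameters × tokens, and it fixes a hypothetical tokens-per-parameter ratio (`tokens_per_param`, set here to 20 purely for illustration). Under those assumptions, parameters and tokens each grow with the square root of the compute budget, which is what "scaled roughly 1:1" means in practice.

```python
import math


def compute_optimal_split(flops, tokens_per_param=20.0):
    """Split a training budget between model size and dataset size.

    Assumptions (not from the PaLM 2 report):
      * training compute C is approximately 6 * N * D FLOPs,
        where N is parameters and D is training tokens;
      * D = tokens_per_param * N, with an illustrative ratio.
    """
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


for budget in (1e21, 1e22, 1e23):
    n, d = compute_optimal_split(budget)
    print(f"{budget:.0e} FLOPs -> {n:.2e} params, {d:.2e} tokens")
```

Raising the budget 100x raises both the parameter count and the token count by 10x, so neither outpaces the other. A recipe that grew the model 3x faster than the dataset would instead put most of the extra compute into parameters.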
PaLM 2 also includes control tokens that enable control over toxicity at inference time, as well as multilingual out-of-distribution “canary” token sequences that are injected into the pretraining data to provide insights on memorization across languages.
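To make the canary idea concrete, here is a minimal sketch of how such a probe could work in principle. It is not the procedure used in the paper: the helper names (`make_canary`, `inject`, `extraction_rate`) and the `complete` callback, which stands in for a model's greedy continuation, are all hypothetical. The idea is to plant random token sequences in the training data, then prompt the trained model with the first part of each sequence and count how often it reproduces the rest verbatim.

```python
import random


def make_canary(rng, vocab, length=16):
    """Build a random token sequence unlikely to occur naturally."""
    return [rng.choice(vocab) for _ in range(length)]


def inject(doc_tokens, canary, rng):
    """Insert the canary at a random position in a tokenized document."""
    pos = rng.randrange(len(doc_tokens) + 1)
    return doc_tokens[:pos] + canary + doc_tokens[pos:]


def extraction_rate(canaries, complete, prefix_len=8):
    """Fraction of canaries whose suffix is reproduced verbatim.

    `complete(prefix, n)` is a hypothetical stand-in for a model call
    that returns the n tokens it predicts after `prefix`.
    """
    hits = 0
    for canary in canaries:
        prefix, suffix = canary[:prefix_len], canary[prefix_len:]
        if complete(prefix, len(suffix)) == suffix:
            hits += 1
    return hits / len(canaries)


rng = random.Random(0)
vocab = list(range(1000))
canaries = [make_canary(rng, vocab) for _ in range(5)]

# Toy "model" that has memorized every canary, for demonstration only.
memory = {tuple(c[:8]): c[8:] for c in canaries}
rate = extraction_rate(canaries, lambda p, n: memory.get(tuple(p), [])[:n])
print(f"verbatim extraction rate: {rate:.2f}")
```

Comparing this rate across canaries written in different languages, or repeated a different number of times in the data, is one way such sequences could shed light on memorization.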
The paper’s empirical study compares PaLM 2 with PaLM on advanced language proficiency exams designed for humans and standard academic machine learning benchmarks ranging from English and multilingual language understanding to reasoning.
In the experiments, PaLM 2 substantially improved performance on a wide variety of tasks with faster and more efficient inference, demonstrated robust reasoning capabilities, and confirmed its ability to control toxicity with no additional overhead.
The PaLM 2 model family’s state-of-the-art performance validates that approaches other than scaling up model size, such as meticulous data selection and more efficient architectures and objectives, can play a crucial role in improving language understanding and generation.
The paper PaLM 2 Technical Report is on arXiv.
Author: Hecate He | Editor: Michael Sarazen