Tag: pretrained language model

AI Machine Learning & Data Science Research

Meta AI Open-Sources a 175B Parameter Language Model: GPT-3 Comparable Performance at One-Seventh the Compute Cost

In the new technical report OPT: Open Pre-trained Transformer Language Models, Meta AI open-sources OPT, a suite of decoder-only pretrained transformers ranging from 125M to 175B parameters. The release will enable more researchers to work with large-scale language models and help drive the field forward.
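A minimal sketch of what working with the released checkpoints might look like, assuming the smaller OPT variants are mirrored on the Hugging Face Hub under names such as "facebook/opt-125m" (the 175B model is distributed separately on request); the model identifier and generation settings below are illustrative, not prescribed by the report.

```python
# Minimal sketch: text generation with a small OPT checkpoint via Hugging Face Transformers.
# "facebook/opt-125m" is an assumed Hub identifier; swap in a larger variant if available.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```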

AI Machine Learning & Data Science Natural Language Tech Research

Training Compute-Optimal Large Language Models: DeepMind’s 70B Parameter Chinchilla Outperforms 530B Parameter Megatron-Turing

In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.
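The paper's headline finding is that model size and training tokens should be scaled in roughly equal proportion, which works out to approximately 20 training tokens per parameter. The sketch below illustrates that rule of thumb under the common approximation that training compute is about 6 × parameters × tokens; the FLOP budget used in the example is the roughly 5.76e23 FLOPs associated with Gopher-scale training and is given here only for illustration.

```python
# Rough sketch of a Chinchilla-style compute-optimal split.
# Assumes C ≈ 6 * N * D (FLOPs ≈ 6 per parameter per token) and the paper's
# approximate finding of ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio

def compute_optimal(flops_budget: float) -> tuple[float, float]:
    """Return (parameters, tokens) that roughly exhaust a FLOP budget."""
    # Solve C = 6 * N * D with D = TOKENS_PER_PARAM * N  ->  N = sqrt(C / (6 * ratio))
    n_params = (flops_budget / (6 * TOKENS_PER_PARAM)) ** 0.5
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Example: a Gopher-scale budget of ~5.76e23 FLOPs (illustrative value)
n, d = compute_optimal(5.76e23)
print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")  # ~69B params, ~1.4T tokens
```

The example lands close to Chinchilla's actual configuration of 70B parameters trained on 1.4T tokens, which is the point of the paper's comparison against much larger but undertrained models.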

AI Machine Learning & Data Science Research

CMU & Google Extend Pretrained Models to Thousands of Underrepresented Languages Without Using Monolingual Data

A research team from Carnegie Mellon University and Google systematically explores strategies for leveraging the relatively under-studied resource of bilingual lexicons to adapt pretrained multilingual models to low-resource languages. Their resulting Lexicon-based Adaptation approach produces consistent performance improvements without requiring additional monolingual text.
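One plausible way to use a bilingual lexicon for this kind of adaptation is to synthesize pseudo text in the target language by word-for-word substitution from a high-resource language; the sketch below is an illustrative implementation of that general idea, not the authors' code, and the lexicon entries and sentences are placeholders.

```python
# Illustrative sketch: synthesizing pseudo target-language text from a bilingual
# lexicon by word-for-word substitution. Lexicon entries below are placeholders.
from typing import Dict, List

def synthesize(sentence: str, lexicon: Dict[str, str]) -> str:
    """Replace every word that has a lexicon entry; keep the rest unchanged."""
    return " ".join(lexicon.get(tok.lower(), tok) for tok in sentence.split())

# Toy English -> target-language lexicon (placeholder target forms).
toy_lexicon = {"water": "<water_tgt>", "clean": "<clean_tgt>", "is": "<is_tgt>"}
english_corpus: List[str] = ["Water is clean", "The water is good"]

pseudo_corpus = [synthesize(s, toy_lexicon) for s in english_corpus]
print(pseudo_corpus)  # pseudo text usable for continued pretraining or fine-tuning
```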

AI Machine Learning & Data Science Natural Language Tech Research

Google, NYU & Maryland U’s Token-Dropping Approach Reduces BERT Pretraining Time by 25%

In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
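The core idea is to run the full sequence through the first few layers, route only the "important" tokens through the middle layers, and restore the full sequence for the final layers, so that compute spent on uninformative tokens is saved. The sketch below is a simplified illustration of that flow, not the authors' implementation; the importance scores are a placeholder for the paper's running estimate of per-token masked-LM loss, and the layer-split parameters are illustrative.

```python
# Simplified sketch of token dropping: full sequence in early/late layers,
# only top-scoring tokens in the middle layers. Not the authors' implementation.
import torch

def forward_with_token_dropping(layers, hidden, importance, keep_ratio=0.5,
                                n_early=2, n_late=2):
    # hidden: (batch, seq_len, dim); importance: (batch, seq_len) placeholder scores
    early, middle, late = layers[:n_early], layers[n_early:-n_late], layers[-n_late:]

    for layer in early:                      # full sequence through early layers
        hidden = layer(hidden)

    seq_len, dim = hidden.size(1), hidden.size(-1)
    k = max(1, int(keep_ratio * seq_len))
    keep_idx = importance.topk(k, dim=1).indices.sort(dim=1).values  # kept positions, in order

    kept = torch.gather(hidden, 1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))
    for layer in middle:                     # only kept tokens through middle layers
        kept = layer(kept)

    # scatter updated kept tokens back into the full sequence
    hidden = hidden.scatter(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim), kept)
    for layer in late:                       # full sequence again for final layers
        hidden = layer(hidden)
    return hidden

# Toy usage with identity "layers" just to show shapes are preserved:
layers = [torch.nn.Identity() for _ in range(6)]
out = forward_with_token_dropping(layers, torch.randn(2, 16, 32), torch.rand(2, 16))
print(out.shape)  # torch.Size([2, 16, 32])
```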

AI Machine Learning & Data Science Natural Language Tech Research

Peng Cheng Laboratory & Baidu Release PCL-BAIDU Wenxin: The World’s First Knowledge-Enhanced 100-Billion-Scale Pretrained Language Model

Peng Cheng Laboratory (PCL) and Baidu release PCL-BAIDU Wenxin, the world’s first knowledge-enhanced 100-billion-scale pretrained language model and the largest Chinese-language monolithic model, with 260 billion parameters. PCL-BAIDU Wenxin achieves state-of-the-art results on more than 60 tasks and significantly improves zero-shot and few-shot learning performance on more than 30 benchmarks.