In the new technical report OPT: Open Pre-trained Transformer Language Models, Meta AI open-sources OPT, a suite of decoder-only pretrained transformers ranging from 125M to 175B parameters. The release will enable more researchers to work with large-scale language models to drive the field forward.
A Google Research team further explores the scaling approach for improving language modelling, leveraging the new Pathways distributed ML system to train a 540 billion parameter autoregressive transformer, Pathways Language Model (PaLM), that achieves state-of-the-art few-shot performance.
In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.
A research team from Carnegie Mellon University and Google systematically explores strategies for leveraging the relatively under-studied resource of bilingual lexicons to adapt pretrained multilingual models to low-resource languages. Their resulting Lexicon-based Adaptation approach produces consistent performance improvements without requiring additional monolingual text.
In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
A research team from the University of Hong Kong, Shanghai AI Lab, Huawei Noah’s Ark Lab and the University of Washington takes dataset generation methods via large-scale pretrained language models (PLMs) to the extreme with ZEROGEN, a flexible and efficient zero-shot learning framework via dataset generation.
University of Illinois Urbana-Champaign and Google researchers introduce AutoDistill, an end-to-end fully automated model distillation framework that integrates model architecture exploration and multi-objective optimization for building hardware-efficient pretrained natural language processing models.
Peng Cheng Laboratory (PCL) and Baidu release PCL-BAIDU Wenxin, the world’s first knowledge-enhanced 100-billion-scale pretrained language model and the largest Chinese-language monolithic model with 260 billion parameters. PCL-BAIDU Wenxin achieves state-of-the-art results on more than 60 tasks and significantly advances more than 30 benchmarks for zero-shot and few-shot learning.
Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models, achieving GPT3-Large (1.3B) zero-shot performance 60 percent faster and improving fine-tuned GLUE performance by 1.9 percent.
In the paper ReGen: Reinforcement Learning for Text and Knowledge Base Generation Using Pretrained Language Models, IBM researchers present ReGen, a bidirectional generation of text and graph that leverages reinforcement learning to push the performance of text-to-graph and graph-to-text generation tasks to a higher level.