In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons storing factual knowledge in pretrained transformers and leverages these neurons to edit that knowledge without any fine-tuning.
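The attribution method scores feed-forward (FFN) neurons for a given fact, and editing then amounts to suppressing or amplifying the identified neurons. The snippet below is only a minimal sketch of that editing step, using a toy feed-forward block and a PyTorch forward hook; the module, the neuron index, and the scaling factor are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer feed-forward block (hypothetical, not the
# paper's model): the "knowledge neurons" live in the intermediate activation.
ffn = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))

neuron_idx = 7      # index of the neuron to edit (illustrative)
scale = 0.0         # 0.0 suppresses the fact, >1.0 amplifies it

def edit_neuron(module, inputs, output):
    # Scale a single intermediate activation; with scale=0 the neuron is erased.
    output = output.clone()
    output[..., neuron_idx] *= scale
    return output

# Hook the intermediate GELU output so every forward pass applies the edit.
handle = ffn[1].register_forward_hook(edit_neuron)

hidden = torch.randn(1, 5, 16)      # (batch, sequence, hidden) toy input
edited = ffn(hidden)                # forward pass with the neuron edited
handle.remove()                     # restore the original behaviour
```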
In the new paper Interactive Code Generation via Test-Driven User-Intent Formalization, a team from Microsoft Research, the University of Pennsylvania, and the University of California, San Diego proposes a workflow for test-driven user-intent formalization that leverages user feedback on generated tests to produce code that is 90.40 percent consistent with user intent.
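The workflow is easy to picture: sample candidate programs and candidate tests from a code model, ask the user whether each test reflects their intent, and prune the candidate programs accordingly. The loop below is a self-contained schematic of that idea; `sample_programs`, `sample_tests`, and `ask_user` are hypothetical stand-ins for the code model and the user interface, not the paper's system.

```python
# Schematic of a test-driven user-intent formalization loop (assumption:
# sample_programs, sample_tests, and ask_user stand in for a code LM and a UI).
def sample_programs(prompt, n=5):
    return [f"def solution(x):\n    return x + {i}" for i in range(n)]

def sample_tests(prompt, n=3):
    return [f"assert solution(1) == {i}" for i in range(1, n + 1)]

def ask_user(test):
    # In the real workflow, the user approves or rejects each generated test.
    return test == "assert solution(1) == 2"

def run(program, test):
    env = {}
    try:
        exec(program, env)
        exec(test, env)
        return True
    except Exception:
        return False

def interactive_codegen(prompt):
    candidates = sample_programs(prompt)
    for test in sample_tests(prompt):
        if ask_user(test):  # keep only programs consistent with approved tests
            candidates = [p for p in candidates if run(p, test)]
        else:               # discard programs that satisfy a rejected test
            candidates = [p for p in candidates if not run(p, test)]
    return candidates[0] if candidates else None

print(interactive_codegen("add one to x"))
```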
In the new paper Few-shot Learning With Retrieval Augmented Language Models, a research team from Meta AI, PSL University, Inria, and University College London presents Atlas, a pretrained retrieval-augmented language model that effectively learns new knowledge-intensive tasks in few-shot settings. Atlas outperforms the 540B parameter PaLM model on QA tasks while using 50x fewer parameters.
In the new paper BlenderBot 3: A Deployed Conversational Agent That Continually Learns to Responsibly Engage, researchers from Meta AI and Mila/McGill University release BlenderBot 3, a 175B parameter state-of-the-art open-domain dialogue model deployed on a public website. BlenderBot 3 is designed for continual learning via its user interactions.
In the new paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, a Salesforce Research team presents CodeRL, a novel framework for program synthesis that combines pretrained language models (LMs) with deep reinforcement learning (RL), achieving state-of-the-art performance on the challenging APPS benchmark while also demonstrating impressive zero-shot transfer capabilities.
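The central idea is to treat the code-generating LM as an actor and use unit-test feedback on its generated programs as the reward signal, with a critic trained to predict test outcomes. The sketch below shows only the test-based reward: the outcome categories mirror the paper's description, but the numeric reward values and helper code are illustrative assumptions.

```python
# Illustrative reward shaping from unit-test outcomes (the outcome categories
# follow the CodeRL description; the numeric values here are assumptions).
def test_outcome(program: str, tests: list) -> str:
    try:
        compile(program, "<candidate>", "exec")
    except SyntaxError:
        return "compile_error"
    env = {}
    try:
        exec(program, env)
        for t in tests:
            exec(t, env)
    except AssertionError:
        return "failed_tests"
    except Exception:
        return "runtime_error"
    return "passed"

REWARD = {"compile_error": -1.0, "runtime_error": -0.6,
          "failed_tests": -0.3, "passed": 1.0}

candidate = "def add(a, b):\n    return a + b"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
print(REWARD[test_outcome(candidate, tests)])   # 1.0 -> positive RL reward
```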
In the new paper ReStructured Pre-training, a Carnegie Mellon University research team proposes “reStructured Pre-training” (RST), a novel NLP paradigm that pretrains models over valuable restructured data. The team’s resulting QIN system scores 40 points higher than the student average on the Gaokao-English Exam and 15 points higher than GPT-3 with 1/16 of the parameters.
In the new technical report OPT: Open Pre-trained Transformer Language Models, Meta AI open-sources OPT, a suite of decoder-only pretrained transformers ranging from 125M to 175B parameters. The release will enable more researchers to work with large-scale language models to drive the field forward.
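The released checkpoints are easy to try directly. Assuming the Hugging Face `transformers` library and the hosted `facebook/opt-125m` weights (the smallest model in the suite), a minimal generation example looks like this:

```python
# Minimal text generation with the smallest released OPT checkpoint,
# assuming the Hugging Face `transformers` library and hosted weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```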
A Google Research team further explores the scaling approach for improving language modelling, leveraging the new Pathways distributed ML system to train a 540 billion parameter autoregressive transformer, Pathways Language Model (PaLM), that achieves state-of-the-art few-shot performance.
In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.
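The practical consequence, illustrated by the paper's compute-optimal model Chinchilla (70B parameters trained on 1.4T tokens), is that model size and training data should be scaled in roughly equal proportion; a commonly cited rule of thumb derived from the paper is on the order of 20 training tokens per parameter. The arithmetic below uses only that rule of thumb, not the paper's full fitted scaling laws.

```python
# Back-of-the-envelope compute-optimal sizing using the ~20 tokens/parameter
# rule of thumb associated with the paper (the fitted laws are more nuanced).
def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return tokens_per_param * n_params

for n_params in (1e9, 70e9, 175e9):
    print(f"{n_params/1e9:>6.0f}B params -> ~{optimal_tokens(n_params)/1e12:.2f}T tokens")
# 70B params -> ~1.40T tokens, matching Chinchilla's 70B / 1.4T configuration.
```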
A research team from Carnegie Mellon University and Google systematically explores strategies for leveraging the relatively under-studied resource of bilingual lexicons to adapt pretrained multilingual models to low-resource languages. Their resulting Lexicon-based Adaptation approach produces consistent performance improvements without requiring additional monolingual text.
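One of the strategies explored is to synthesize target-language training data by substituting words in high-resource text with their translations from the bilingual lexicon. The sketch below illustrates that substitution step only; the lexicon entries and sentences are toy examples rather than the paper's data or pipeline.

```python
# Toy lexicon-based data synthesis: replace source words that appear in a
# bilingual lexicon with their target-language translations (illustrative only).
lexicon = {"water": "amanzi", "good": "kuhle", "is": "ngu"}   # hypothetical entries

def lexicon_translate(sentence: str, lexicon: dict) -> str:
    return " ".join(lexicon.get(tok.lower(), tok) for tok in sentence.split())

source_corpus = ["Water is good", "The water is clean"]
synthetic_target = [lexicon_translate(s, lexicon) for s in source_corpus]
print(synthetic_target)   # pseudo target-language text for adaptation training
```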
In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
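The core idea is that tokens judged unimportant (for example, those with consistently low masked-language-modelling loss) can skip the middle layers of the encoder and be re-attached before the final layers. The tensor manipulation below is a minimal sketch of that routing under assumed shapes and a random importance score; it is not the paper's implementation.

```python
import torch

# Minimal sketch of token dropping: keep only the k most "important" tokens in
# the middle layers, then scatter them back for the final layers (shapes and
# the importance scores are assumptions, not the paper's exact mechanism).
batch, seq_len, hidden, k = 2, 8, 16, 5
hidden_states = torch.randn(batch, seq_len, hidden)
importance = torch.rand(batch, seq_len)          # e.g., a running per-token MLM loss

keep_idx = importance.topk(k, dim=1).indices                  # (batch, k)
gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, hidden)
kept = hidden_states.gather(1, gather_idx)                    # middle layers see only these

kept = kept + 0.1 * torch.randn_like(kept)                    # stand-in for the middle layers

full = hidden_states.clone()                                  # dropped tokens pass through
full.scatter_(1, gather_idx, kept)                            # re-attach the updated tokens
print(full.shape)                                             # (2, 8, 16) for the final layers
```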
A research team from the University of Hong Kong, Shanghai AI Lab, Huawei Noah’s Ark Lab, and the University of Washington pushes dataset generation with large-scale pretrained language models (PLMs) to the extreme with ZEROGEN, a flexible and efficient zero-shot learning framework built on dataset generation.
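The recipe is to prompt a large PLM with label-conditioned instructions, collect the generations as a synthetic labelled dataset, and then train a small task model on that data. The snippet below shows the generation step in miniature, using GPT-2 through the `transformers` pipeline as a stand-in PLM; the prompts, labels, and sampling settings are illustrative assumptions.

```python
# Zero-shot dataset generation in miniature: prompt a PLM with label-conditioned
# instructions and harvest (text, label) pairs. GPT-2 stands in for the large
# PLM; prompts, labels, and sampling settings are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = {
    "positive": 'The movie review in positive sentiment is: "',
    "negative": 'The movie review in negative sentiment is: "',
}

synthetic_dataset = []
for label, prompt in prompts.items():
    outputs = generator(prompt, max_new_tokens=30, num_return_sequences=2,
                        do_sample=True, top_p=0.9)
    for out in outputs:
        text = out["generated_text"][len(prompt):]
        synthetic_dataset.append({"text": text, "label": label})

print(synthetic_dataset[:2])   # feed these pairs to a small task model
```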
University of Illinois Urbana-Champaign and Google researchers introduce AutoDistill, an end-to-end fully automated model distillation framework that integrates model architecture exploration and multi-objective optimization for building hardware-efficient pretrained natural language processing models.
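AutoDistill automates the search over student architectures and hardware objectives, but each candidate student is still trained with a standard knowledge-distillation objective. The loss below is that generic objective (temperature-softened teacher targets mixed with hard labels), shown as a sketch rather than AutoDistill's actual training code.

```python
import torch
import torch.nn.functional as F

# Generic knowledge-distillation loss: KL between temperature-softened teacher
# and student distributions, mixed with the usual cross-entropy on hard labels.
# (A sketch of the standard objective, not AutoDistill's framework code.)
def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```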
Peng Cheng Laboratory (PCL) and Baidu release PCL-BAIDU Wenxin, the world’s first knowledge-enhanced 100-billion-scale pretrained language model and the largest Chinese-language monolithic model with 260 billion parameters. PCL-BAIDU Wenxin achieves state-of-the-art results on more than 60 tasks and significantly advances more than 30 benchmarks for zero-shot and few-shot learning.
Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models, reaching GPT-3 Large (1.3B) zero-shot performance 60 percent faster and improving fine-tuned GLUE performance by 1.9 percent.
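NormFormer's changes are small: on top of a standard Pre-LN transformer layer it adds a LayerNorm on the self-attention output, head-wise scaling of the attention, and a LayerNorm after the feed-forward nonlinearity. The block below sketches two of those additions (head-wise scaling is omitted for brevity); the dimensions and wiring are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a NormFormer-style block: a Pre-LN transformer layer with extra
# normalization -- a LayerNorm on the attention output and a LayerNorm after
# the FFN nonlinearity. Sizes and wiring are illustrative assumptions.
class NormFormerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln_pre_attn = nn.LayerNorm(d_model)
        self.ln_post_attn = nn.LayerNorm(d_model)   # extra norm on attention output
        self.ln_pre_ffn = nn.LayerNorm(d_model)
        self.fc1 = nn.Linear(d_model, d_ff)
        self.ln_mid_ffn = nn.LayerNorm(d_ff)        # extra norm after the nonlinearity
        self.fc2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        h = self.ln_pre_attn(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + self.ln_post_attn(attn_out)
        h = self.ln_pre_ffn(x)
        x = x + self.fc2(self.ln_mid_ffn(F.gelu(self.fc1(h))))
        return x

block = NormFormerBlock()
print(block(torch.randn(2, 10, 64)).shape)   # torch.Size([2, 10, 64])
```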
In the paper ReGen: Reinforcement Learning for Text and Knowledge Base Generation Using Pretrained Language Models, IBM researchers present ReGen, a bidirectional text-and-graph generation approach that leverages reinforcement learning to push the performance of text-to-graph and graph-to-text generation tasks to a higher level.