pretrained language model

by Synced 2023-08-07 2

DeepMind & Tokyo U’s WebAgent Realizes Real-World Web Navigation Following Natural Language Instructions

In a new paper A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis, a research team from Google DeepMind and The University of Tokyo presents WebAgent, a LLMs-driven real-world web navigation agent that can address real websites tasks following natural language instructions.

by Synced 2022-09-15 3

AI Machine Learning & Data Science Nature Language Tech Research

Peking U & Microsoft’s Knowledge Attribution Method Enables Editing Factual Knowledge in Pretrained Transformers Without Fine-Tuning

In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and leverages these neurons to edit factual knowledge in transformers without any fine-tuning.

by Synced 2022-08-18 12

AI Machine Learning & Data Science Research

Microsoft, Penn U & UC San Diego’s TiCoder Framework Generates Code With 90.4% Consistency to User Intent

In the new paper Interactive Code Generation via Test-Driven User-Intent Formalization, a team from Microsoft Research, the University of Pennsylvania, and the University of California, San Diego proposes a workflow for test-driven user-intent formalization that leverages user feedback to generate code that is 90.40 percent consistent with user intent.

by Synced 2022-08-15 4

AI Machine Learning & Data Science Nature Language Tech Research

Meet Atlas: A Pretrained Retrieval Augmented Language Model That Outperforms a 540B Parameter Model But Requires 50x Fewer Parameters

In the new paper Few-shot Learning With Retrieval Augmented Language Models, a research team from Meta AI, PSL University, Inria, and University College London presents Atlas, a pretrained retrieval augmented language model that effectively learns new knowledge-intensive tasks under few-shot settings. Atlas outperforms the 540B parameter PaLM model on QA tasks while using 50x fewer parameters.

by Synced 2022-08-11 1

AI Machine Learning & Data Science Research

Meta AI & Mila Publicly Release BlenderBot 3: A 175B SOTA Chatbot That Continually Improves via Human Interactions

In the new paper BlenderBot 3: A Deployed Conversational Agent That Continually Learns to Responsibly Engage, researchers from Meta AI and Mila/McGill University release BlenderBot 3, a 175B parameter state-of-the-art open-domain dialogue model deployed on a public website. BlenderBot 3 is designed for continual learning via its user interactions.

by Synced 2022-07-07 1

AI Machine Learning & Data Science Research

Salesforce’s CodeRL Achieves SOTA Code Generation Results With Strong Zero-Shot Transfer Capabilities

In the new paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, a Salesforce Research team presents CodeRL, a novel framework for program synthesis tasks that employs pretrained language models (LMs) and deep reinforcement learning (RL) and achieves state-of-the-art performance on the challenging APPS benchmark while also demonstrating impressive zero-shot transfer capabilities.

by Synced 2022-06-28 6

AI Machine Learning & Data Science Nature Language Tech Research

CMU’s Novel ‘ReStructured Pre-training’ NLP Approach Scores 40 Points Above Student Average on a Standard English Exam

In the new paper ReStructured Pre-training, a Carnegie Mellon University research team proposes “reStructured Pre-training” (RST), a novel NLP paradigm that pretrains models over valuable restructured data. The team’s resulting QIN system scores 40 points higher than the student average on the Gaokao-English Exam and 15 points higher than GPT-3 with 1/16 of the parameters.

by Synced 2022-05-06 6

AI Machine Learning & Data Science Research

Meta AI Open-Sources a 175B Parameter Language Model: GPT-3 Comparable Performance at One-Seventh the Compute Cost

In the new technical report OPT: Open Pre-trained Transformer Language Models, Meta AI open-sources OPT, a suite of decoder-only pretrained transformers ranging from 125M to 175B parameters. The release will enable more researchers to work with large-scale language models to drive the field forward.

by Synced 2022-04-06 3

AI Machine Learning & Data Science Research

Google Trains a 540B Parameter Language Model With Pathways, Achieving ‘Breakthrough Performance’

A Google Research team further explores the scaling approach for improving language modelling, leveraging the new Pathways distributed ML system to train a 540 billion parameter autoregressive transformer, Pathways Language Model (PaLM), that achieves state-of-the-art few-shot performance.

by Synced 2022-04-04 3

AI Machine Learning & Data Science Nature Language Tech Research

Training Compute-Optimal Large Language Models: DeepMind’s 70B Parameter Chinchilla Outperforms 530B Parameter Megatron-Turing

In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.

by Synced 2022-03-30 0

AI Machine Learning & Data Science Research

CMU & Google Extend Pretrained Models to Thousands of Underrepresented Languages Without Using Monolingual Data

A research team from Carnegie Mellon University and Google systematically explores strategies for leveraging the relatively under-studied resource of bilingual lexicons to adapt pretrained multilingual models to low-resource languages. Their resulting Lexicon-based Adaptation approach produces consistent performance improvements without requiring additional monolingual text.

by Synced 2022-03-29 1

AI Machine Learning & Data Science Nature Language Tech Research

Google, NYU & Maryland U’s Token-Dropping Approach Reduces BERT Pretraining Time by 25%

In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.

by Synced 2022-02-18 0

AI Machine Learning & Data Science Research

Meet ZEROGEN: An Extreme Method for Dataset Generation via PLMs for Zero-Shot Learning

A research team from the University of Hong Kong, Shanghai AI Lab, Huawei Noah’s Ark Lab and the University of Washington takes dataset generation methods via large-scale pretrained language models (PLMs) to the extreme with ZEROGEN, a flexible and efficient zero-shot learning framework via dataset generation.

by Synced 2022-01-26 3

AI Machine Learning & Data Science Research

AutoDistill: An End-to-End Fully Automated Distillation Framework for Hardware-Efficient Large-Scale NLP Models

University of Illinois Urbana-Champaign and Google researchers introduce AutoDistill, an end-to-end fully automated model distillation framework that integrates model architecture exploration and multi-objective optimization for building hardware-efficient pretrained natural language processing models.

by Synced 2021-12-09 13

AI Machine Learning & Data Science Nature Language Tech Research

Peng Cheng Laboratory & Baidu Release PCL-BAIDU Wenxin: The World’s First Knowledge-Enhanced 100-Billion-Scale Pretrained Language Model

Peng Cheng Laboratory (PCL) and Baidu release PCL-BAIDU Wenxin, the world’s first knowledge-enhanced 100-billion-scale pretrained language model and the largest Chinese-language monolithic model with 260 billion parameters. PCL-BAIDU Wenxin achieves state-of-the-art results on more than 60 tasks and significantly advances more than 30 benchmarks for zero-shot and few-shot learning.

by Synced 2021-11-03 1

AI Machine Learning & Data Science Nature Language Tech Research

Twitter Cortex Proposes LMSOC for Socially Sensitive Pretraining

In the new paper LMSOC: An Approach for Socially Sensitive Pretraining, a Twitter Cortex research team proposes a simple but effective approach for learning both linguistically contextualized and socially sensitive representations in large-scale language models.

by Synced 2021-10-26 3

AI Machine Learning & Data Science Nature Language Tech Research

Facebook AI’s NormFormer Employs Extra Normalization to Significantly Improve Transformer Pretraining

Facebook AI Research proposes NormFormer, an approach that improves pretraining perplexity and downstream task performance for both causal and masked language models, achieving GPT3-Large (1.3B) zero-shot performance 60 percent faster and improving fine-tuned GLUE performance by 1.9 percent.

by Synced 2021-09-08 28

AI Machine Learning & Data Science Research

IBM Leverages Reinforcement Learning to Achieve SOTA Performance on Text and Knowledge Base Generation

In the paper ReGen: Reinforcement Learning for Text and Knowledge Base Generation Using Pretrained Language Models, IBM researchers present ReGen, a bidirectional generation of text and graph that leverages reinforcement learning to push the performance of text-to-graph and graph-to-text generation tasks to a higher level.