Carnegie Mellon University researchers detail the background and contributions of the DialPort project over the past six years in their new paper The DialPort Tools. These tools, which include the DialPort Portal and DialCrowd, will be demoed at the SIGDIAL 2022 conference next month in Edinburgh.
In the new paper TextWorldExpress: Simulating Text Games at One Million Steps Per Second, a research team from the University of Arizona and Microsoft Research Montréal presents TextWorldExpress, a high-performance text-game simulator that boosts throughput by approximately three orders of magnitude, reaching one million steps per second.
In the new paper Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent, a Stanford NLP research team presents Chirpy Cardinal, an open-domain conversational social chatbot with emotional and social intelligence that enables authentic and engaging interactions with real people.
In the new paper ReStructured Pre-training, a Carnegie Mellon University research team proposes “reStructured Pre-training” (RST), a novel NLP paradigm that pretrains models over valuable restructured data. The team’s resulting QIN system scores 40 points higher than the student average on the Gaokao-English Exam and 15 points higher than GPT-3 with 1/16 of the parameters.
In the new paper Tracing Knowledge in Language Models Back to the Training Data, a team from MIT CSAIL and Google Research proposes a benchmark for tracing language models’ assertions to the associated training data, aiming to establish a principled ground truth and mitigate high compute demands for large neural language model training.
In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.
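The paper's headline finding is easy to sketch numerically: under the standard approximation that training cost is C ≈ 6·N·D FLOPs (N parameters, D training tokens), model size and data should be scaled in roughly equal proportion, which works out to on the order of 20 training tokens per parameter. A minimal sketch assuming that rule of thumb (the function name is illustrative, not from the paper):

```python
def compute_optimal(compute_flops, tokens_per_param=20.0):
    """Given a FLOP budget C and the approximation C ~ 6*N*D (N params,
    D training tokens), solve for compute-optimal N and D under the
    paper's roughly-equal-scaling finding D ~ 20*N."""
    # C = 6 * N * D = 6 * N * (20 * N)  =>  N = sqrt(C / 120)
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens
```

Plugging in roughly 5.76e23 FLOPs recovers figures close to the paper's Chinchilla configuration: about 70B parameters trained on about 1.4T tokens.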
In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
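The core idea can be conveyed with a toy sketch. Assumptions for illustration: a stand-in layer function and a fixed 50% keep ratio; in the actual method, BERT's middle layers skip tokens judged unimportant (e.g. by low cumulative masked-LM loss) while the first and last layers still see the full sequence:

```python
import numpy as np

def toy_layer(x):
    # stand-in for a full transformer layer
    return x + np.tanh(x)

def token_dropping_forward(hidden, importance, keep_ratio=0.5, n_middle_layers=2):
    """Illustrative token dropping: the middle layers process only the
    most 'important' tokens; dropped tokens bypass those layers and are
    merged back unchanged, cutting middle-layer compute roughly in half."""
    seq_len, _ = hidden.shape
    n_keep = max(1, int(seq_len * keep_ratio))
    # keep the top-importance tokens, preserving their original order
    keep_idx = np.sort(np.argsort(importance)[-n_keep:])
    kept = hidden[keep_idx]
    for _ in range(n_middle_layers):
        kept = toy_layer(kept)
    out = hidden.copy()
    out[keep_idx] = kept  # scatter processed tokens back into place
    return out
```

Because dropped tokens re-enter the sequence before the final layers, downstream fine-tuning still sees full-length inputs, which is why the savings come without a quality penalty.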
A team from Google Research and the Swiss AI Lab IDSIA proposes the Block-Recurrent Transformer, a novel long-sequence processing approach that has the same computation time and parameter count costs as a conventional transformer layer but achieves significant perplexity improvements in language modelling tasks over very long sequences.
A research team from Sapienza University and OpenAI introduces an explanatory learning procedure that enables machines to understand existing explanations from symbolic sequences and create new explanations for unexplained phenomena, and further proposes Critical Rationalist Network (CRN) models for discovering explanations for novel phenomena.
A Google AI research team explores zero-label learning (training with synthetic data only) in natural language processing, and introduces Unsupervised Data Generation (UDG), a training data creation procedure designed to synthesize high-quality training data without human annotations.
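The recipe is simple to picture: a few unlabeled in-domain examples prime a large language model, then a label-bearing cue asks it to generate a fresh input for that label, and the completion becomes a synthetic (input, label) training pair. A minimal prompt-builder sketch (the template wording here is an assumption, not the paper's exact format):

```python
def build_udg_prompt(unlabeled_examples, label, task="movie review"):
    """Sketch of an Unsupervised Data Generation prompt: unlabeled
    in-domain examples set the style, and a final label-bearing cue
    asks the language model to generate a new input for that label."""
    lines = [f"Sample {task}: {ex}" for ex in unlabeled_examples]
    lines.append(f"{label} {task}:")  # the LM's completion = synthetic input
    return "\n".join(lines)
```

Feeding the prompt to a language model and pairing its completion with the chosen label yields one synthetic training example, with no human annotation in the loop.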
MIT researchers present an automated, objective and transparent data-driven method for measuring media bias. The study analyses roughly a million articles from about a hundred newspapers for bias on various news topics, maps the newspapers into a two-dimensional media bias landscape, and shows that the data-driven results agree well with human-judgement classifications.
A Google Research team explores the design space of Transformer models in an effort to enable deep learning architectures to solve compositional tasks. The proposed approach provides models with inductive biases via design decisions that significantly impact compositional generalization, and achieves state-of-the-art results on the COGS and PCFG composition benchmarks.
A Google Research team draws inspiration from two numerical analysis methods — Hierarchical Matrix (H-Matrix) and Multigrid — to address the quadratic complexity problem of attention mechanisms in transformer architectures, proposing a hierarchical attention scheme that has linear complexity in run time and memory.
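The flavour of the idea can be shown with a toy two-level scheme: each query attends finely to tokens within its own block and only coarsely to mean-pooled summaries of every block. This simplified sketch is not the paper's H-matrix recursion, which stacks many such levels to reach genuinely linear run time and memory:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def two_level_attention(q, k, v, block=4):
    """Toy two-level attention: fine (within-block) keys/values are
    concatenated with coarse (mean-pooled per-block) summaries, so no
    query ever attends over the full token-by-token score matrix."""
    n, d = q.shape
    assert n % block == 0
    nb = n // block
    # coarse keys/values: one mean-pooled summary per block
    k_sum = k.reshape(nb, block, d).mean(axis=1)
    v_sum = v.reshape(nb, block, d).mean(axis=1)
    out = np.zeros_like(v)
    for b in range(nb):
        sl = slice(b * block, (b + 1) * block)
        keys = np.concatenate([k[sl], k_sum], axis=0)
        vals = np.concatenate([v[sl], v_sum], axis=0)
        scores = q[sl] @ keys.T / np.sqrt(d)
        out[sl] = softmax(scores) @ vals
    return out
```

Each query here scores only block + n/block keys instead of n, which is the hierarchical-matrix intuition: nearby interactions are kept at full resolution while distant ones are approximated by progressively coarser summaries.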
A research team from the University of Melbourne, Facebook AI, and Twitter Cortex proposes a black-box test method for assessing and debugging the numerical translation of neural machine translation systems in a systematic manner. The approach reveals novel types of errors that are general across multiple state-of-the-art translation systems.
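In spirit, such a black-box probe can be as simple as sweeping different numbers through a template sentence and checking that each value survives translation. A toy harness (the translate callable and value list are hypothetical stand-ins, not the paper's full test suite):

```python
import re

def number_probe(template, translate, values=("4", "13", "250", "1024")):
    """Illustrative black-box numerical test: substitute each value for
    the number in a source sentence, translate it, and flag any output
    that loses or corrupts the number."""
    failures = []
    for value in values:
        src = re.sub(r"\d+", value, template, count=1)  # swap in the probe value
        out = translate(src)
        if value not in out:
            failures.append((src, out))
    return failures
```

With a real MT system plugged in as translate, a non-empty failures list surfaces exactly the kinds of digit-level errors the paper catalogues, such as dropped, repeated, or mistranslated numbers.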
A Google Research team proposes Wordcraft, a text editor with a built-in AI-powered creative writing assistant. Wordcraft uses few-shot learning and the natural affordances of conversation to support a variety of user interactions, and can help with story planning, writing and editing.
A research team from Baidu proposes ERNIE 3.0, a unified framework for pretraining large-scale, knowledge-enhanced models that can easily be tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning, and achieves state-of-the-art results on NLP tasks.
A research team from ByteDance AI Lab, University of Wisconsin–Madison and Nanjing University wins the ACL 2021 best paper award. Their proposed Vocabulary Learning via Optimal Transport (VOLT) approach leverages optimal transport to automatically find an optimal vocabulary without trial training.
A research team from UC Davis, Microsoft Research and Johns Hopkins University extends prior work, which showed that models trained on massive amounts of linguistic data capture grammatical structures in their representations, to the domain of mathematical reasoning, demonstrating that both the standard transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.
A research team from McGill University, Mila – Quebec AI Institute and Facebook AI proposes novel metrics and perturbation functions to detect, quantify and compare trade-offs between robustness and faithfulness in NMT systems, both on the corpus level and with particular examples.
UmlsBERT is a deep transformer architecture that incorporates clinical domain knowledge from the Unified Medical Language System (UMLS) Metathesaurus to build ‘semantically enriched’ contextual representations that benefit from both contextual learning and domain knowledge.
Microsoft’s new tunable gigaword-scale neural network DialoGPT is a virtual master of conversation that outperforms strong baseline systems in generating relevant and context-consistent responses and attains near-human performance in conversational response generation tasks.