In a new paper Effective Long-Context Scaling of Foundation Models, a Meta AI research team presents a series of long-context LLMs built through continual pretraining from LLAMA 2. These models support effective context windows of up to 32,768 tokens and outperform all existing open-source models.
In a new paper Prompt2Model: Generating Deployable Models from Natural Language Instructions, a research team from Carnegie Mellon University and Tsinghua University introduces Prompt2Model, a general-purpose approach that uses prompting to specify system behavior while producing a deployable special-purpose model that retains the advantages of such models.
In a new paper Platypus: Quick, Cheap, and Powerful Refinement of LLMs, a Boston University research team presents Platypus, a family of fine-tuned and merged Large Language Models (LLMs) that takes first place on HuggingFace’s Open LLM Leaderboard through quick, cheap and powerful refinement of conventional LLMs.
In a new paper A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis, a research team from Google DeepMind and The University of Tokyo presents WebAgent, an LLM-driven web navigation agent that can complete tasks on real websites by following natural language instructions.
In a new paper Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning, a Stanford University research team shows that simple language skills can emerge in meta-RL agents without direct language supervision, testing this hypothesis in a custom multi-task environment.
In the new paper Structured Prompting: Scaling In-Context Learning to 1,000 Examples, a Microsoft Research team proposes structured prompting. The novel approach breaks through conventional in-context learning length limits, scaling to thousands of examples with reduced computational complexity and superior performance and stability.
In the new paper Text Embeddings by Weakly-Supervised Contrastive Pre-training, a Microsoft research team introduces Embeddings from Bidirectional Encoder Representations (E5), a general-purpose text embedding model for tasks requiring a single-vector representation of texts and the first model to surpass the BM25 baseline on the BEIR retrieval benchmark under a zero-shot setting.
In the new paper Fixing Model Bugs with Natural Language Patches, researchers from Stanford University and Microsoft Research propose a method that uses declarative statements as feedback for correcting errors in neural models, significantly increasing accuracy without high compute costs.
Carnegie Mellon University researchers provide background information and details on contributions to the DialPort project over the last six years in their new paper The DialPort Tools. These tools — such as the DialPort Portal and DialCrowd — will be demoed at the SIGDIAL 2022 conference next month in Edinburgh.
In the new paper TextWorldExpress: Simulating Text Games at One Million Steps Per Second, a research team from the University of Arizona and Microsoft Research Montréal presents TextWorldExpress, a high-performance text-game simulator that boosts throughput by approximately three orders of magnitude, reaching one million steps per second.
In the new paper Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent, a Stanford NLP research team presents Chirpy Cardinal, an open-domain conversational social chatbot with emotional and social intelligence that enables authentic and engaging interactions with real people.
In the new paper ReStructured Pre-training, a Carnegie Mellon University research team proposes “reStructured Pre-training” (RST), a novel NLP paradigm that pretrains models over valuable restructured data. The team’s resulting QIN system scores 40 points higher than the student average on the Gaokao-English Exam and 15 points higher than GPT-3 with 1/16 of the parameters.
In the new paper Tracing Knowledge in Language Models Back to the Training Data, a team from MIT CSAIL and Google Research proposes a benchmark for tracing language models’ assertions to the associated training data, aiming to establish a principled ground truth and mitigate high compute demands for large neural language model training.
In the new paper Training Compute-Optimal Large Language Models, a DeepMind research team posits that current large language models are significantly undertrained and, based on empirical outcomes of over 400 training runs, proposes three predictive approaches for optimally setting model size and training duration.
In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
A team from Google Research and the Swiss AI Lab IDSIA proposes the Block-Recurrent Transformer, a novel long-sequence processing approach that has the same computation time and parameter count costs as a conventional transformer layer but achieves significant perplexity improvements in language modelling tasks over very long sequences.
A research team from Sapienza University and OpenAI introduces an explanatory learning procedure that enables machines to understand existing explanations from symbolic sequences and create new explanations for unexplained phenomena, and further proposes Critical Rationalist Network (CRN) models for discovering explanations for novel phenomena.
A Google AI research team explores zero-label learning (training with synthetic data only) in natural language processing, and introduces Unsupervised Data Generation (UDG), a training data creation procedure designed to synthesize high-quality training data without human annotations.
MIT researchers present an automated, objective and transparent data-driven method for measuring media bias. The study analyses roughly a million articles from about a hundred newspapers for bias on various news topics, maps the newspapers into a two-dimensional media bias landscape, and shows that the data-driven results agree well with human-judgement classifications.
A Google Research team explores the design space of Transformer models in an effort to enable deep learning architectures to solve compositional tasks. The proposed approach provides models with inductive biases via design decisions that significantly impact compositional generalization, and achieves state-of-the-art results on the COGS and PCFG composition benchmarks.
A Google Research team draws inspiration from two numerical analysis methods — Hierarchical Matrix (H-Matrix) and Multigrid — to address the quadratic complexity problem of attention mechanisms in transformer architectures, proposing a hierarchical attention scheme that has linear complexity in run time and memory.
A research team from the University of Melbourne, Facebook AI, and Twitter Cortex proposes a black-box test method for assessing and debugging the numerical translation of neural machine translation systems in a systematic manner. The approach reveals novel types of errors that are general across multiple state-of-the-art translation systems.
A Google Research team proposes Wordcraft, a text editor with a built-in AI-powered creative writing assistant. Wordcraft uses few-shot learning and the natural affordances of conversation to support a variety of user interactions, and can help with story planning, writing and editing.
A research team from Baidu proposes ERNIE 3.0, a unified framework for pretraining large-scale, knowledge-enhanced models that can easily be tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning, and achieves state-of-the-art results on NLP tasks.
A research team from ByteDance AI Lab, University of Wisconsin–Madison and Nanjing University wins the ACL 2021 best paper award. Their proposed Vocabulary Learning via Optimal Transport (VOLT) approach leverages optimal transport to automatically find an optimal vocabulary without trial training.
A research team from UC Davis, Microsoft Research and Johns Hopkins University extends to the domain of mathematical reasoning prior work showing that grammatical structure emerges in the representations of models trained on massive amounts of linguistic data. The team shows that both the standard transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.
A research team from McGill University, Mila – Quebec AI Institute and Facebook AI proposes novel metrics and perturbation functions to detect, quantify and compare trade-offs between robustness and faithfulness in NMT systems, both on the corpus level and with particular examples.
UmlsBERT is a deep Transformer network architecture that incorporates clinical domain knowledge from a clinical Metathesaurus in order to build ‘semantically enriched’ contextual representations that benefit from both contextual learning and domain knowledge.