At the forefront of cost reduction and efficiency enhancement for large models, the Colossal-AI team builds on the core capabilities of LLaMA-2. Through innovative training techniques, Colossal-AI achieves remarkable results using only about 8.5 billion tokens of data, 15 hours of training time, and a training cost in the range of a few hundred dollars.
In a new paper Neurons in Large Language Models: Dead, N-gram, Positional, a research team from Meta AI and Universitat Politècnica de Catalunya conducts a comprehensive analysis of the Open Pre-trained Transformer (OPT) family of language models, up to 66B parameters, to provide insights into how feed-forward network (FFN) layers act.
In a new paper Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning, a Stanford University research team shows that simple language skills can emerge in meta-RL agents without direct language supervision, testing this hypothesis in their customized multi-task environment.
In a new paper Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision, a Microsoft research team presents Pareto optimal self-supervision, a flexible framework that leverages programmatic supervision to automatically calibrate and correct errors in large language models without extra manual effort.
In the new paper READ: Recurrent Adaptation of Large Transformers, a Meta AI research team proposes REcurrent ADaptation (READ), a lightweight and memory-efficient fine-tuning approach that achieves a 56 percent reduction in memory consumption and an 84 percent reduction in GPU usage.
In the new paper StarCoder: May the Source Be With You!, the BigCode community releases StarCoder and StarCoderBase, 15.5B parameter open-access large language models (LLMs) trained on 80+ programming languages. StarCoderBase outperforms all open code LLMs that support multiple programming languages, and StarCoder surpasses all open models fine-tuned on Python.
In the new paper Dissecting Recall of Factual Associations in Auto-Regressive Language Models, a team from Google DeepMind, Tel Aviv University and Google Research investigates how factual associations are stored and extracted internally in transformer-based language models and provides insights on how such models’ factual predictions are formed.
In the new paper Inference with Reference: Lossless Acceleration of Large Language Models, a Microsoft research team proposes LLMA, an inference-with-reference decoding mechanism that achieves up to 2x lossless speed-ups with identical generation results by exploiting the overlaps between LLM outputs and references.
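The copy-then-verify idea behind LLMA can be illustrated with a toy sketch. All names below (including the stand-in `next_token_fn`) are illustrative assumptions, not the paper's implementation, and the real system verifies the copied span in a single parallel forward pass rather than token by token:

```python
def copy_and_verify(reference, output, next_token_fn, span_len=4):
    """LLMA-style sketch: when generation overlaps a reference document,
    speculatively copy the tokens that follow the matching position, then
    keep only the prefix the model itself would have emitted.
    next_token_fn stands in for one greedy decoding step of the LLM."""
    last = output[-1]
    try:
        pos = reference.index(last)  # toy match on the last token only
    except ValueError:
        return [next_token_fn(output)]  # no overlap: decode one token
    candidate = reference[pos + 1 : pos + 1 + span_len]
    accepted = []
    for tok in candidate:
        if next_token_fn(output + accepted) == tok:
            accepted.append(tok)  # model agrees: copied token is verified
        else:
            break  # first mismatch: fall back to normal decoding after it
    return accepted or [next_token_fn(output)]

# Toy demo (assumption): the model happens to continue exactly like the
# reference, so the whole copied span is accepted at once.
reference = "the cat sat on the mat".split()
target = list(reference)
next_token = lambda out: target[len(out)]
accepted = copy_and_verify(reference, ["the"], next_token)
```

Because identical generation is guaranteed by the verification step, the speed-up is lossless: the output never differs from plain step-by-step decoding.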
Based on the LLaMA pre-trained model, Colossal-AI open-sources a complete RLHF pipeline that includes supervised data collection, supervised fine-tuning, reward model training, and reinforcement learning fine-tuning, and shares ColossalChat, the most practical open-source project that closely resembles the original ChatGPT technical solution!
A Google Research team addresses transformers’ input sequence limitations in the new paper CoLT5: Faster Long-Range Transformers with Conditional Computation, proposing CoLT5 (Conditional LongT5), a family of models that applies a novel conditional computation approach for higher quality and faster long-input processing of up to 64,000 tokens.
In the new paper GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models, a research team from OpenAI, OpenResearch, and the University of Pennsylvania investigates the potential impact of LLMs like GPT on the US labor market, shedding light on the economic, social, and policy implications.
In the new paper UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation, a Microsoft research team introduces a novel approach that tunes a lightweight and versatile retriever to retrieve prompts for any given task input to improve the zero-shot performance of LLMs.
In the new paper MathPrompter: Mathematical Reasoning Using Large Language Models, a Microsoft Research team presents MathPrompter, a novel approach that leverages chain-of-thought (CoT) prompting techniques to improve LLM performance on mathematical reasoning problems and increase confidence in their predictions.
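The confidence-boosting idea in MathPrompter is to generate several independent solutions and accept an answer only when they agree on random inputs. A minimal sketch, with plain Python functions standing in for LLM-generated algebraic expressions and code snippets (the problem and solver names are hypothetical):

```python
import random

def mathprompter_consensus(solvers, problem_input, n_checks=5, seed=0):
    """MathPrompter-style verification sketch: evaluate several
    independently generated solutions on random values and return an
    answer only when they all agree, raising confidence in the result."""
    rng = random.Random(seed)
    for _ in range(n_checks):
        x = rng.randint(1, 100)
        if len({solver(x) for solver in solvers}) != 1:
            return None  # solutions disagree: no confident answer
    return solvers[0](problem_input)

# Hypothetical problem: "a train travels x km/h for 3 hours; how far?"
# Two stand-ins for LLM-produced solutions that should coincide:
algebraic = lambda x: 3 * x                     # algebraic template "3*x"
pythonic = lambda x: sum(x for _ in range(3))   # generated code snippet

answer = mathprompter_consensus([algebraic, pythonic], 20)
```

If any solver diverges (say, one that computes `2 * x`), the random-input checks catch the disagreement and no answer is returned.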
In the new paper Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback, a Microsoft Research and Columbia University team presents LLM-Augmenter, a system that augments black-box large language models with a set of plug-and-play modules to significantly improve the factuality of their responses.
In the new paper DocPrompting: Generating Code by Retrieving the Docs, a research team from Carnegie Mellon University and Inspired Cognition presents DocPrompting, a natural-language-to-code generation approach. Tasked with generating code that calls unseen functions or libraries from a natural language intent, DocPrompting retrieves the corresponding code documentation to enable the model to learn to perform the task.
In the new paper Accelerating Large Language Model Decoding with Speculative Sampling, a DeepMind research team presents SpS (Speculative Sampling), an algorithm that achieves 2–2.5x decoding speedups on a 70 billion parameter Chinchilla language model. The novel approach maintains sample quality and does not require any modifications to model parameters or architecture.
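The modified rejection-sampling rule at the heart of speculative sampling can be sketched with toy categorical distributions standing in for the draft and target models (all names and numbers below are illustrative assumptions):

```python
import random

def residual_distribution(p, q):
    """Normalized max(0, p - q): where to resample after a rejection."""
    r = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    s = sum(r)
    return [ri / s for ri in r] if s > 0 else p

def speculative_accept(p, q, x, rng):
    """Accept draft token x (sampled from the cheap draft model's q) with
    probability min(1, p[x]/q[x]); on rejection, resample from the
    residual distribution. The final token is distributed exactly as the
    target model's p, which is why the speed-up preserves sample quality."""
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    residual = residual_distribution(p, q)
    return rng.choices(range(len(p)), weights=residual)[0]

# Toy example over a 4-token vocabulary (distributions are assumptions).
p = [0.4, 0.3, 0.2, 0.1]       # target (large) model
q = [0.25, 0.25, 0.25, 0.25]   # draft (small) model
rng = random.Random(0)
draft_token = rng.choices(range(4), weights=q)[0]
final_token = speculative_accept(p, q, draft_token, rng)
```

In the full algorithm the draft model proposes several tokens at once and the target model scores them in a single forward pass, so accepted tokens come essentially for free.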
In the new paper DetectGPT: Zero-Shot Machine-Generated Text Detection Using Probability Curvature, a Stanford University research team presents DetectGPT, a zero-shot machine-generated text detection algorithm that uses probability curvature to predict whether a candidate passage was generated by a large language model.
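DetectGPT's probability-curvature criterion reduces to a simple score: the candidate's log-probability minus the mean log-probability of slightly perturbed rewrites. A minimal sketch, with toy stand-ins (assumptions) for the scoring LLM and the perturbation model:

```python
import statistics

def detectgpt_score(log_prob, perturb, text, n_perturbations=10):
    """Perturbation discrepancy: machine-generated text tends to sit near
    a local maximum of log p, so small perturbations lower its
    probability and the resulting score is large and positive."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, i)) for i in range(n_perturbations)]
    return original - statistics.mean(perturbed)

def is_machine_generated(score, threshold=0.5):
    return score > threshold

# Toy stand-ins (assumptions): a real setup scores with the candidate
# LLM and perturbs with a mask-filling model such as T5.
def toy_log_prob(text):
    return -float(len(text))  # pretend shorter strings are more probable

def toy_perturb(text, i):
    return text + " " + "x" * (i + 1)  # pretend rewrite lengthens the text

score = detectgpt_score(toy_log_prob, toy_perturb, "hello world")
```

The method is zero-shot because no detector is trained; only the source model's own log-probabilities and a generic perturbation model are needed.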
In the new paper Memory Augmented Large Language Models are Computationally Universal, Google Brain and University of Alberta researcher Dale Schuurmans establishes computational universality for a large language model augmented with an associative read-write memory.
In the new paper OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization, a Meta AI research team presents OPT-IML Bench, an Instruction Meta Learning benchmark comprising 2000 NLP tasks and an evaluation framework for model generalization.
In the new paper Structured Prompting: Scaling In-Context Learning to 1,000 Examples, a Microsoft Research team proposes structured prompting. The novel approach breaks through conventional in-context learning length limits, scaling to thousands of examples with reduced computation complexity and superior performance and stability.
In the new paper The Stack: 3 TB of Permissively Licensed Source Code, a team from ServiceNow Research and Hugging Face advances open and responsible research on code LLMs by releasing The Stack, a 3.1 TB dataset of permissively licensed source code in 30 programming languages.
In the new paper Fine-tuning Language Models To Find Agreement Among Humans With Diverse Preferences, a research team from DeepMind and University College London fine-tunes a 70 billion parameter language model to generate statements that maximize agreement among a human group with diverse written opinions.
In the new paper Fixing Model Bugs with Natural Language Patches, researchers from Stanford University and Microsoft Research propose a method that uses declarative statements as feedback for correcting errors in neural models, significantly increasing accuracy without high compute costs.
In the new paper Locating and Editing Factual Associations in GPT, a research team from MIT CSAIL, Northeastern University and Technion IIT examines how information flows during knowledge recall in large autoregressive transformers and introduces Rank-One Model Editing (ROME), a simple, zero-shot principled model editor capable of locating and editing factual associations in such models.
In the new paper Ask Me Anything: A Simple Strategy for Prompting Language Models, a research team from Stanford University, Numbers Station, and the University of Wisconsin-Madison presents Ask Me Anything Prompting (AMA), a simple large language model prompting strategy that enables a 30x smaller language model to outperform few-shot GPT3-175B.
In the new paper Vec2text With Round-Trip Translations, Google Brain researchers explore large language models’ capabilities for generating arbitrary natural language text from inputs of fixed-size vectors — a vec2text setting — and propose a simple data augmentation approach based on round-trip translations to improve vec2text model performance.
In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and leverages these neurons to edit factual knowledge in transformers without any fine-tuning.
In the new paper Faithful Reasoning Using Large Language Models, a DeepMind research team proposes a forward-chaining selection-inference model that performs faithful reasoning and provides a valid reasoning trace to improve reasoning quality and help users validate the model’s final answers.
In the new paper PEER: A Collaborative Language Model, a research team from Meta AI, Carnegie Mellon University, PSL University, and University College London presents PEER, a collaborative language model that performs a humanlike writing process — composing drafts, adding suggestions, proposing edits and providing explanations for its actions.
Carnegie Mellon University researchers provide background information and details on contributions to the DialPort project over the last six years in their new paper The DialPort Tools. These tools — such as the DialPort Portal and DialCrowd — will be demoed at the SIGDIAL 2022 conference next month in Edinburgh.
In the new paper Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization, a research team from Microsoft Azure AI and Microsoft Research presents Z-Code++, a novel encoder-decoder pretrained language model optimized for abstractive summarization that significantly improves performance on low-resource summarization tasks.
In the new paper Few-shot Learning With Retrieval Augmented Language Models, a research team from Meta AI, PSL University, Inria, and University College London presents Atlas, a pretrained retrieval augmented language model that effectively learns new knowledge-intensive tasks under few-shot settings. Atlas outperforms the 540B parameter PaLM model on QA tasks while using 50x fewer parameters.
In the new paper Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent, a Stanford NLP research team presents Chirpy Cardinal, an open-domain conversational social chatbot with emotional and social intelligence that enables authentic and engaging interactions with real people.
In the new paper ReStructured Pre-training, a Carnegie Mellon University research team proposes “reStructured Pre-training” (RST), a novel NLP paradigm that pretrains models over valuable restructured data. The team’s resulting QIN system scores 40 points higher than the student average on the Gaokao-English Exam and 15 points higher than GPT-3 with 1/16 of the parameters.
In the new paper Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, a Google Brain research team presents Imagen, a text-to-image diffusion model that combines deep language understanding and photorealistic image generation capabilities to achieve a new state-of-the-art FID score of 7.27 on the COCO dataset.