In a new paper Large Search Model: Redefining Search Stack in the Era of LLMs, a Microsoft research team presents a novel conceptual framework, large search model, which reimagines the conventional search stack by consolidating various search tasks under a single Large Language Model (LLM).
In a new paper DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention, Microsoft's DeepSpeed research team presents the DeepSpeed-VisualChat framework, which extends LLMs with multi-modal capabilities and demonstrates superior scalability, up to a model size of 70 billion parameters.
The Colossal-AI team, which focuses on cost reduction and efficiency for large models, builds on the core capabilities of LLaMA-2. Using its training techniques, the team reports remarkable results with only about 0.0085 trillion (8.5 billion) tokens of data, 15 hours of training, and training costs of a few hundred dollars.
In a new paper Shepherd: A Critic for Language Model Generation, a Meta AI research team presents Shepherd, a language model that is explicitly tuned to critique model-generated outputs and to generate feedback that suggests improvements addressing factuality, logical errors, coherence, and alignment issues.
In a new paper Brain2Music: Reconstructing Music from Human Brain Activity, a research team from Google, Osaka University, NICT and Araya Inc. introduces Brain2Music, an approach that uses MusicLM to reconstruct music from brain activity, aiming to gain insights into the relationships between brain activity and human cognitive and sentimental experiences.
In a new paper LongNet: Scaling Transformers to 1,000,000,000 Tokens, a Microsoft research team presents LONGNET, a Transformer variant that scales sequence length to more than 1 billion tokens while maintaining strong performance and linear computational complexity.
In a new paper Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision, a Microsoft research team presents Pareto optimal self-supervision, a flexible framework that leverages programmatic supervision to automatically calibrate and correct errors in large language models without extra manual effort.
In the new paper Dissecting Recall of Factual Associations in Auto-Regressive Language Models, a team from Google DeepMind, Tel Aviv University and Google Research investigates how factual associations are stored and extracted internally in transformer-based language models and provides insights on how such models’ factual predictions are formed.
In the new paper WizardLM: Empowering Large Language Models to Follow Complex Instructions, a research team from Microsoft and Peking University presents Evol-Instruct, a novel approach that leverages LLMs to automatically generate large amounts of instruction data with varying levels of complexity. In human evaluations, the instructions generated with this approach, which the team uses to build its WizardLM model, were judged superior to human-created instruction datasets.
In the new paper Teaching Large Language Models to Self-Debug, a Google Research and UC Berkeley team presents Self-Debugging, a framework that teaches large language models to debug their own predicted code via few-shot demonstrations and improves baseline accuracy by up to 12 percent.
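The feedback loop behind this idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical stand-in for a language-model call, `fake_model` and `check_add` are toy helpers invented for the example, and the paper's few-shot demonstrations are not reproduced.

```python
def self_debug(generate, check, max_rounds=3):
    """Ask `generate` for code, run it, and feed any error message back in.

    `generate(feedback)` is a hypothetical stand-in for a language-model call.
    `check(namespace)` raises an exception if the candidate code is wrong.
    Returns the first candidate that passes, or None if none does.
    """
    feedback = None
    for _ in range(max_rounds):
        candidate = generate(feedback)
        namespace = {}
        try:
            exec(candidate, namespace)  # run the candidate program
            check(namespace)            # run the tests against it
            return candidate
        except Exception as exc:
            # The execution result becomes the feedback for the next attempt.
            feedback = f"{type(exc).__name__}: {exc}"
    return None


def fake_model(feedback):
    """Toy stand-in: returns buggy code first, a fix once it sees feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def check_add(namespace):
    assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"


fixed = self_debug(fake_model, check_add)
```

Here the first candidate fails its test, the failure message is passed back to the toy model, and the second candidate is returned. With a real model, `generate` would build a prompt containing the previous code and the feedback.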
In the new paper Generative Agents: Interactive Simulacra of Human Behavior, a team from Stanford University and Google Research presents agents that draw on generative models to simulate both individual and emergent group behaviours that are humanlike and based on their changing experiences and environment.
In the new paper TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs, a Microsoft research team proposes TaskMatrix.AI, a novel ecosystem that connects foundation models with millions of existing models and system APIs to build a “super-AI” capable of addressing a wide range of digital and physical tasks.
A Google Research team addresses transformers’ input sequence limitations in the new paper CoLT5: Faster Long-Range Transformers with Conditional Computation, proposing CoLT5 (Conditional LongT5), a family of models that applies a novel conditional computation approach for higher quality and faster long-input processing of up to 64,000 tokens.
In the new paper MathPrompter: Mathematical Reasoning Using Large Language Models, a Microsoft Research team presents MathPrompter, a novel approach that leverages chain-of-thought (CoT) prompting techniques to improve LLM performance on mathematical reasoning problems and increase confidence in their predictions.
In the new paper Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, a Microsoft Research Asia team presents Visual ChatGPT, a system that incorporates various visual foundation models to enable ChatGPT to understand, generate and edit visual information.
In the new paper Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback, a Microsoft Research and Columbia University team presents LLM-Augmenter, a system that augments black-box large language models with a set of plug-and-play modules to significantly improve the factuality of their responses.
In the new paper DocPrompting: Generating Code by Retrieving the Docs, a research team from Carnegie Mellon University and Inspired Cognition presents DocPrompting, a natural-language-to-code generation approach. Tasked with generating code that uses unseen functions or libraries from a natural language intent, DocPrompting retrieves the corresponding code documentation to enable the model to learn to perform the task.
In the new paper Toolformer: Language Models Can Teach Themselves to Use Tools, a team from Meta AI Research and the Universitat Pompeu Fabra proposes Toolformer, a model that self-learns how to choose and use external tools such as search engines, calculators, and translation systems to boost performance on downstream tasks.
In the new paper Accelerating Large Language Model Decoding with Speculative Sampling, a DeepMind research team presents SpS (Speculative Sampling), an algorithm that achieves 2–2.5x decoding speedups on a 70 billion parameter Chinchilla language model. The novel approach maintains sample quality and does not require any modifications to model parameters or architecture.
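The claim that sample quality is maintained rests on an accept-or-resample step. The sketch below illustrates that step for a single token, assuming a small draft model proposes the token and the large target model verifies it. The distributions `target` and `draft` are made-up toy values, and the function names are chosen for this example only.

```python
import random


def speculative_step(p, q, rng):
    """Return one token index distributed according to the target p.

    p: target-model probabilities, q: draft-model probabilities (same length).
    """
    tokens = list(range(len(p)))
    x = rng.choices(tokens, weights=q)[0]       # draft model proposes x
    if rng.random() < min(1.0, p[x] / q[x]):    # target model accepts x
        return x
    # On rejection, resample from the normalized positive part of p - q.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(tokens, weights=residual)[0]


def exact_output_distribution(p, q):
    """Compute analytically the distribution that speculative_step samples."""
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]  # q_i * min(1, p_i / q_i)
    reject_mass = 1.0 - sum(accepted)
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    if total == 0.0:  # p == q, so every proposal is accepted
        return accepted
    return [a + reject_mass * r / total for a, r in zip(accepted, residual)]


target = [0.5, 0.3, 0.2]  # toy target-model distribution (assumed values)
draft = [0.2, 0.5, 0.3]   # toy draft-model distribution (assumed values)
recovered = exact_output_distribution(target, draft)
```

In this toy case `recovered` matches `target` up to floating-point error, which is why the step does not change what is sampled. The speedup comes from letting the draft model propose several tokens that the target model then checks together, which this single-token sketch does not show.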
In the new paper DetectGPT: Zero-Shot Machine-Generated Text Detection Using Probability Curvature, a Stanford University research team presents DetectGPT, a zero-shot machine-generated text detection algorithm that uses probability curvature to predict whether a candidate passage was generated by a large language model.
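The curvature test can be illustrated as a comparison between the log-probability of a passage and the average log-probability of slightly perturbed copies. The sketch below is a toy under stated assumptions: `toy_log_prob` is a made-up unigram scorer standing in for a real language model, `toy_perturb` is a made-up word-swap standing in for a proper rewriting model, and all names are hypothetical.

```python
import math
import random

# Toy unigram "model" (an assumption for illustration, not a real LLM).
WORD_PROBS = {"the": 0.5, "cat": 0.3, "sat": 0.19}
UNKNOWN_PROB = 0.01


def toy_log_prob(text):
    """Sum of per-word log-probabilities under the toy model."""
    return sum(math.log(WORD_PROBS.get(w, UNKNOWN_PROB)) for w in text.split())


def toy_perturb(text, rng):
    """Replace one randomly chosen word with 'cat' or 'zebra'."""
    words = text.split()
    words[rng.randrange(len(words))] = rng.choice(["cat", "zebra"])
    return " ".join(words)


def perturbation_discrepancy(text, log_prob, perturb, n_perturbations=20, seed=0):
    """log p(text) minus the mean log p of randomly perturbed copies."""
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)


# A passage at a peak of the toy model: every perturbation lowers its score.
peak_score = perturbation_discrepancy("the the the the", toy_log_prob, toy_perturb)
# A passage the toy model finds unlikely: perturbations never lower its score.
off_peak_score = perturbation_discrepancy("zebra zebra zebra zebra", toy_log_prob, toy_perturb)
```

A large positive discrepancy suggests the passage sits near a local maximum of the model's log-probability, which is the signal used to flag text as likely model-generated. A detector would compare the score against a threshold, which is not chosen here.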
In the new paper Memory Augmented Large Language Models are Computationally Universal, Google Brain and University of Alberta researcher Dale Schuurmans establishes computational universality for a large language model augmented with an associative read-write memory.
In the new paper Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers, a Microsoft research team presents VALL-E, the first language model-based text-to-speech (TTS) system with strong in-context learning. VALL-E achieves state-of-the-art personalized speech synthesis quality via prompting in a zero-shot setting.
In the new paper Hungry Hungry Hippos: Towards Language Modeling with State Space Models, Stanford University and State University of New York at Buffalo researchers explore the expressivity gap between state space models and transformer language model attention mechanisms and propose FlashConv to improve state space model training efficiency on modern hardware.
In the new paper OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization, a Meta AI research team presents OPT-IML Bench, an Instruction Meta Learning benchmark comprising 2000 NLP tasks and an evaluation framework for model generalization.
In the new paper The Stack: 3 TB of Permissively Licensed Source Code, a team from ServiceNow Research and Hugging Face advances open and responsible research on code LLMs by releasing The Stack, a 3.1 TB dataset of permissively licensed source code in 30 programming languages.
In the new paper Fine-tuning Language Models To Find Agreement Among Humans With Diverse Preferences, a research team from DeepMind and University College London fine-tunes a 70 billion parameter language model to generate statements that maximize agreement among a human group with diverse written opinions.
In the new paper Solving Math Word Problems With Process- and Outcome-based Feedback, a DeepMind research team conducts the first comprehensive comparison between process- and outcome-based model supervision. The two approaches achieve comparable final-answer error rate improvements on math word problems, while the process-based method significantly reduces reasoning errors from 14.0 to just 3.4 percent.
In the new paper Fixing Model Bugs with Natural Language Patches, researchers from Stanford University and Microsoft Research propose a method that uses declarative statements as feedback for correcting errors in neural models, significantly increasing accuracy without high compute costs.
In the new paper Fast DistilBERT on CPUs, researchers from Intel Corporation and Intel Labs propose a pipeline and hardware-aware extreme compression technique for creating and running fast transformer models on CPUs. The approach achieves impressive speedups and SOTA performance in production environments.
In the new paper Fine-Tuning Language Models via Epistemic Neural Networks, a DeepMind research team modifies large language models to create an Epistemic Neural Network. The novel approach achieves model performance comparable to that obtained via fine-tuning while requiring 50 percent less data.
In the new paper Locating and Editing Factual Associations in GPT, a research team from MIT CSAIL, Northeastern University and Technion IIT examines how information flows during knowledge recall in large autoregressive transformers and introduces Rank-One Model Editing (ROME), a simple, zero-shot principled model editor capable of locating and editing factual associations in such models.
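The name refers to a rank-one change to a weight matrix. The sketch below shows the simplest such update: it forces a linear layer to map one chosen key vector `k` to a new value vector `v` while leaving inputs orthogonal to `k` untouched. It is a simplified illustration only. It ignores the statistics of other keys that a practical editor would take into account (equivalent to assuming an identity key covariance), and the matrix and vectors are made-up toy values.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' k equals v."""
    residual = [vi - yi for vi, yi in zip(v, matvec(W, k))]
    scale = sum(ki * ki for ki in k)
    return [
        [w + r * kj / scale for w, kj in zip(row, k)]
        for row, r in zip(W, residual)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # toy layer weights (assumed values)
key = [1.0, 2.0]              # toy key vector for the fact being edited
new_value = [3.0, -1.0]       # toy value the layer should now return for key

W_edited = rank_one_edit(W, key, new_value)
edited_output = matvec(W_edited, key)            # close to new_value
unrelated_output = matvec(W_edited, [2.0, -1.0])  # input orthogonal to key
```

After the edit, the layer returns `new_value` for `key`, while the orthogonal input `[2.0, -1.0]` produces the same output as before the edit.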
In the new paper Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, a Google Research and Stanford University team applies chain-of-thought (CoT) prompting — a series of intermediate reasoning steps — to 23 BIG-Bench tasks on which language models have failed to outperform the average human rater. The proposed approach enables models to surpass human performance on 17 of the 23 tasks.
In the new paper Ask Me Anything: A Simple Strategy for Prompting Language Models, a research team from Stanford University, Numbers Station, and the University of Wisconsin-Madison presents Ask Me Anything Prompting (AMA), a simple large language model prompting strategy that enables a 30x smaller language model to outperform few-shot GPT3-175B.
In the new paper Promptagator: Few-shot Dense Retrieval From 8 Examples, a Google Research team proposes Prompt-based Query Generation for Retriever (Promptagator), a novel and simple approach for few-shot retrieval that leverages large language model (LLM) prompting to generate synthetic task-specific training data.
In the new paper Vec2text With Round-Trip Translations, Google Brain researchers explore large language models’ capabilities for generating arbitrary natural language text from inputs of fixed-size vectors — a vec2text setting — and propose a simple data augmentation approach based on round-trip translations to improve vec2text model performance.