Language model

by Synced 2024-11-29

DeepMind’s Socratic Learning with Language Games: The Path to Self-Improving Superintelligence

Researchers from Google DeepMind introduce the concept of “Socratic learning,” a form of recursive self-improvement in artificial intelligence that can significantly enhance a system’s performance beyond the data or knowledge initially available to it. They also propose language games as a practical framework for implementing it.

by Synced 2024-04-20

DeepMind’s RecurrentGemma: Pioneering Efficiency for Open Small Language Models

A Google DeepMind research team introduces RecurrentGemma, an open language model built on Google’s innovative Griffin architecture. By reducing memory usage and enabling efficient inference on long sequences, RecurrentGemma opens new possibilities for highly efficient small language models in resource-constrained environments.

by Synced 2024-02-11

Introducing NVIDIA’s Audio Flamingo, the Next Frontier in Audio Language Models

An NVIDIA research team introduces Audio Flamingo, a groundbreaking audio language model that incorporates in-context learning (ICL), retrieval augmented generation (RAG), and multi-turn dialogue capabilities, achieving SOTA performance across various audio understanding tasks.

by Synced 2024-01-27

Stanford U & OpenAI’s Meta-Prompting Elevates Language Model Performance, Surpassing Standard Prompting by 17%

In a new paper Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding, a team from Stanford University and OpenAI introduces meta-prompting. This scaffolding technique turns a single language model into a conductor that breaks a complex task into subtasks and delegates them to “expert” instances of itself. The approach surpasses standard prompting by 17.1%, expert (dynamic) prompting by 17.3%, and multi-persona prompting by 15.2%.

by Synced 2023-10-24

Redefining Search Stack: Microsoft Unleashes the Potential of Large Language Models

In a new paper Large Search Model: Redefining Search Stack in the Era of LLMs, a Microsoft research team presents the large search model. This novel conceptual framework reimagines the conventional search stack by consolidating various search tasks under a single large language model (LLM).

by Synced 2023-10-12

Microsoft’s DeepSpeed-VisualChat: Breaking Boundaries in Multi-Modal Language Models

In a new paper DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention, Microsoft’s DeepSpeed team presents the DeepSpeed-VisualChat framework. The framework extends LLMs with multi-modal capabilities and scales well, up to a 70-billion-parameter language model.

by Synced 2023-09-25

Half a day of training and a few hundred dollars yield results comparable to mainstream large models: an open-source, commercially usable domain-specific LLM solution

At the forefront of cost reduction and efficiency enhancement for large models, the Colossal-AI team maximizes the core capabilities of LLaMA-2. Through innovative training techniques, Colossal-AI achieved remarkable results with only about 8.5 billion (0.0085 trillion) tokens of data, 15 hours of training, and a training budget of a few hundred dollars.

by Synced 2023-09-24

Language Models Redefined: Transforming Textual Mastery into Compression Brilliance

In a new paper Language Modeling Is Compression, a team from Google DeepMind, Meta AI, and Inria investigates the lossless compression capabilities of foundation models. They show that these models achieve state-of-the-art compression rates across various data types.
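The link between prediction and compression comes from arithmetic coding. A model that assigns probability p(x_t | x_<t) to each symbol can encode a sequence in roughly Σ −log2 p(x_t | x_<t) bits. The minimal sketch below computes that ideal code length for a deliberately simple model: an adaptive, Laplace-smoothed character bigram model that is purely illustrative and is not what the paper uses. The function name `ideal_code_length_bits` is hypothetical.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(text: str, alphabet_size: int = 256) -> float:
    """Total of -log2 p(x_t | x_{t-1}) under an adaptive, Laplace-smoothed bigram model.

    An arithmetic coder driven by the same predictions would emit a code within
    about 2 bits of this total, so the sum approximates the compressed size.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev][cur]
    totals = defaultdict(int)                       # totals[prev]
    bits = 0.0
    prev = ""  # empty context for the first character
    for ch in text:
        # Predict from the statistics seen so far; a decoder can mirror this exactly.
        p = (counts[prev][ch] + 1) / (totals[prev] + alphabet_size)
        bits += -math.log2(p)
        # Update the model only after the character has been coded.
        counts[prev][ch] += 1
        totals[prev] += 1
        prev = ch
    return bits


text = "ab" * 128
raw_bits = 8 * len(text)  # one byte per character
model_bits = ideal_code_length_bits(text)
print(f"raw: {raw_bits} bits, model: {model_bits:.1f} bits")
```

A model with lower per-symbol surprisal yields a shorter code. In that sense, a better language model is a better compressor.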

by Synced 2023-08-15

Meta AI’s Shepherd Critiques Language Model Outputs to Crush Hallucinations

In a new paper Shepherd: A Critic for Language Model Generation, a Meta AI research team presents Shepherd, a language model explicitly tuned to critique model-generated outputs. It also generates feedback that suggests improvements addressing factuality, logical errors, coherence, and alignment issues.

by Synced 2023-07-26

Brain2Music: Unveiling the intricacies of Human Interactions with Music

In a new paper Brain2Music: Reconstructing Music from Human Brain Activity, a research team from Google, Osaka University, NICT and Araya Inc. introduces Brain2Music, an approach that reconstructs music from brain activity using MusicLM. The team aims to gain insight into the relationship between brain activity and human cognitive and emotional experiences.

by Synced 2023-07-10

Microsoft’s LongNet Scales Transformer to One Billion Tokens

In a new paper LongNet: Scaling Transformers to 1,000,000,000 Tokens, a Microsoft research team presents LongNet, a Transformer variant that scales sequence length to more than 1 billion tokens. It does so without sacrificing performance on shorter sequences, and its computational complexity stays linear in sequence length.
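LongNet’s route to linear cost is dilated attention. The sequence is split into segments of length w, only every r-th position inside a segment takes part, and several (w, r) configurations are mixed. The sketch below simply counts the query-key pairs such a pattern touches. It assumes non-causal attention, the same offset for every configuration (the paper varies offsets across heads), and a hypothetical `CONFIGS` schedule. The function `dilated_pairs` is illustrative, not the paper’s code.

```python
def dilated_pairs(n: int, configs: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Return the (query, key) pairs attended to under a mix of dilated patterns.

    Each config is (segment_length w, dilation r). The sequence is cut into
    segments of length w, and within a segment only positions whose offset is a
    multiple of r attend to one another (dense attention on the sparsified segment).
    """
    pairs = set()
    for w, r in configs:
        for start in range(0, n, w):
            kept = range(start, min(start + w, n), r)
            for q in kept:
                for k in kept:
                    pairs.add((q, k))
    return pairs


# Hypothetical geometric schedule: longer segments use larger dilation.
CONFIGS = [(64, 1), (256, 4), (1024, 16)]

for n in (1024, 2048, 4096):
    sparse = len(dilated_pairs(n, CONFIGS))
    print(f"n={n:5d}  dilated pairs={sparse:8d}  dense pairs={n * n:10d}")
```

Each configuration costs about n·w/r² pairs, so with w/r² fixed, doubling n doubles the count. Dense attention instead quadruples.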

by Synced 2023-07-04

Microsoft’s New Pareto Optimal Self-Supervision Framework Automatically Corrects Language Models to Boost GPT SOTA Records

In a new paper Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision, a Microsoft research team presents Pareto optimal self-supervision. This flexible framework uses programmatic supervision to automatically calibrate and correct errors in large language models without additional manual effort.

by Synced 2023-06-28

Microsoft’s Crafted “Textbook Quality” Data Are All You Need to Train a 10× Smaller Yet Strong Language Model for Code

In a new paper Textbooks Are All You Need, a Microsoft research team crafts ‘textbook quality’ data for training a large language model for code. With a mere 1.3B parameters, the resulting phi-1 model rivals much larger state-of-the-art large language models (LLMs).

by Synced 2023-05-02

Google & TAU Explore How Transformer-Based LLMs Extract Knowledge From Their Parameters

In the new paper Dissecting Recall of Factual Associations in Auto-Regressive Language Models, a team from Google DeepMind, Tel Aviv University and Google Research investigates how factual associations are stored and extracted internally in transformer-based language models and provides insights on how such models’ factual predictions are formed.

by Synced 2023-04-29

Microsoft & Peking U’s WizardLM Enables LLMs to Automatically Mass-Produce Complex Instructions

In the new paper WizardLM: Empowering Large Language Models to Follow Complex Instructions, a research team from Microsoft and Peking University presents Evol-Instruct, a novel approach that uses LLMs to automatically generate large amounts of instruction data with varying levels of complexity. In human evaluations, the instructions generated by Evol-Instruct were judged superior to those in human-created instruction datasets.

by Synced 2023-04-17

Google & UC Berkeley’s ‘Self-Debugging’ Framework Teaches LLMs to Debug Their Own Code

In the new paper Teaching Large Language Models to Self-Debug, a Google Research and UC Berkeley team presents Self-Debugging, a framework that teaches large language models to debug their own predicted code via few-shot demonstrations and improves baseline accuracy by up to 12 percent.

by Synced 2023-04-12

Stanford U & Google’s Generative Agents Produce Believable Proxies of Human Behaviours

In the new paper Generative Agents: Interactive Simulacra of Human Behavior, a team from Stanford University and Google Research presents agents that draw on generative models to simulate both individual and emergent group behaviours that are humanlike and based on their changing experiences and environment.

by Synced 2023-04-03

Meet TaskMatrix.AI: A Microsoft ‘Super-AI’ That Links Foundation Models With Millions of APIs to Perform Diverse Tasks

In the new paper TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs, a Microsoft research team proposes TaskMatrix.AI, a novel ecosystem that connects foundation models with millions of existing models and system APIs to build a “super-AI” capable of addressing a wide range of digital and physical tasks.

by Synced 2023-03-28

Google’s CoLT5 Processes Extremely Long Inputs via Conditional Computation

A Google Research team addresses transformers’ input sequence limitations in the new paper CoLT5: Faster Long-Range Transformers with Conditional Computation, proposing CoLT5 (Conditional LongT5), a family of models that applies a novel conditional computation approach for higher quality and faster long-input processing of up to 64,000 tokens.

by Synced 2023-03-16

Microsoft’s MathPrompter Dramatically Improves LLM Performance on Mathematical Reasoning Tasks

In the new paper MathPrompter: Mathematical Reasoning Using Large Language Models, a Microsoft Research team presents MathPrompter, a novel approach that leverages chain-of-thought (CoT) prompting techniques to improve LLM performance on mathematical reasoning problems and increase confidence in their predictions.
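As we understand the approach, MathPrompter swaps the numbers in a problem for variables. It asks the LLM for several solution forms, such as an algebraic expression and a Python function, and trusts an answer only if those forms agree on random variable values. It then votes across repeated runs. In the minimal sketch below, hand-written lambdas stand in for LLM outputs. The helpers `consistent` and `vote` are hypothetical, and details such as the number of trials are assumptions.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence

# A candidate solution maps the problem's variables to a numeric answer.
Solver = Callable[..., float]


def consistent(candidates: Sequence[Solver], n_vars: int, trials: int = 5, seed: int = 0) -> bool:
    """True if every candidate gives the same result on random variable assignments."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(1, 100) for _ in range(n_vars)]
        results = [solve(*values) for solve in candidates]
        if any(abs(r - results[0]) > 1e-9 for r in results):
            return False
    return True


def vote(candidate_sets: Sequence[Sequence[Solver]], original_values: Sequence[float]) -> Optional[float]:
    """Drop inconsistent candidate sets, then majority-vote the survivors' answers."""
    answers = [
        cands[0](*original_values)
        for cands in candidate_sets
        if consistent(cands, n_vars=len(original_values))
    ]
    return Counter(answers).most_common(1)[0][0] if answers else None


# Template: "A shop has a apples, sells b, then receives c crates of d apples each."
# Each pair stands in for an (algebraic expression, Python function) an LLM might return.
candidate_sets = [
    [lambda a, b, c, d: a - b + c * d, lambda a, b, c, d: (a + c * d) - b],
    [lambda a, b, c, d: a - b + c + d, lambda a, b, c, d: a - b + c * d],  # disagree -> dropped
    [lambda a, b, c, d: a + c * d - b, lambda a, b, c, d: a - (b - c * d)],
]
print(vote(candidate_sets, original_values=[12, 5, 3, 4]))  # 19
```

The random-value check catches candidate pairs that happen to agree on the original numbers but encode different formulas.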

by Synced 2023-03-14

Microsoft’s Visual ChatGPT Enables Image Understanding and Generation

In the new paper Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models, a Microsoft Research Asia team presents Visual ChatGPT, a system that incorporates various visual foundation models to enable ChatGPT to understand, generate and edit visual information.

by Synced 2023-03-07

Toward AGI: Microsoft’s KOSMOS-1 MLLM Can Perceive General Modalities, Follow Instructions, and Perform In-Context Learning

In the new paper Language Is Not All You Need: Aligning Perception with Language Models, a Microsoft research team presents KOSMOS-1, a multimodal large language model (MLLM) that can perceive general modalities, learn in context, and follow instructions.

by Synced 2023-03-02

Tackling Hallucinations: Microsoft’s LLM-Augmenter Boosts ChatGPT’s Factual Answer Score

In the new paper Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback, a Microsoft Research and Columbia University team presents LLM-Augmenter, a system that augments black-box large language models with a set of plug-and-play modules to significantly improve the factuality of their responses.

by Synced 2023-02-28

CMU & Inspired Cognition’s DocPrompting Improves Code Generation by Retrieving Relevant Documentation

In the new paper DocPrompting: Generating Code by Retrieving the Docs, a research team from Carnegie Mellon University and Inspired Cognition presents DocPrompting, a natural-language-to-code generation approach. The intent may require functions or libraries unseen during training. In that case, DocPrompting retrieves the relevant code documentation and conditions the model on it to generate the code.

by Synced 2023-02-16

Meta AI & UPF’s Toolformer: Enabling Language Models to Teach Themselves to Use External Tools

In the new paper Toolformer: Language Models Can Teach Themselves to Use Tools, a team from Meta AI Research and the Universitat Pompeu Fabra proposes Toolformer, a model that self-learns how to choose and use external tools such as search engines, calculators, and translation systems to boost performance on downstream tasks.
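In Toolformer, the model learns to emit API calls inline in its text, and the result of each call is spliced back in before generation continues. The sketch below shows only that execution step, for a calculator tool. It uses a bracketed markup similar in spirit to the paper’s, with `->` before the result. The regex, `calculator`, and `execute_calls` are hypothetical, and the paper’s self-supervised filtering of useful calls is not shown. Post-processing finished text, as done here, is a simplification: the paper pauses decoding to call the tool.

```python
import ast
import operator
import re

# Arithmetic the toy calculator understands; anything else is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(node: ast.AST) -> float:
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def calculator(expression: str) -> str:
    """Safely evaluate +, -, *, / over numbers and round to two decimals."""
    return f"{_evaluate(ast.parse(expression, mode='eval')):.2f}"


# Matches calls such as [Calculator(400 / 1400)] (no nested parentheses).
CALL = re.compile(r"\[Calculator\(([^()\]]*)\)\]")


def execute_calls(text: str) -> str:
    """Splice each call's result back into the text as [Calculator(expr) -> result]."""
    return CALL.sub(lambda m: f"[Calculator({m.group(1)}) -> {calculator(m.group(1))}]", text)


print(execute_calls("Out of 1400 participants, 400 [Calculator(400 / 1400)] passed the test."))
```

Walking the `ast` tree instead of calling `eval` keeps the tool from executing arbitrary code that a model might emit.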

by Synced 2023-02-13

Hugging Face Releases LoRA Scripts for Efficient Stable Diffusion Fine-Tuning

A Hugging Face team collaborates with researcher Simo Ryu to provide a general approach that enables users to implement Low-Rank Adaptation (LoRA) in diffusers via both Dreambooth and full fine-tuning methods.
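LoRA keeps a pretrained weight W frozen and learns only a low-rank update (α/r)·B·A. This is why the fine-tuned weights are small. The sketch below shows that computation in plain Python; it is not the diffusers API. It assumes the zero-initialised B and α/r scaling described in the original LoRA paper, and the class name `LoRALinear` is hypothetical.

```python
import random

Matrix = list[list[float]]


def matvec(m: Matrix, x: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pretrained d_out x d_in matrix, never updated
        # A: r x d_in with small random entries; B: d_out x r initialised to zero,
        # so the adapted layer starts out identical to the pretrained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        """Only A and B are trained: r * (d_in + d_out) numbers."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


d = 64
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LoRALinear(identity, rank=4, alpha=8.0)
x = [float(i) for i in range(d)]
print(layer.forward(x) == x)                  # True: B = 0, so the output equals W x
print(layer.trainable_params(), "vs", d * d)  # 512 vs 4096
```

Because the update starts at zero, training begins from the pretrained model’s behaviour. Only the small A and B matrices need to be saved and shared.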

by Synced 2023-02-09

DeepMind’s Speculative Sampling Achieves 2–2.5x Decoding Speedups in Large Language Models

In the new paper Accelerating Large Language Model Decoding with Speculative Sampling, a DeepMind research team presents SpS (Speculative Sampling), an algorithm that achieves 2–2.5x decoding speedups on a 70 billion parameter Chinchilla language model. The novel approach maintains sample quality and does not require any modifications to model parameters or architecture.
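The sample-quality guarantee comes from a modified rejection rule. A cheap draft model proposes k tokens, and the target model scores them. Each draft token x is accepted with probability min(1, p(x)/q(x)). On the first rejection, a replacement is drawn from the normalised residual max(0, p − q). The minimal sketch below assumes toy, context-independent distributions; the hypothetical `draft` and `target` functions stand in for real language models.

```python
import random
from collections import Counter
from typing import Callable, Sequence

Dist = dict[str, float]  # token -> probability
Model = Callable[[Sequence[str]], Dist]


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a categorical distribution."""
    r, acc = rng.random(), 0.0
    for tok, prob in dist.items():
        acc += prob
        if r < acc:
            return tok
    return tok  # floating-point round-off: fall back to the last token


def speculative_step(prefix: Sequence[str], draft: Model, target: Model,
                     k: int, rng: random.Random) -> list[str]:
    """One round of speculative sampling; returns the tokens to append to prefix."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted, q_dists = [], []
    for _ in range(k):
        q = draft(list(prefix) + drafted)
        q_dists.append(q)
        drafted.append(sample(q, rng))
    # 2. The target model scores all k + 1 positions (one parallel pass in practice).
    p_dists = [target(list(prefix) + drafted[:i]) for i in range(k + 1)]
    # 3. Modified rejection sampling keeps the output distributed as the target's.
    out = []
    for tok, q, p in zip(drafted, q_dists, p_dists):
        if rng.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            out.append(tok)  # accepted
            continue
        # Rejected: resample from the normalised residual max(0, p - q) and stop.
        residual = {t: max(0.0, p[t] - q.get(t, 0.0)) for t in p}
        z = sum(residual.values())
        out.append(sample({t: v / z for t, v in residual.items()}, rng))
        return out
    # Every draft was accepted: take one bonus token from the target's last distribution.
    out.append(sample(p_dists[k], rng))
    return out


def target(ctx: Sequence[str]) -> Dist:  # toy stand-in for the large model
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def draft(ctx: Sequence[str]) -> Dist:  # toy stand-in for the small draft model
    return {"a": 0.3, "b": 0.3, "c": 0.4}


rng = random.Random(0)
first = Counter(speculative_step([], draft, target, k=3, rng=rng)[0] for _ in range(20_000))
print({t: round(n / 20_000, 3) for t, n in sorted(first.items())})  # close to 0.6 / 0.3 / 0.1
```

Each round emits between 1 and k + 1 tokens for a single target pass. The speedup depends on how often the draft’s guesses are accepted.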

by Synced 2023-02-01

Stanford U’s DetectGPT Takes a Curvature-Based Approach to LLM-Generated Text Detection

In the new paper DetectGPT: Zero-Shot Machine-Generated Text Detection Using Probability Curvature, a Stanford University research team presents DetectGPT, a zero-shot machine-generated text detection algorithm that uses probability curvature to predict whether a candidate passage was generated by a large language model.
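DetectGPT’s “probability curvature” rests on one observation: model-generated text tends to sit near a local maximum of the model’s log-probability, so small rewrites lower its score. The sketch below computes a normalised perturbation discrepancy. It uses a toy unigram “model” and a one-word substitution as hypothetical stand-ins for the candidate LLM and for the mask-filling model the paper uses to perturb text. No decision threshold is shown.

```python
import math
import random
import statistics
from typing import Callable

LogProb = Callable[[str], float]
Perturb = Callable[[str, random.Random], str]


def perturbation_discrepancy(text: str, log_prob: LogProb, perturb: Perturb,
                             n_perturbations: int = 100, seed: int = 0) -> float:
    """Normalised discrepancy: (log p(x) - mean log p(x~)) / std log p(x~).

    Text near a local maximum of log p loses likelihood under small rewrites,
    which makes the discrepancy large and positive.
    """
    rng = random.Random(seed)
    scores = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    spread = statistics.stdev(scores) or 1.0  # avoid dividing by zero
    return (log_prob(text) - statistics.mean(scores)) / spread


# Toy stand-ins: a unigram "model" and a one-word substitution perturbation.
UNIGRAM = {"the": 0.4, "cat": 0.3, "sat": 0.2, "mat": 0.1}


def unigram_log_prob(text: str) -> float:
    return sum(math.log(UNIGRAM[w]) for w in text.split())


def swap_one_word(text: str, rng: random.Random) -> str:
    words = text.split()
    words[rng.randrange(len(words))] = rng.choice(list(UNIGRAM))
    return " ".join(words)


likely = perturbation_discrepancy("the the the the", unigram_log_prob, swap_one_word)
unlikely = perturbation_discrepancy("mat mat mat mat", unigram_log_prob, swap_one_word)
print(likely > 0 > unlikely)  # True: only the high-likelihood text sits at a local maximum
```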

by Synced 2023-01-18

Google Brain & Alberta U Paper Confirms the Computational Universality of Memory-Augmented Large Language Models

In the new paper Memory Augmented Large Language Models are Computationally Universal, Google Brain and University of Alberta researcher Dale Schuurmans establishes computational universality for a large language model augmented with an associative read-write memory.

by Synced 2023-01-11

Microsoft’s Neural Codec Language Models Synthesize High-Quality Personalized Speech From a 3-Second Sample

In the new paper Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers, a Microsoft research team presents VALL-E, the first language model-based text-to-speech (TTS) system with strong in-context learning. VALL-E achieves state-of-the-art personalized speech synthesis quality via prompting in a zero-shot setting.

by Synced 2023-01-03

Stanford & Buffalo U Advance Language Modelling with State Space Models

In the new paper Hungry Hungry Hippos: Towards Language Modeling with State Space Models, Stanford University and State University of New York at Buffalo researchers explore the expressivity gap between state space models (SSMs) and the attention mechanisms of transformer language models. They propose the H3 layer to narrow that gap and FlashConv to improve SSM training efficiency on modern hardware.
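A linear SSM can be run either as a recurrence or as one long causal convolution whose kernel is built from its parameters. FlashConv speeds up the convolutional form, which FFTs compute in O(L log L). The sketch below checks the equivalence for a single scalar state, a simplification of the multi-dimensional shift and diagonal SSMs in H3. The function names are hypothetical.

```python
def ssm_recurrent(u: list[float], a: float, b: float, c: float) -> list[float]:
    """Scalar linear SSM: x_t = a * x_{t-1} + b * u_t, y_t = c * x_t, with x_{-1} = 0."""
    x, ys = 0.0, []
    for u_t in u:
        x = a * x + b * u_t
        ys.append(c * x)
    return ys


def ssm_convolution(u: list[float], a: float, b: float, c: float) -> list[float]:
    """Same outputs as a causal convolution with kernel K_j = c * a**j * b."""
    length = len(u)
    kernel = [c * a**j * b for j in range(length)]
    # Direct O(L^2) convolution; an FFT computes the same thing in O(L log L).
    return [sum(kernel[j] * u[t - j] for j in range(t + 1)) for t in range(length)]


u = [1.0, 2.0, -1.0, 0.5, 3.0]
rec = ssm_recurrent(u, a=0.9, b=0.5, c=2.0)
conv = ssm_convolution(u, a=0.9, b=0.5, c=2.0)
print(all(abs(r - s) < 1e-9 for r, s in zip(rec, conv)))  # True
```

The recurrence suits token-by-token generation. The convolution lets training process a whole sequence in parallel.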

by Synced 2022-12-29

Improving Instruction Tuning for LLMs: Meta AI Presents the OPT-IML Benchmark of 2000 NLP Tasks

In the new paper OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization, a Meta AI research team presents OPT-IML Bench, an Instruction Meta Learning benchmark comprising 2000 NLP tasks and an evaluation framework for model generalization.

by Synced 2022-12-14

Finding Truth in LLMs: UC Berkeley & Peking U Propose Unsupervised Contrast-Consistent Search

In the new paper Discovering Latent Knowledge in Language Models Without Supervision, a research team from UC Berkeley and Peking University presents Contrast-Consistent Search (CCS), an unsupervised approach for discovering latent knowledge in language models.
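CCS trains a linear probe on hidden states of each statement x⁺ and its negation x⁻, and no labels are needed. The probe’s two probabilities should be logically consistent and should not both hover at 0.5. The sketch below writes out that objective and the averaged prediction. It assumes hidden states are already normalised, and it omits the gradient-descent training loop. All names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def probe(weights: list[float], bias: float, hidden: list[float]) -> float:
    """Linear probe on a (normalised) hidden state, squashed to a probability."""
    return sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias)


def ccs_loss(p_pos: float, p_neg: float) -> float:
    """Consistency term: p(x+) should equal 1 - p(x-).
    Confidence term: discourages the degenerate answer p(x+) = p(x-) = 0.5."""
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = min(p_pos, p_neg) ** 2
    return consistency + confidence


def ccs_predict(p_pos: float, p_neg: float) -> float:
    """Probability that the statement is true, averaging both views of the pair."""
    return 0.5 * (p_pos + (1.0 - p_neg))


print(ccs_loss(0.5, 0.5))  # 0.25: consistent but unconfident, still penalised
print(ccs_loss(0.9, 0.1))  # about 0.01: consistent and confident
```

In training, this loss would be averaged over many contrast pairs and minimised over the probe’s weights and bias.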

by Synced 2022-12-12

ServiceNow Research & Hugging Face Release The Stack: 3 TB of Permissively Licensed Source Code for LLMs

In the new paper The Stack: 3 TB of Permissively Licensed Source Code, a team from ServiceNow Research and Hugging Face advances open and responsible research on code LLMs by releasing The Stack, a 3.1 TB dataset of permissively licensed source code in 30 programming languages.

by Synced 2022-12-06

DeepMind & UCL Fine-tune a 70B Parameter LM to Generate Statements Agreeable to Humans with Diverse Opinions

In the new paper Fine-tuning Language Models To Find Agreement Among Humans With Diverse Preferences, a research team from DeepMind and University College London fine-tunes a 70 billion parameter language model to generate statements that maximize agreement among a human group with diverse written opinions.

by Synced 2022-11-30

DeepMind Studies Process- vs Outcome-based Model Supervision, Significantly Reducing Reasoning Errors on Math Word Problems

In the new paper Solving Math Word Problems With Process- and Outcome-based Feedback, a DeepMind research team conducts the first comprehensive comparison between process- and outcome-based model supervision. The two approaches achieve comparable final-answer error rate improvements on math word problems, while the process-based method significantly reduces reasoning errors from 14.0 to just 3.4 percent.

by Synced 2022-11-21

Talking to Models: Stanford U & Microsoft Method Enables Developers to Correct Model Bugs via Natural Language Patches

In the new paper Fixing Model Bugs with Natural Language Patches, researchers from Stanford University and Microsoft Research propose a method that uses declarative statements as feedback for correcting errors in neural models, significantly increasing accuracy without high compute costs.

by Synced 2022-11-17

Running Fast Transformers on CPUs: Intel Approach Achieves Significant Speedups and SOTA Performance

In the new paper Fast DistilBERT on CPUs, researchers from Intel Corporation and Intel Labs propose a pipeline and hardware-aware extreme compression technique for creating and running fast transformer models on CPUs. The approach achieves impressive speedups and SOTA performance in production environments.

by Synced 2022-11-16

DeepMind’s Epistemic Neural Networks Enable Large Language Model Fine-Tuning With 50% Less Data

In the new paper Fine-Tuning Language Models via Epistemic Neural Networks, a DeepMind research team modifies large language models to create an Epistemic Neural Network. The novel approach achieves model performance comparable to that obtained via fine-tuning while requiring 50 percent less data.

by Synced 2022-11-08

MIT, Northeastern & Technion Propose ROME for Efficient Locating and Editing of Factual Associations in GPT Models

In the new paper Locating and Editing Factual Associations in GPT, a research team from MIT CSAIL, Northeastern University and Technion IIT examines how information flows during knowledge recall in large autoregressive transformers and introduces Rank-One Model Editing (ROME), a simple, zero-shot principled model editor capable of locating and editing factual associations in such models.
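ROME treats an MLP projection W as a linear key-value memory and inserts a new association (k*, v*) with a closed-form rank-one update. The sketch below implements that update. It assumes k* and v* are already computed; the paper derives them from the subject’s representation and an optimisation step. It also assumes the key covariance C is diagonal, whereas the paper estimates it from sample text. The names are hypothetical.

```python
Matrix = list[list[float]]


def matvec(m: Matrix, x: list[float]) -> list[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rome_update(W: Matrix, k_star: list[float], v_star: list[float],
                c_diag: list[float]) -> Matrix:
    """Rank-one edit W' = W + Lambda (C^-1 k*)^T with Lambda = (v* - W k*) / ((C^-1 k*)^T k*).

    C is assumed diagonal here, so C^-1 k* is an elementwise division.
    After the edit, W' k* equals v* up to floating-point error.
    """
    u = [k / c for k, c in zip(k_star, c_diag)]                      # C^-1 k*
    residual = [v - wk for v, wk in zip(v_star, matvec(W, k_star))]  # v* - W k*
    denom = dot(u, k_star)
    return [[w + r * u_j / denom for w, u_j in zip(row, u)]
            for row, r in zip(W, residual)]


W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
k_star, v_star = [1.0, 0.5, 2.0], [3.0, -2.0]
W_new = rome_update(W, k_star, v_star, c_diag=[2.0, 1.0, 4.0])
print(matvec(W_new, k_star))  # approximately [3.0, -2.0]
```

Because the change is a single outer product, the edit maps k* to v*. Keys that have little overlap with C⁻¹k* are barely affected.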