large language model

by Synced 2024-12-17 15

From Token to Conceptual: Meta introduces Large Concept Models in Multilingual AI

A research team at Meta introduces the Large Concept Model (LCM), a novel architecture that processes input at a higher semantic level. This shift allows the LCM to achieve remarkable zero-shot generalization across languages, outperforming existing LLMs of comparable size.

by Synced 2024-12-12 19

AI Machine Learning & Data Science Research

From Response to Query: The Power of Reverse Thinking in Language Models

In a new paper Time-Reversal Provides Unsupervised Feedback to LLMs, a research team from Google DeepMind and Indian Institute of Science proposes Time Reversed Language Models (TRLMs), a framework that allows LLMs to reason in reverse—scoring and generating content in a manner opposite to the traditional forward approach.

by Synced 2024-12-07 33

AI Machine Learning & Data Science Research

The Future of Vision AI: How Apple’s AIMV2 Leverages Images and Text to Lead the Pack

An Apple research team introduces AIMV2, a family of vision encoders that is designed to predict both image patches and text tokens within a unified sequence. This combined objective enables the model to excel in a range of tasks, such as image recognition, visual grounding, and multimodal understanding.

by Synced 2024-11-19 5

AI Machine Learning & Data Science Research

Meta’s Dualformer: Bridging Fast and Slow Thinking in Transformers for Superior AI Reasoning

In a new paper Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces, a Meta research team presents Dualformer, a single Transformer model that merges both fast and slow reasoning modes within a unified framework.

by Synced 2024-11-17 5

AI Machine Learning & Data Science Research

NVIDIA’s OMCAT: A Breakthrough in Cross-Modal Temporal Understanding for Multimodal AI

An NVIDIA research team introduces OMCAT: Omni Context Aware Transformer in their new paper, presenting both OCTAV, a unique dataset aimed at capturing event transitions across audio and video, and OMCAT, a model that employs RoTE (Rotary Time Embeddings).

by Synced 2024-11-15 9

AI Machine Learning & Data Science Research

Stanford U’s Tutor CoPilot Transforms Real-Time Tutoring with AI-Driven Expert Guidance

A Stanford University research team presents Tutor CoPilot, a new model that offers expert-level guidance to tutors in real time. This study is the first of its kind—a randomized controlled trial testing a Human-AI system in live tutoring scenarios.

by Synced 2024-11-12 5

AI Machine Learning & Data Science Research

Bridging the Gap: Induction-Head Ngram Models for Efficient, Interpretable Language Modeling

A research team introduces a novel approach called Induction-head ngram models (Induction-Gram). This technique merges the interpretability and efficiency of n-gram models with insights from neural LLMs to enhance language modeling performance.

by Synced 2024-11-07 5

AI Machine Learning & Data Science Research

Self-Evolving Prompts: Redefining AI Alignment with DeepMind & Chicago U’s eva Framework

A research team from DeepMind and Chicago University presents a novel approach to Reinforcement Learning from Human Feedback. The proposed eva introduces a flexible, scalable framework that leverages any RLHF algorithm to drive more effective alignment with human values

by Synced 2024-11-05 4

AI Machine Learning & Data Science Research

Unlocking Turing Completeness: How Large Language Models Achieve Universal Computation Without Assistance

A research team from Google DeepMind and the University of Alberta presents evidence that transformer-based LLMs using autoregressive decoding can indeed support universal computation without any external adjustments or modifications to model weights.

by Synced 2024-10-30 5

AI Machine Learning & Data Science Research

From OCR to Multi-Image Insight: Apple’s MM1.5 with Enhanced Text-Rich Image Understanding and Visual Reasoning

Building on MM1’s success, Apple’s new paper, MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning, introduces an improved model family aimed at enhancing capabilities in text-rich image understanding, visual grounding, and multi-image reasoning.

by Synced 2024-10-23 3

AI Machine Learning & Data Science Research

LLMs as Code Architects: Meta’s New Approach to Precise Code Transformations

In a new paper Don’t Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs, a Meta research team proposes a novel chain-of-thought strategy to efficiently generate code transformations using LLMs. Their approach enables LLMs to derive transformations based on a small set of input/output examples.

by Synced 2024-10-21 4

AI Machine Learning & Data Science Research

Thinking Fast and Slow: Google DeepMind’s Dual-Agent Architecture for Smarter AI

A Google DeepMind research team proposes a biologically-inspired dual-system framework for intelligent agents. This “Talker-Reasoner” architecture aligns with Kahneman’s concept, where System 1 is fast and intuitive, while System 2 is slower and deliberative.

by Synced 2024-10-16 5

AI Machine Learning & Data Science Research

From Dense to Dynamic: NVIDIA’s Innovations in Upcycling LLMs to Sparse MoE

In a new paper Upcycling Large Language Models into Mixture of Experts, an NVIDIA research team introduces a new “virtual group” initialization technique to facilitate the transition of dense models into fine-grained MoE structures.

by Synced 2024-10-09 3

AI Machine Learning & Data Science Research

Scaling Multi-Objective Optimization: Meta & FAIR’s CGPO Advances General-purpose LLMs

In a new paper The Perfect Blend: Redefining RLHF with Mixture of Judges, a research team from Meta GenAI and FAIR developed Constrained Generative Policy Optimization (CGPO), which offers a more structured approach to RLHF, advancing the performance of general-purpose LLMs.

by Synced 2024-10-03 8

AI Machine Learning & Data Science Research

Law of the Weakest Link: Advancing Large Language Models Through Cross-Capability

A joint research team from Meta and the University of Illinois Urbana-Champaign introduces CrossEval, a benchmark designed to assess both individual and cross capabilities. Their findings demonstrate that LLMs often adhere to the “Law of the Weakest Link”—where performance on complex tasks is limited by the weakest capability.

by Synced 2024-09-28 5

AI Machine Learning & Data Science Research

Practical Lossless Text Compression: FineZip Delivers 54x Speed Boost via Large Language Models

In a new paper FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression, a research team from UC Berkeley and NYU introduces FineZip, a novel LLM-based compression system designed to significantly reduce compression time.

by Synced 2024-09-17 4

AI Machine Learning & Data Science Research

Stanford’s Landmark Study: AI-Generated Ideas Rated More Novel Than Expert Concepts

A Sandford U’s research team introduces an experimental framework aimed at evaluating LLMs’ ability to generate research ideas. This study, the first of its kind, compares the ideation capabilities of over 100 expert NLP researchers against an LLM-based ideation system.

by Synced 2024-09-13 5

AI Machine Learning & Data Science Research

Revolutionizing Autonomous Agents: Salesforce’s xLAM Outperforms GPT-4

A Salesforce AI Research team presents the xLAM series, a collection of large action models designed to enhance the performance of open-source LLMs for autonomous AI agents. This work aims to accelerate innovation in the field and make high-performance models for agent tasks more accessible.

by Synced 2024-09-11 12

AI Machine Learning & Data Science Research

Outperforming Giants: TinyAgent’s Edge-Based Solution Surpasses GPT-4-Turbo

A research team introduces TinyAgent, a framework designed to train and deploy small, task-specific language models capable of performing function calls for agentic systems at the edge, which outperforms larger models such as GPT-4-Turbo in this specific function-calling ability.

by Synced 2024-09-09 382

AI Machine Learning & Data Science Research

Microsoft’s Fully Pipelined Distributed Transformer Processes 16x Sequence Length with Extreme Hardware Efficiency

A Microsoft research team introduces the Fully Pipelined Distributed Transformer, which leverages the multiple memory hierarchies available in modern GPU clusters, enhancing hardware efficiency and cost-effectiveness while achieving exceptionally high Model FLOPs Utilization (MFU).

by Synced 2024-09-04 7

AI Machine Learning & Data Science Research

Samsung’s MobileQuant: Bringing High-Performance Language Models to Your Pocket

A research team from Samsung makes a first attempt to facilitate LLM deployment on edge devices using integer-only quantization. The proposed MobileQuant, is a post-training quantization technique that reduces both inference latency and energy consumption while preserving accuracy comparable to those achieved with 16-bit activations.

by Synced 2024-08-29 4

AI Machine Learning & Data Science Research

NVIDIA’s Minitron: Compressing Llama 3.1 and Mistral NeMo for Superior Performance in 4B and 8B Models

In a new paper LLM Pruning and Distillation in Practice: The Minitron Approach, an NVIDIA research team presents the Minitron compression strategy, which effectively produces a robust 4B model from Llama 3.1 8B and a cutting-edge Mistral-NeMo-Minitron-8B model derived from Mistral NeMo 12B.

by Synced 2024-08-22 4

AI Machine Learning & Data Science Research

Snowflake’s Arctic-TILT: Matching the Power of Models 1,000x Larger in Document Understanding

A Snowflake research team presents Arctic-TILT, a model that is specifically engineered for large-scale, cost-effective deployment while also being adaptable to various domains. It achieves state-of-the-art performance on benchmarks for both business and long documents.

by Synced 2024-08-06 9

AI Machine Learning & Data Science Research

Llama 3: Meta AI’s Multilingual and Multimodal Marvel

In a new paper The Llama 3 Herd of Models, a Meta AI research team presents Llama 3, a new set of foundation models for language, delivering competitive performance comparing to state-of-the-art language models such as GPT-4 on a plethora of tasks.

by Synced 2024-07-31 2

AI Machine Learning & Data Science Research

From YouTube to Keys: Transforming Internet Data into Robotic Musical Talent

In a new paper PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations, a research team introduces PianoMime, a framework for training a robot to play the piano using internet-sourced demonstrations.

by Synced 2024-07-26 3

AI Machine Learning & Data Science Research

From Images to Insights: DeepMind’s Versatile Vision-Language Model PaliGemma Achieves SOTA Results

A DeepMind research team release PaliGemma, a robust and versatile vision language model with 3 billion parameters. PaliGemma excels in transfer learning across various vision and language tasks, achieving state-of-the-art performance in a multitude of open-world applications.

by Synced 2024-07-21 2

AI Machine Learning & Data Science Research

Stanford’s Hypothetical Minds: Revolutionizing Multi-Agent AI with Theory of Mind and Large Language Models

A Stanford University research team proposes Hypothetical Minds, builds on recent advancements in LLM-based agents designed for multi-agent environments, aiming to enhance adaptability in competitive, cooperative, and mixed-motive scenarios with concealed information.

by Synced 2024-07-18 7

AI Machine Learning & Data Science Nature Language Tech Research

Revolutionizing Transformers: DeepMind’s PEER Layer and the Power of a Million Experts

A DeepMind research team introduces PEER, a innovative layer design leverages the product key technique for sparse retrieval from an extensive pool of tiny experts (over a million), which unlocks the potential for further scaling transformer models while maintaining computational efficiency.

by Synced 2024-07-16 2

AI Machine Learning & Data Science Research

Overcoming Computational Challenges in Large Language Model Inference with MInference 1.0

A research team from Microsoft and University of Surrey introduces MInference (Milliontokens Inference), which employs a sparse calculation approach designed to expedite the pre-filling of long-sequence processing. It can reduce inference latency by up to 10 times on an A100 GPU while preserving accuracy.

by Synced 2024-07-12 4

AI Machine Learning & Data Science Research

Mastering Enterprise Chatbots: NVIDIA’s Guide to Building Secure RAG-Based Chatbots with Generative AI

In a new paper FACTS About Building Retrieval Augmented Generation-based Chatbots, an NVIDIA research team introduces the FACTS framework, designed to create robust, secure, and enterprise-grade RAG-based chatbots.

by Synced 2024-07-08 7

AI Machine Learning & Data Science Research

Meta AI Unveils LLM Compiler for Advanced Code and Compiler Optimization

A Meta AI research team introduces Meta Large Language Model Compiler, a suite of robust, openly available, pre-trained models is specifically designed for code optimization tasks, aiming to provide a scalable, cost-effective foundation for further research and development in compiler optimization.

by Synced 2024-07-01 10

AI Machine Learning & Data Science Research

Achieving 8× Performance Gains with Reinforcement Learning on Synthetic Data in Large Language Models

In a new paper RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold, a research team provides insights into how synthetic data affects performance, suggesting that a specific schema can achieve consistent gains over using only positive data, achieving performance by 8× in synthetic data volume.

by Synced 2024-06-28 6

AI Machine Learning & Data Science Research

4.5x Performance Boost: University of Illinois’ Muti-Agent AI System Takes on Cyber Threats

A research team from University of Illinois Urbana-Champaign introduces HPTSA, a multi-agent system that significantly advances cybersecurity exploits, achieving up to 4.5 times better performance on a benchmark of 15 real-world vulnerabilities compared to previous efforts.

by Synced 2024-06-19 6

AI Machine Learning & Data Science Research

Google’s Proofread: AI-Driven Typing Accuracy in One Tap

In a new paper Proofread: Fixes All Errors with One Tap, a Google research team introduces Proofread, an innovative Gboard feature powered by a server-side LLM. This feature allows for seamless sentence and paragraph corrections with a single tap. Launched on Pixel 8 devices, it benefits thousands of users daily.

by Synced 2024-06-17 4

AI Machine Learning & Data Science Research

AI Pioneers Gather at BAAI 2024: Unveiling Innovations in Large-Scaled AI Models for Language, Multimodal, Embodied, Bio-Computing, and FlagOpen 2.0

“Global Vision, Ideas in Collision, Leading Cutting-Edge Innovations” – The 6th annual BAAI Conference successfully concluded on June 15. Over 200 AI scholars and industry leaders gathered to discuss the trajectories and applications of advanced AI technologies.

by Synced 2024-06-15 2

AI Machine Learning & Data Science Research

Stanford & CZ Biohub’s TEXTGRAD: Transforming AI Optimization with Textual Feedback

In a new paper TextGrad: Automatic ‘Differentiation’ via Text, a research team from Stanford University and CZ Biohub introduces TEXTGRAD, a robust framework that performs automatic differentiation through text. In this system, LLMs generate comprehensive, natural language suggestions to optimize variables in computation graphs.

by Synced 2024-06-09 2

AI Machine Learning & Data Science Research

Matrix Multiplication-Free Language Models Maintain Top-Tier Performance at Billion-Parameter Scales

In a new paper Scalable MatMul-free Language Modeling, a research team introduces the first scalable MatMul-free language model, demonstrating that it is possible to completely eliminate MatMul operations from large language models (LLMs) while maintaining robust performance, even at billion-parameter scales.

by Synced 2024-05-29 4

AI Machine Learning & Data Science Research

NVIDIA’s NV-Embed: Superior Performance in Embedding Tasks Without Proprietary Data

In a new paper NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models, an NVIDIA research team introduces NV-Embed. This generalist embedding model significantly boosts the performance of decoder-only LLMs in embedding and retrieval tasks while maintaining simplicity and reproducibility.

by Synced 2024-05-25 3

AI Machine Learning & Data Science Research

Unveiling the Secret Linearity of Transformers: Further Advance Model Efficiency and Performance

In a new paper Your Transformer is Secretly Linear, a research team uncovers a near-perfect linear relationship in transformations between sequential layers and introduces a novel distillation technique that approximates certain layers linearly while preserving model performance.

by Synced 2024-05-23 11

AI Machine Learning & Data Science Research

MedVersa: A Game-Changer Generalist Learner for Versatile Medical Image Interpretation

In a new paper A Generalist Learner for Multifaceted Medical Image Interpretation, a research team proposes MedVersa, a generalist AI model designed to enable flexible learning and tasking for medical image interpretation.