Tag: state space model

AI Machine Learning & Data Science Research

NVIDIA’s Hybrid: Combining Attention and State Space Models for Breakthrough Performance in Small Language Models

An NVIDIA research team proposes Hymba, a family of small language models that blends transformer attention with state space models. Hymba outperforms Llama-3.2-3B with 1.32% higher average accuracy while reducing cache size by 11.67× and increasing throughput by 3.49×.
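To make the hybrid idea concrete, here is a minimal PyTorch sketch that runs a standard attention path and a toy diagonal state-space recurrence in parallel inside one block and averages their outputs. It illustrates the attention-plus-SSM concept only and is not Hymba’s actual layer design; the class name `HybridBlock`, the elementwise recurrence, and the equal mixing weight are all assumptions made for brevity.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Toy block combining attention and an SSM-style path in parallel.
    Illustrative sketch only; not Hymba's actual architecture."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Attention path (no causal mask here, for brevity).
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # SSM path: a diagonal linear recurrence h_t = a * h_{t-1} + b * x_t.
        self.a = nn.Parameter(0.9 * torch.rand(dim))    # per-channel decay
        self.b = nn.Parameter(0.02 * torch.randn(dim))  # per-channel input gain

    def ssm(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim). Sequential scan for clarity; production SSMs
        # use FFT convolutions or parallel scans instead.
        h = torch.zeros_like(x[:, 0])
        outs = []
        for t in range(x.size(1)):
            h = self.a * h + self.b * x[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.norm(x)
        attn_out, _ = self.attn(y, y, y, need_weights=False)
        # Average the two paths and add a residual connection.
        return x + 0.5 * (attn_out + self.ssm(y))

# Quick shape check: a batch of 2 sequences, length 16, width 64.
block = HybridBlock(dim=64)
print(block(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```

The appeal of such a combination is that the SSM path carries sequence history in a fixed-size hidden state rather than a growing key-value cache, which is the mechanism behind the cache-size and throughput gains Hymba reports.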

AI Machine Learning & Data Science Research

Stanford & Buffalo U Advance Language Modeling with State Space Models

In the new paper Hungry Hungry Hippos: Towards Language Modeling with State Space Models, researchers from Stanford University and the State University of New York at Buffalo explore the expressivity gap between state space models and the attention mechanisms of transformer language models, and propose FlashConv to improve the training efficiency of state space models on modern hardware.
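For context on the efficiency side: a linear state space model applied to a length-L input is equivalent to a causal convolution with the kernel k_t = C A^t B, so its output can be computed in O(L log L) time with FFTs rather than a sequential scan; FlashConv builds on this by fusing the FFT, pointwise multiply, and inverse-FFT steps so intermediate results stay in fast on-chip GPU memory. The sketch below shows only the underlying FFT convolution, assuming a precomputed kernel; the function name `ssm_conv_fft` is illustrative and not from the paper’s code.

```python
import torch

def ssm_conv_fft(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal convolution of inputs u (batch, seq_len) with an SSM kernel
    k (seq_len,) via FFT in O(L log L). Illustrative sketch only: it
    assumes the kernel k_t = C A^t B has already been materialized."""
    L = u.size(-1)
    n = 2 * L  # zero-pad so circular FFT convolution becomes linear convolution
    U = torch.fft.rfft(u, n=n)
    K = torch.fft.rfft(k, n=n)
    return torch.fft.irfft(U * K, n=n)[..., :L]

# Example: an exponentially decaying kernel over a length-8 sequence.
u = torch.randn(2, 8)
k = 0.5 ** torch.arange(8.0)
print(ssm_conv_fft(u, k).shape)  # torch.Size([2, 8])
```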