AI Machine Learning & Data Science Research

DeepMind’s RecurrentGemma: Pioneering Efficiency for Open Small Language Models

A Google DeepMind research team introduces RecurrentGemma, an open language model built on Google’s Griffin architecture. The model reduces memory usage and enables efficient inference on long sequences, opening new possibilities for highly efficient small language models in resource-constrained environments.

In the expansive realm of artificial intelligence and natural language processing, Small Language Models (SLMs) are making significant strides. Unlike their larger counterparts with hefty parameter counts and demanding computational needs, SLMs are sleeker versions crafted for optimal performance even in resource-constrained settings.

In a new paper, RecurrentGemma: Moving Past Transformers for Efficient Open Language Models, a Google DeepMind research team introduces RecurrentGemma, an open language model built on Google’s Griffin architecture. The model reduces memory usage and enables efficient inference on long sequences, thereby unlocking new possibilities for highly efficient small language models in environments where resources are limited.

Griffin, proposed by Google in February 2024, is a hybrid model that achieves rapid inference when generating long sequences by replacing global attention with a blend of local attention and linear recurrences. For RecurrentGemma, the researchers make just one modification to the Griffin architecture: multiplying the input embeddings by a constant equal to the square root of the model’s width.
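To make that single change concrete, here is a minimal PyTorch sketch of an embedding layer whose output is scaled by the square root of the model width. The class name and the vocabulary/width values are illustrative placeholders, not the official implementation.

```python
import math
import torch
import torch.nn as nn

class ScaledEmbedding(nn.Module):
    """Token embedding scaled by sqrt(model width) -- a sketch of the one
    modification RecurrentGemma makes to the Griffin architecture.
    Names and hyperparameters here are illustrative, not official."""

    def __init__(self, vocab_size: int, width: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, width)
        self.scale = math.sqrt(width)  # constant multiplier on input embeddings

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(token_ids) * self.scale

# usage sketch with placeholder sizes
emb = ScaledEmbedding(vocab_size=256_000, width=2560)
x = emb(torch.randint(0, 256_000, (1, 8)))  # shape: (batch, seq, width)
```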

The RecurrentGemma architecture moves away from global attention; instead, it models the sequence through a combination of linear recurrences and local attention. The team pre-trains RecurrentGemma-2B on 2 trillion tokens, first training on a diverse mix of large-scale general data and then refining on a smaller, higher-quality dataset. For fine-tuning, they adopt a strategy similar to Gemma’s, incorporating a novel RLHF algorithm to optimize the model for generating responses with high reward.
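The efficiency argument rests on the recurrence keeping a fixed-size state. The toy sketch below scans a diagonal linear recurrence over time: memory stays constant with sequence length, unlike the growing key-value cache of global attention. This is only an illustration of the general idea behind Griffin-style recurrent blocks, not the actual RG-LRU or local-attention code used in RecurrentGemma.

```python
import torch

def linear_recurrence(x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """Diagonal linear recurrence h_t = a_t * h_{t-1} + x_t, scanned over time.
    The state h has a fixed size, so memory does not grow with sequence length.
    Toy illustration only; not the exact recurrence used in the paper."""
    batch, seq_len, width = x.shape
    h = torch.zeros(batch, width)
    outputs = []
    for t in range(seq_len):
        h = a[:, t] * h + x[:, t]  # per-channel gated update
        outputs.append(h)
    return torch.stack(outputs, dim=1)

# usage: gates a in (0, 1) keep the recurrence stable
x = torch.randn(2, 16, 64)
a = torch.sigmoid(torch.randn(2, 16, 64))
y = linear_recurrence(x, a)
print(y.shape)  # torch.Size([2, 16, 64])
```

In Griffin, such recurrent blocks are interleaved with local (sliding-window) attention layers, so the per-token state remains bounded while short-range dependencies are still captured by attention.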

The evaluation of RecurrentGemma-2B spans various domains, employing a blend of automated benchmarks and human assessments. Notably, RecurrentGemma-2B matches Gemma’s performance while achieving superior throughput during inference, particularly on extended sequences.

The code is available on the project’s GitHub. The paper RecurrentGemma: Moving Past Transformers for Efficient Open Language Models is on arXiv.


Author: Hecate He | Editor: Chain Zhang


