Open Sparse Autoencoders Everywhere: The Ambitious Vision of DeepMind’s Gemma Scope

Sparse autoencoders (SAEs) are an unsupervised technique for decomposing a neural network's latent representations into sparse, seemingly interpretable features. Although SAEs have generated significant interest for their potential, research with them has largely been confined to industry, because training a full suite of SAEs is prohibitively expensive.
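
To illustrate the basic idea (this is a minimal sketch with toy dimensions and random weights, not the paper's implementation), an SAE encodes an activation vector into a much wider, mostly-zero latent vector and then reconstructs the original activation from it:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae = 256, 2048          # toy sizes; real SAEs are far wider
W_enc = rng.normal(0, 0.02, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.02, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into sparse latents and reconstruct it."""
    pre = x @ W_enc + b_enc
    latents = np.maximum(pre, 0.0)   # after training, most latents are exactly zero
    recon = latents @ W_dec + b_dec
    return latents, recon

x = rng.normal(size=d_model)         # stand-in for a model activation vector
latents, recon = sae_forward(x)
print(latents.shape, recon.shape)    # (2048,) (256,)
```

Training pushes the latents to be sparse while keeping the reconstruction close to the input, so individual latent dimensions can be inspected as candidate features.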

In the new paper "Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2", a Google DeepMind research team introduces Gemma Scope, a comprehensive suite of JumpReLU SAEs. The suite was trained on all layers and sub-layers of the Gemma 2 2B and 9B models, as well as on select layers of the Gemma 2 27B base model.

The creation of Gemma Scope was a formidable engineering task. The main release comprises over 400 sparse autoencoders containing more than 30 million learned features in total, with each SAE trained on 4-16 billion tokens of text. The project consumed over 20% of the compute used to train GPT-3, involved saving approximately 20 pebibytes (PiB) of activations to disk, and produced hundreds of billions of sparse autoencoder parameters.

The focus on JumpReLU SAEs was deliberate, as they offer a slight Pareto improvement over other methods and allow for a variable number of active latents across different tokens, unlike TopK SAEs.
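
A rough sketch of that difference (toy shapes and thresholds, not the released models): JumpReLU zeroes any pre-activation below a learned per-latent threshold, so the number of active latents can differ from token to token, whereas a TopK activation always keeps exactly K latents.

```python
import numpy as np

def jumprelu(pre, theta):
    """JumpReLU: keep a pre-activation only if it exceeds its (learned) threshold."""
    return np.where(pre > theta, pre, 0.0)

def topk(pre, k):
    """TopK: keep the k largest pre-activations, zero out the rest."""
    out = np.zeros_like(pre)
    idx = np.argpartition(pre, -k)[-k:]
    out[idx] = pre[idx]
    return out

rng = np.random.default_rng(0)
pre = rng.normal(size=8)            # toy pre-activations for one token
theta = np.full(8, 0.5)             # per-latent thresholds (learned in practice)

print(np.count_nonzero(jumprelu(pre, theta)))  # varies with the input
print(np.count_nonzero(topk(pre, 3)))          # always k (here 3)
```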

The researchers trained these SAEs on the activations of Gemma 2 models using text data from the same distribution as the pretraining data for Gemma 1, with the exception of one suite of SAEs trained on an instruction-tuned (IT) model. To ensure consistency across layers and sites, activation vectors were normalized by a fixed scalar to maintain a unit mean squared norm. This normalization helps in reliably transferring hyperparameters between layers, as the raw activation norms can vary significantly, affecting the scale of the reconstruction loss.
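
A hedged sketch of that normalization step (illustrative only; the paper's exact estimator may differ): estimate one scalar per layer and site from a sample of activations, so that dividing by it gives activations with unit mean squared norm.

```python
import numpy as np

def fit_norm_scale(acts):
    """Scalar c such that activations divided by c have unit mean squared norm."""
    # acts: (n_samples, d_model) activations gathered from one layer/site
    mean_sq_norm = np.mean(np.sum(acts ** 2, axis=-1))
    return np.sqrt(mean_sq_norm)

rng = np.random.default_rng(0)
acts = rng.normal(scale=7.0, size=(10_000, 256))   # stand-in activations with large raw norm

c = fit_norm_scale(acts)
normed = acts / c
print(np.mean(np.sum(normed ** 2, axis=-1)))       # ~1.0 by construction
```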

In addition to the SAE weights, the team has released performance metrics for each SAE on standard benchmarks. They hope that making these resources publicly available will enable more ambitious safety and interpretability research across the community.
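
Two diagnostics commonly reported for SAEs are sparsity (the average number of active latents per token, often called L0) and reconstruction quality; the sketch below computes both on stand-in data and is only an illustration of the kind of metric involved, not the paper's exact evaluation code.

```python
import numpy as np

def l0_sparsity(latents):
    """Average number of nonzero latents per token."""
    return np.count_nonzero(latents, axis=-1).mean()

def fraction_variance_unexplained(x, recon):
    """Reconstruction error relative to the variance of the original activations."""
    resid = np.sum((x - recon) ** 2)
    total = np.sum((x - x.mean(axis=0)) ** 2)
    return resid / total

rng = np.random.default_rng(0)
x = rng.normal(size=(1_000, 256))                                 # stand-in activations
latents = np.maximum(rng.normal(size=(1_000, 2048)) - 2.0, 0.0)   # mostly-zero latents
recon = x + 0.1 * rng.normal(size=x.shape)                        # stand-in reconstructions

print(l0_sparsity(latents))                     # average active latents per token
print(fraction_variance_unexplained(x, recon))  # lower is better
```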

An interactive demo can be found at https://neuronpedia.org/gemma-scope. The paper Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 is on arXiv.


Author: Hecate He | Editor: Chain Zhang
