Transformer-based large language models (LLMs) trained on huge corpora of public data have excelled at capturing and storing factual knowledge. While previous studies have focused on how factual associations are stored in model parameters, the question of how LLMs extract these associations during inference remains relatively underexplored.
In the new paper Dissecting Recall of Factual Associations in Auto-Regressive Language Models, a team from Google DeepMind, Tel Aviv University and Google Research investigates how factual associations are stored and extracted internally in transformer-based language models, offering insights into how such models' factual predictions are formed.
Synced previously reported on Rank-One Model Editing (ROME, Meng et al., 2022), a technique for editing factual associations in LLMs. This new paper explores the topic through the lens of information flow. Given a subject-relation query, the team investigates how the model arrives at the correct attribute and how its internal representations evolve across the layers to produce the output. They propose that LLMs internally construct attribute-rich subject representations, from which attention heads extract the predicted attribute.
The team focuses on contemporary auto-regressive, decoder-only LLMs, in which attribute extraction could in principle be performed by the multi-head self-attention (MHSA) sublayers, the MLP sublayers, or both. They apply a "knock out" strategy to specific computational elements, intervening on the MHSA sublayers to block the last position from attending to other positions at specific layers. By observing the impact on inference, they identify a pair of critical computational points: an information flow from the relation positions followed by one from the subject positions.
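The knockout idea can be illustrated in a toy setting. The sketch below is a minimal single-head causal attention function, not the authors' implementation; the positions and names are hypothetical. Severing an attention edge amounts to setting its pre-softmax score to negative infinity, so the "from" position can no longer read from the "to" position:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v, knockout=()):
    """Single-head causal self-attention.

    `knockout` is a collection of (from_pos, to_pos) pairs: those
    attention edges are severed by setting their scores to -inf before
    the softmax, mirroring the paper's "knock out" intervention.
    Returns (output, attention_weights).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # causal mask
    for i, j in knockout:
        scores[i, j] = -np.inf
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Example: block the last position (index 4) from attending to
# hypothetical subject positions (indices 1 and 2).
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 16)) for _ in range(3))
out, w = causal_attention(q, k, v, knockout={(4, 1), (4, 2)})
```

Comparing the model's prediction with and without such knockouts at different layers is what reveals where the critical information flows occur.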
To pinpoint locations where attribute extraction occurs, the team analyzes the information that propagates at these critical points and the preceding representation construction process. This is achieved via additional interventions to the MHSA and MLP sublayers and projections to the vocabulary.
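Projecting an intermediate representation to the vocabulary can also be sketched in miniature. The snippet below (an illustrative readout in the spirit of such vocabulary projections; the matrices and sizes are made up) multiplies a hidden state by an unembedding matrix and reads off the top-ranked tokens:

```python
import numpy as np

def vocab_projection(hidden, unembed, top_k=3):
    """Inspect which tokens a hidden state encodes by projecting it
    through the output (unembedding) matrix of shape (d_model, vocab)
    and ranking the resulting logits."""
    logits = hidden @ unembed
    return np.argsort(logits)[::-1][:top_k]

# Toy check: a hidden state strongly aligned with token 7's
# unembedding column should rank token 7 first.
rng = np.random.default_rng(1)
unembed = rng.normal(size=(32, 100))
hidden = unembed[:, 7] * 5.0 + rng.normal(size=32) * 0.1
top = vocab_projection(hidden, unembed)
```

Applied to representations at the critical points, this kind of readout shows which attributes a representation already encodes before the final layer.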
The team identifies an internal mechanism for attribute extraction built on a subject enrichment process followed by an attribute extraction operation, and summarizes it as follows:
- Information about the subject is enriched in the last subject token, across early layers of the model.
- The relation is passed to the last token.
- The last token uses the relation to extract the corresponding attribute from the subject representation, and this is done via attention head parameters.
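A rough intuition for the last step can be given with synthetic vectors (everything below is a toy construction of mine, not data or parameters from the paper): if the enriched subject state is a superposition of near-orthogonal attribute vectors, a head whose value-output map is tuned to one attribute direction pulls that attribute out of the mixture.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
# "Subject enrichment": the subject state is a superposition of several
# near-orthogonal attribute vectors (toy stand-ins for learned features).
attributes = rng.normal(size=(3, d)) / np.sqrt(d)
subject_state = attributes.sum(axis=0)

# A head whose value-output map reads out attribute 1: an outer product
# that projects onto attribute 1's direction and writes a readout vector.
readout = rng.normal(size=d)
W_vo = np.outer(readout, attributes[1])

# The head's output is (attributes[1] . subject_state) * readout, i.e.
# it recovers attribute 1's contribution from the enriched subject state.
extracted = W_vo @ subject_state
cos = extracted @ readout / (np.linalg.norm(extracted) * np.linalg.norm(readout))
```

In this picture, knocking out the last position's attention to the subject positions severs exactly this extraction path, which is the effect the paper's interventions probe.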
This work provides a new perspective on how factual associations are stored and extracted internally in LLMs, which the researchers believe can open new research directions for knowledge localization and model editing.
The paper Dissecting Recall of Factual Associations in Auto-Regressive Language Models is on arXiv.
Author: Hecate He | Editor: Michael Sarazen