In recent years, much of the research interest in large language models (LLMs) such as OpenAI’s autoregressive GPT has shifted from what these models can do to how they do it. While LLMs make predictions that are impressively consistent with factual knowledge, their internal computations remain opaque. Understanding how and where factual associations are stored and retrieved, and the mechanisms underlying autoregressive knowledge representations more broadly, is crucial for further model development and deployment.
In the new paper Locating and Editing Factual Associations in GPT, a research team from MIT CSAIL, Northeastern University and Technion IIT examines how information flows during knowledge recall in large autoregressive transformers and introduces Rank-One Model Editing (ROME), a simple, principled zero-shot model editor that can locate and edit factual associations in such models.
Knowing how a well-performing language transformer architecture stores its factual associations can help machine learning researchers address errors involving incorrect, biased, or private information by directly editing the factual associations.
The team introduces a novel Causal Tracing method to identify the decisive computations that mediate factual recall. The method isolates the causal effects of individual states in the neural network while processing a factual statement. By tracing this information flow, it is possible to identify the modules that principally contribute to factual association retrieval.
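The idea behind Causal Tracing can be illustrated on a toy network. The sketch below is not the authors' implementation: the network, the `run` and `causal_trace` functions, the causal averaging that stands in for attention, and the linear readout used as a score are all assumptions made for illustration. It runs a clean input, runs a corrupted input (noise added to the first two "subject" tokens), and then restores one clean hidden state at a time in the corrupted run to measure how much of the original output each state recovers.

```python
import numpy as np

def run(x, weights, patch=None):
    """Toy causal network (hypothetical). x has shape (tokens, dim).

    patch = (layer, token, vector) overwrites one hidden state during the run.
    Returns the list of hidden states per layer and the final last-token state.
    """
    n_tokens = x.shape[0]
    # Causal mixing: token t averages over tokens 0..t (a stand-in for attention).
    mix = np.tril(np.ones((n_tokens, n_tokens)))
    mix /= mix.sum(axis=1, keepdims=True)
    h, states = x.copy(), []
    for layer, W in enumerate(weights):
        h = h + np.tanh(mix @ h @ W)      # residual update
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]        # restore one clean state
        states.append(h.copy())
    return states, h[-1]

def causal_trace(x_clean, x_corrupt, weights, score):
    """Effect of restoring each (layer, token) state in the corrupted run."""
    clean_states, _ = run(x_clean, weights)
    _, corrupt_out = run(x_corrupt, weights)
    baseline = score(corrupt_out)
    effects = np.zeros((len(weights), x_clean.shape[0]))
    for layer in range(effects.shape[0]):
        for token in range(effects.shape[1]):
            restored = clean_states[layer][token]
            _, out = run(x_corrupt, weights, patch=(layer, token, restored))
            effects[layer, token] = score(out) - baseline
    return effects

rng = np.random.default_rng(0)
n_layers, n_tokens, dim = 4, 5, 8
weights = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(n_layers)]
readout = rng.normal(size=dim)            # hypothetical output direction

def score(out):
    return float(out @ readout)

x_clean = rng.normal(size=(n_tokens, dim))
x_corrupt = x_clean.copy()
x_corrupt[:2] += rng.normal(scale=3.0, size=(2, dim))  # noise on "subject" tokens

effects = causal_trace(x_clean, x_corrupt, weights, score)
```

Each entry of `effects` indicates how strongly a single hidden state mediates the output in this toy setting; states with large entries are the ones a tracing analysis would flag as decisive.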
ROME is designed to edit individual facts within a GPT model. It treats a single feedforward module as a key-value store in which the key encodes a subject and the value encodes knowledge about that subject. The model recalls a factual association by retrieving the value that corresponds to the key, so changing the stored value for a key edits the fact, in a way that is both specific to that fact and generalizes across different phrasings of it.
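The key-value view can be made concrete with a small linear example. The following is a minimal sketch of a rank-one edit to a linear associative memory, not the authors' code: the function name `rank_one_edit`, the toy dimensions, and the optional key-covariance matrix `C` (identity by default) are assumptions made for illustration. The edit adds a single outer product to the weight matrix so that the chosen key maps exactly to the new value.

```python
import numpy as np

def rank_one_edit(W, k, v, C=None):
    """Return W' = W + r u^T / (u . k), so that W' @ k == v.

    W: (d_out, d_in) weights of a linear key-value memory
    k: (d_in,) key for the fact being edited
    v: (d_out,) new value to store for that key
    C: optional (d_in, d_in) key covariance (assumption; identity if omitted)
    """
    if C is None:
        C = np.eye(k.shape[0])
    u = np.linalg.solve(C, k)        # direction C^{-1} k
    residual = v - W @ k             # what the memory currently gets wrong
    return W + np.outer(residual, u) / (u @ k)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
k = rng.normal(size=6)               # hypothetical subject key
v = rng.normal(size=4)               # hypothetical new value
W_edited = rank_one_edit(W, k, v)
```

With the identity default for `C`, any key orthogonal to `k` produces the same output before and after the edit, which is a simple picture of specificity: only the targeted association changes.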
The team evaluated ROME on the Zero-Shot Relation Extraction (zsRE) task and on their own CounterFact dataset, which includes thousands of counterfactuals and text that allows quantitative testing of specificity and generalization when learning a counterfactual. In the evaluations, ROME showed competitive results on zsRE and maintained both specificity and generalization on the CounterFact dataset.
Overall, this work pinpoints the crucial role of mid-layer feedforward modules in storing factual associations, reveals the information flow of knowledge recall in autoregressive transformers, and demonstrates the capability of editing factual associations in such LLMs.
The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/. The paper Locating and Editing Factual Associations in GPT is on arXiv.
Author: Hecate He | Editor: Michael Sarazen