While today’s large language models have demonstrated an impressive ability to extract and recall massive amounts of factual knowledge from their training data, the question of how this factual knowledge is stored in such models remains under-explored.
In the new paper Knowledge Neurons in Pretrained Transformers, a team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and shows that these “knowledge neurons” can be leveraged to edit factual knowledge in transformers without any fine-tuning.
The team summarizes their main contributions as follows:
- We introduce the concept of knowledge neurons and propose a knowledge attribution method to identify the knowledge neurons that express specific factual knowledge in the fill-in-the-blank cloze task.
- We conduct both qualitative and quantitative analyses to show that knowledge neurons are positively correlated to knowledge expression.
- We present preliminary studies of leveraging knowledge neurons to edit factual knowledge in Transformers, even without any fine-tuning.
The team first introduces a knowledge attribution method designed to detect the neurons that represent learned factual knowledge in transformers. The novel method treats transformers’ feed-forward network blocks as key-value memories, and by computing the contribution of each neuron to knowledge prediction, the researchers are able to identify the knowledge neurons.
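To make the attribution idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' code. It assumes a tiny feed-forward block whose output logits are the activation-weighted sum of value rows (the key-value memory view), and it scores each neuron with a Riemann-sum approximation of integrated gradients, which is one common way to compute such a contribution. The names `answer_prob`, `grad_wrt_neuron` and `attribute`, and all the numbers, are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_from(acts, values):
    """Toy FFN output: each neuron's activation weights its value row."""
    vocab = len(values[0])
    return [sum(a * row[j] for a, row in zip(acts, values)) for j in range(vocab)]

def answer_prob(acts, values, target):
    """Probability the toy model assigns to the target answer."""
    return softmax(logits_from(acts, values))[target]

def grad_wrt_neuron(acts, values, target, i):
    """Analytic gradient of P(target) with respect to activation i."""
    probs = softmax(logits_from(acts, values))
    expected = sum(p * v for p, v in zip(probs, values[i]))
    return probs[target] * (values[i][target] - expected)

def attribute(acts, values, target, steps=20):
    """Approximate integrated gradients: scale one activation from 0 up to
    its observed value and accumulate the gradient along the way."""
    scores = []
    for i, a in enumerate(acts):
        total = 0.0
        for k in range(1, steps + 1):
            scaled = list(acts)
            scaled[i] = a * k / steps
            total += grad_wrt_neuron(scaled, values, target, i)
        scores.append(a * total / steps)
    return scores

# Hypothetical toy setup: 3 intermediate neurons, 3-word vocabulary.
acts = [2.0, 0.5, 1.0]              # activations for one cloze prompt
values = [[3.0, 0.0, 0.0],          # neuron 0 pushes toward answer 0
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
target = 0                          # index of the correct answer

scores = attribute(acts, values, target)
top_neuron = max(range(len(scores)), key=lambda i: scores[i])
print("attribution scores:", scores)
print("candidate knowledge neuron:", top_neuron)
```

In this toy setup the neuron whose value row points at the correct answer receives the largest score, while the neurons that favor other answers receive negative scores.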
Given the detected knowledge neurons, the team then demonstrates that suppressing or amplifying their activation correspondingly weakens or strengthens a model’s knowledge expression. This enables the editing or erasure of factual knowledge in pretrained transformers via a kind of “knowledge surgery” that directly modifies the parameters in feed-forward networks and requires no fine-tuning.
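The effect of suppressing, amplifying and editing a neuron can be illustrated on a toy model. The sketch below is an assumption-laden illustration rather than the paper's procedure: it zeroes or doubles one activation to show the change in answer probability, then rewrites that neuron's value row so the model prefers a different answer. The helper names (`predict`, `scale_neuron`, `edit_values`), the edit rule and the `strength` parameter are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(acts, values):
    """Toy FFN: logits are the activation-weighted sum of value rows."""
    vocab = len(values[0])
    logits = [sum(a * row[j] for a, row in zip(acts, values)) for j in range(vocab)]
    return softmax(logits)

def scale_neuron(acts, i, factor):
    """Return a copy of the activations with neuron i multiplied by factor."""
    out = list(acts)
    out[i] *= factor
    return out

def edit_values(values, i, old, new, strength):
    """Hypothetical 'surgery': shift neuron i's value row away from the old
    answer and toward the new one, leaving every other parameter untouched."""
    edited = [list(row) for row in values]
    edited[i][old] -= strength
    edited[i][new] += strength
    return edited

# Hypothetical toy setup: neuron 0 is the identified knowledge neuron.
acts = [2.0, 0.5, 1.0]
values = [[3.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
knowledge_neuron, old_answer, new_answer = 0, 0, 1

baseline = predict(acts, values)[old_answer]
suppressed = predict(scale_neuron(acts, knowledge_neuron, 0.0), values)[old_answer]
amplified = predict(scale_neuron(acts, knowledge_neuron, 2.0), values)[old_answer]
print("P(old answer): suppressed, baseline, amplified =",
      suppressed, baseline, amplified)

# Edit the stored fact without any gradient updates.
edited_values = edit_values(values, knowledge_neuron, old_answer, new_answer, 3.0)
after = predict(acts, edited_values)
predicted_after = after.index(max(after))
print("prediction after surgery:", predicted_after)
```

Because only a single value row changes, the edit stays local to one neuron, which mirrors the appeal of modifying parameters directly instead of fine-tuning the whole model.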
In their preliminary case studies, the team evaluates knowledge neurons through the fill-in-the-blank cloze task on the PARAREL dataset, running experiments on the pretrained BERT-base-cased model.
The results confirm that the knowledge neurons identified by the team’s attribution method greatly affect knowledge expression and that the proposed knowledge surgery achieves an impressive success rate. The team believes knowledge neurons represent a promising and efficient way to modify, update or erase undesired knowledge in pretrained transformers with minimal effort.
Author: Hecate He | Editor: Michael Sarazen