Large-scale pretrained transformers learn from corpora containing oceans of factual knowledge, and are surprisingly good at recalling this knowledge without any fine-tuning. In a new paper, a team from Microsoft Research and Peking University looks inside pretrained transformers, proposing a method to identify the “knowledge neurons” responsible for storing this knowledge and showing how these neurons can be used to edit, update and even erase relational facts.
The researchers summarise their contributions as:
- Introduce the concept of knowledge neurons and propose a knowledge attribution method to identify the neurons that express specific factual knowledge.
- Conduct both qualitative and quantitative analysis to show that knowledge neurons are highly correlated to knowledge expression in pretrained transformers.
- Present a method to explicitly edit (such as update or erase) factual knowledge in transformers, even without any fine-tuning.
The researchers first introduce the components that comprise a transformer block: a multi-head self-attention layer and a feedforward network (FFN) consisting of two feedforward layers. They suggest treating the FFN as a key-value memory bank, where the first layer serves as keys, the second layer serves as values, and each key-value pair forms a memory slot.
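The key-value reading of the FFN can be made concrete with a minimal sketch. It assumes a GELU activation and omits biases; the function name `ffn_as_memory` and the toy vectors are illustrative, not taken from the paper.

```python
import math

def gelu(x):
    # Exact GELU activation (an assumption here; any FFN nonlinearity works).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def ffn_as_memory(x, keys, values):
    """Return (slot activations, FFN output) for one hidden vector x.

    keys[i]   : i-th key vector (first feedforward layer)
    values[i] : i-th value vector (second feedforward layer)
    Biases are omitted for brevity.
    """
    # Each intermediate neuron's activation measures how well x matches its key.
    activations = [gelu(sum(xj * kj for xj, kj in zip(x, key))) for key in keys]
    # The output is the activation-weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(a * v[j] for a, v in zip(activations, values)) for j in range(dim)]
    return activations, output

# Toy example with two memory slots: x matches the first key only.
x = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 3.0]]
activations, output = ffn_as_memory(x, keys, values)
```

In this toy setup only the first slot fires, so the output is a scaled copy of the first value vector. That is the sense in which one key-value pair acts as a memory slot.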
The researchers then hypothesize that factual knowledge is stored in FFN memories and expressed by the corresponding intermediate neurons, which they dub “knowledge neurons.” In the next step, they introduce a knowledge attribution method and a refining strategy designed to identify these knowledge neurons.
The knowledge attribution method is based on integrated gradients, which estimates the contribution of each intermediate FFN neuron to the final output. In this way, given a relational fact and a prompt, it is possible to coarsely localize the fact to those neurons whose attribution scores exceed a given threshold.
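A rough sketch of how an integrated-gradients score for a single neuron might be computed follows. It assumes a zero baseline, a Riemann-sum approximation of the integral, and finite-difference gradients in place of backpropagation. `toy_prob` is a hypothetical stand-in for the model's probability of the correct answer, and the step count is an arbitrary illustrative choice.

```python
import math

def integrated_gradients(f, w_bar, i, steps=20, eps=1e-5):
    """Approximate the attribution of neuron i to the scalar output f(w).

    Only neuron i is scaled from 0 up to its observed activation w_bar[i];
    all other activations stay at their observed values.
    """
    total = 0.0
    for k in range(1, steps + 1):
        w = list(w_bar)
        w[i] = (k / steps) * w_bar[i]
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        # Central finite difference as a stand-in for the true gradient.
        total += (f(up) - f(down)) / (2 * eps)
    # Riemann-sum approximation of w_bar[i] * integral of df/dw_i.
    return w_bar[i] * total / steps

def toy_prob(w):
    # Hypothetical model: neuron 0 helps, neuron 1 hurts, neuron 2 is ignored.
    return 1.0 / (1.0 + math.exp(-(2.0 * w[0] - 0.5 * w[1])))

scores = [integrated_gradients(toy_prob, [1.0, 1.0, 1.0], i) for i in range(3)]
```

Here the neuron that raises the toy probability gets a positive score, the one that lowers it gets a negative score, and the ignored neuron scores zero. A threshold on these scores would then yield the coarse set of candidate neurons.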
To pinpoint the factual knowledge, the team then applies the refining strategy, filtering out “false-positive” knowledge neurons that express information other than factual knowledge.
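One plausible way to implement such a filter is sketched below. It assumes that true knowledge neurons recur across several differently worded prompts for the same fact, while false positives are tied to one particular wording. The `share` parameter and the `(layer, index)` neuron identifiers are illustrative assumptions, not values from the article.

```python
from collections import Counter

def refine(neurons_per_prompt, share=0.7):
    """Keep neurons that appear in at least `share` of the prompts.

    neurons_per_prompt: one collection of coarse neuron ids per prompt,
    where each id is a hypothetical (layer, index) tuple.
    """
    # Count each neuron at most once per prompt.
    counts = Counter(n for neurons in neurons_per_prompt for n in set(neurons))
    needed = share * len(neurons_per_prompt)
    return sorted(n for n, c in counts.items() if c >= needed)

# Coarse neuron sets from three paraphrased prompts for the same fact.
coarse = [
    [(9, 12), (10, 5), (11, 7)],
    [(10, 5), (11, 7), (3, 40)],
    [(11, 7), (2, 8)],
]
refined = refine(coarse, share=0.6)
```

With these toy inputs, only the neurons seen in at least two of the three prompts survive.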
The researchers conducted experiments on the PARAREL dataset to validate both their hypothesis that factual knowledge is expressed by knowledge neurons and the effectiveness of their proposed knowledge attribution and refining methods. They ran the experiments on the widely used BERT-base-cased model.
The results show that suppressing knowledge neurons decreased the probability of the correct answer for the corresponding relational facts by an average of 37.03 percent, while amplifying knowledge neurons increased it by an average of 46.42 percent.
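The suppress and amplify interventions can be illustrated on a toy model. This sketch assumes that suppressing a neuron means zeroing its activation and amplifying means doubling it; the article does not specify either. The activations, value vectors and two-way answer space are made up for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def answer_probs(activations, values, scale=None):
    """Answer distribution when selected neuron activations are rescaled.

    scale maps a neuron index to a multiplier (0.0 suppresses, 2.0 amplifies).
    """
    scale = scale or {}
    n_answers = len(values[0])
    logits = [
        sum(scale.get(i, 1.0) * a * v[j]
            for i, (a, v) in enumerate(zip(activations, values)))
        for j in range(n_answers)
    ]
    return softmax(logits)

activations = [1.5, 0.2, 0.7]
values = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # answer 0 is the correct one

base = answer_probs(activations, values)[0]
suppressed = answer_probs(activations, values, {0: 0.0})[0]
amplified = answer_probs(activations, values, {0: 2.0})[0]
```

Neuron 0 plays the knowledge neuron in this toy example. Zeroing it lowers the probability of the correct answer and doubling it raises that probability, mirroring the direction of the reported effect.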
The researchers also confirmed that knowledge neurons can be used to update knowledge or even erase a category of knowledge stored in pretrained models. In a knowledge-updating use case, they directly modified several value slots corresponding to knowledge neurons and found they could correct a wrong relational fact remembered by a pretrained model without any fine-tuning. In a second use case designed to erase private or “unethical knowledge,” the team demonstrated that the prediction accuracy for a missing entity decreased significantly after erasing four relations, indicating that a substantial portion of the private information had been erased. Moreover, the team says such interventions can edit relational facts without significantly affecting the accuracy of other knowledge in the model.
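A minimal sketch of what directly modifying value slots could look like follows. It assumes the edit subtracts the embedding of the old answer from each selected value vector and adds the embedding of the new one. The weights `lam_old` and `lam_new`, the function name and the toy embeddings are illustrative assumptions.

```python
def edit_value_slots(values, neuron_ids, old_emb, new_emb, lam_old=1.0, lam_new=1.0):
    """Return a copy of `values` with the selected slots shifted
    away from the old answer embedding and toward the new one."""
    edited = [list(v) for v in values]  # leave the original weights untouched
    for i in neuron_ids:
        edited[i] = [
            v - lam_old * o + lam_new * n
            for v, o, n in zip(edited[i], old_emb, new_emb)
        ]
    return edited

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy value slots and answer embeddings.
values = [[1.0, 0.0], [0.0, 1.0]]
old_answer = [1.0, 0.0]
new_answer = [0.0, 1.0]

edited = edit_value_slots(values, [0], old_answer, new_answer)
score_old_before = dot(values[0], old_answer)
score_new_after = dot(edited[0], new_answer)
```

After the edit, slot 0 aligns with the new answer embedding rather than the old one, while slot 1 is left unchanged.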
Overall, the study provides a deeper understanding of transformer architectures, the knowledge stored in pretrained models, and how it is possible to utilize knowledge neurons to explicitly update and/or erase factual knowledge in pretrained transformers.
The paper Knowledge Neurons in Pretrained Transformers is on arXiv.
Author: Hecate He | Editor: Michael Sarazen