
Yann LeCun Team Uses Dictionary Learning To Peek Into Transformers’ Black Boxes

A Yann LeCun team proposes using dictionary learning to provide detailed visualizations of transformer representations and insights into the semantic structures that transformers capture, such as word-level disambiguation, sentence-level pattern formation, and long-range dependencies.

Transformer architectures have become the building blocks for many state-of-the-art natural language processing (NLP) models. While transformers are certainly powerful, researchers’ understanding of how they actually work remains limited. This lack of transparency is problematic because models can inherit biases from their training data and algorithms, which could cause them to produce unfair or incorrect predictions.

In the paper Transformer Visualization via Dictionary Learning: Contextualized Embedding as a Linear Superposition of Transformer Factors, a Yann LeCun team from Facebook AI Research, UC Berkeley and New York University leverages dictionary learning techniques to provide detailed visualizations of transformer representations and insights into the semantic structures — such as word-level disambiguation, sentence-level pattern formation, and long-range dependencies — that are captured by transformers.


Previous efforts to open up this “black box” include direct visualization and, more recently, “probing tasks” designed to interpret transformer models. However, probing tasks such as part-of-speech (POS) tagging, named-entity recognition (NER) and syntactic dependency parsing are not complex enough to convince researchers that the results accurately reflect the true character and capacity of the models studied. Such tasks also cannot reveal semantic structures beyond what researchers already know to look for, and they make it difficult to pinpoint where in a transformer the related semantic representations are learned.

To alleviate these limitations, the researchers propose using dictionary learning, a method previously used to explain, improve, and visualize uncontextualized word embeddings.

The team first states the hypothesis behind their method: a contextualized word embedding can be expressed as a sparse linear superposition of transformer factors. Previous research has shown that uncontextualized word embeddings can be broken down into elementary semantic meanings. The team treats the latent representations a transformer produces for each word as contextualized word embeddings, and proposes that these vectors can likewise be factorized as sparse linear superpositions of a set of elementary elements, which they term “transformer factors.”
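To make the “sparse linear superposition” idea concrete, here is a minimal sketch of sparse inference against a fixed dictionary: given an embedding x and a dictionary Phi whose columns stand in for transformer factors, it finds a sparse coefficient vector alpha with x ≈ Phi @ alpha. The solver (projected ISTA), the non-negativity constraint, the function name `sparse_code` and the toy sizes are all our assumptions for illustration, not necessarily the paper’s exact setup; learning the dictionary itself would alternate a step like this with updates to Phi.

```python
import numpy as np


def sparse_code(x, Phi, lam=0.1, n_iter=300):
    """Infer a non-negative sparse code alpha such that x ≈ Phi @ alpha.

    Minimizes 0.5 * ||x - Phi @ alpha||^2 + lam * ||alpha||_1 subject to
    alpha >= 0 with projected ISTA (iterative shrinkage-thresholding).
    Each column of Phi plays the role of one hypothetical "transformer factor".
    """
    step = 1.0 / np.linalg.norm(Phi, ord=2) ** 2  # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ alpha - x)  # gradient of the reconstruction term
        # Gradient step followed by non-negative soft-thresholding (prox of L1 + alpha >= 0)
        alpha = np.maximum(alpha - step * (grad + lam), 0.0)
    return alpha


# Toy usage: build an "embedding" from three known factors, then recover its code.
rng = np.random.default_rng(0)
d, n_factors = 16, 64  # toy sizes, far smaller than real transformer hidden states
Phi = rng.standard_normal((d, n_factors))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm dictionary atoms
x = 1.0 * Phi[:, 3] + 0.8 * Phi[:, 10] + 0.5 * Phi[:, 42]
alpha = sparse_code(x, Phi, lam=0.05)
print(np.argsort(alpha)[-3:])  # indices of the three largest coefficients
```

The L1 penalty is what pushes most coefficients to exactly zero, so each embedding is explained by only a handful of factors.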

To visualize each feature, they adopt the common deep learning convention of showing the input samples that activate it most strongly. Because a contextualized word vector is generally influenced by many tokens in a sequence, they also assign each token a weight that reflects its relative importance to the largest sparse coefficients of the contextualized word vectors.
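Both steps can be sketched in a few lines. `top_activating_tokens` picks the tokens with the largest coefficient for one factor, following the top-activation convention. For per-token weights, `occlusion_importance` uses simple occlusion (mask one token, measure how much the factor’s coefficient drops); this is a swapped-in attribution technique chosen for brevity and may differ from the paper’s weighting. The callable `activation_fn`, the `"[MASK]"` placeholder and the toy activation function are hypothetical.

```python
import numpy as np


def top_activating_tokens(codes, factor, k=5):
    """Indices of the k tokens with the largest sparse coefficient for `factor`.

    `codes` has shape (n_tokens, n_factors).
    """
    return np.argsort(-codes[:, factor])[:k]


def occlusion_importance(activation_fn, tokens, position, mask="[MASK]"):
    """Weight each token by how much masking it lowers one factor's
    coefficient at `position`.

    `activation_fn(tokens, position)` is a hypothetical callable returning the
    sparse coefficient of a single transformer factor at that position.
    """
    base = activation_fn(tokens, position)
    scores = []
    for j in range(len(tokens)):
        occluded = list(tokens)
        occluded[j] = mask  # hide one token at a time
        scores.append(base - activation_fn(occluded, position))
    return np.array(scores)


def toy_activation(tokens, position):
    # Toy stand-in: a "river bank" factor that fires once per "river" in the context
    return float(sum(t == "river" for t in tokens))


print(occlusion_importance(toy_activation, ["the", "river", "bank"], position=2))  # → [0. 1. 0.]
```

In the toy example only “river” receives weight, because it is the only context token that changes the factor’s coefficient on “bank”.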

Finally, they learn a single dictionary shared by all transformer layers and compute an importance score (IS) for each transformer factor at each layer. The resulting IS curves show the layers in which each factor emerges, which lets the team identify low-, mid-, and high-level transformer factors.
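A minimal sketch of IS curves, assuming the codes of the same tokens have been inferred at every layer against the one shared dictionary. Here IS[l, c] is taken to be the mean of the top_n largest coefficients of factor c at layer l; that definition, the default of 1,000 and the function names are assumptions for illustration. The layer where a factor’s curve peaks then hints at whether it is low-, mid-, or high-level.

```python
import numpy as np


def importance_scores(layer_codes, top_n=1000):
    """IS for every (layer, factor) pair.

    layer_codes: shape (n_layers, n_tokens, n_factors), sparse codes of the
    same tokens at each layer, all against one shared dictionary.
    IS[l, c] = mean of the top_n largest coefficients of factor c at layer l
    (an assumed definition).
    """
    top_n = min(top_n, layer_codes.shape[1])
    # Sort along the token axis and keep the top_n largest coefficients
    top = np.sort(layer_codes, axis=1)[:, -top_n:, :]
    return top.mean(axis=1)  # shape (n_layers, n_factors)


def peak_layers(scores):
    """Layer at which each factor's IS curve peaks."""
    return scores.argmax(axis=0)


codes = np.zeros((3, 4, 2))  # 3 layers, 4 tokens, 2 factors
codes[0, :, 0] = [0.9, 0.8, 0.1, 0.0]  # factor 0 strongest early (low-level)
codes[2, :, 1] = [0.7, 0.6, 0.5, 0.0]  # factor 1 strongest late (higher-level)
scores = importance_scores(codes, top_n=2)
print(peak_layers(scores))  # → [0 2]
```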


The researchers used a 12-layer pretrained BERT model for their evaluation experiments. They divided semantic meaning into three categories: word-level disambiguation, sentence-level pattern formation, and long-range dependency, generating detailed visualizations for each semantic category.


For low-level, word-level disambiguation, transformer factors whose IS curves peak early tended to correspond to specific word-level meanings. For example, in layer 0 the top-activated instances of the word “left” carry different word senses, but the ambiguity decreases by layer 2, and by layer 4 all instances of “left” correspond to the same sense.


The disambiguation ability of a transformer model can also be quantified. For example, a factor can be used to separate sentences in which “left” is annotated as a verb from sentences in which “left” is annotated as another part of speech.
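One way to make this concrete, sketched under our own assumptions: treat a factor as a binary classifier that predicts “verb” whenever its sparse coefficient on “left” exceeds a threshold, then score it with precision and recall against POS annotations. The threshold, the function name and the sample numbers below are hypothetical, not results from the paper.

```python
import numpy as np


def threshold_precision_recall(coeffs, is_verb, threshold):
    """Score a factor used as a verb/non-verb classifier.

    Predicts "verb" when the factor's coefficient on the word exceeds
    `threshold`, then compares predictions with POS annotations.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    is_verb = np.asarray(is_verb, dtype=bool)
    predicted = coeffs > threshold
    tp = np.sum(predicted & is_verb)  # correctly flagged verb uses
    precision = tp / max(np.sum(predicted), 1)  # guard against zero predictions
    recall = tp / max(np.sum(is_verb), 1)  # guard against zero verb labels
    return float(precision), float(recall)


# Hypothetical coefficients of a "left (verb)" factor on five occurrences of "left"
coeffs = [0.9, 0.7, 0.2, 0.6, 0.0]
is_verb = [True, True, False, False, False]
print(threshold_precision_recall(coeffs, is_verb, threshold=0.5))  # precision 2/3, recall 1.0
```

Sweeping the threshold would trace out a precision-recall curve, which gives a single picture of how cleanly the factor separates the two senses.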


For mid-level, sentence-level pattern formation, the proposed method detected patterns in the consecutive usage of adjectives. The results reveal for example a pattern that starts to emerge at layer 4, continues to develop at layer 6, and becomes quite reliable at layer 8. The team concluded that most transformer factors with an IS curve that peaks after layer 6 capture mid-level or high-level semantic meanings.


For high-level, long-range dependencies, the transformer factors corresponded to linguistic patterns that span long stretches of text, with the results showing the top two activated words and their contexts for each such factor. The team observed that these high-level factors capture more abstract, repetitive structures, and can also combine mid-level snippets of information such as date of birth, first and last name, familial relations and career to form “the beginning of a biography.”

The researchers believe this simple tool could help open up transformer networks by revealing the hierarchical semantic representations learned at different stages. They have also created an interactive website where users can visualize the latent space of transformer models to gain additional insights.

The paper Transformer Visualization via Dictionary Learning: Contextualized Embedding as a Linear Superposition of Transformer Factors is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


