When a human acquires the knowledge that “Olaf Scholz was the ninth Chancellor of Germany,” they can effortlessly respond to the question, “Who was the ninth Chancellor of Germany?” This seemingly simple act of generalization is a fundamental aspect of human cognition, often taken for granted.
However, a new paper titled “The Reversal Curse: LLMs trained on ‘A is B’ fail to learn ‘B is A’,” by a research team from Vanderbilt University, the UK Frontier AI Taskforce, Apollo Research, New York University, the University of Sussex, and the University of Oxford, reveals a surprising shortcoming in auto-regressive large language models (LLMs).
The phenomenon, dubbed the “Reversal Curse,” is a failure of generalization: a model trained on sentences of the form “A is B” does not automatically generalize to the reverse formulation, “B is A.” The finding challenges the common assumption that a fact learned by an advanced language model can be retrieved from either direction.
To illustrate, consider a model trained on sentences of the form “A is B,” where A is a name and B is a description. Surprisingly, the model fails to predict the reverse direction, “B is A.” Specifically, when the LLM is conditioned on the description, the likelihood it assigns to the correct name is no higher than the likelihood it assigns to a random name.
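The likelihood test can be sketched in a few lines of Python. The model here is an assumption, not an LLM: a hypothetical toy `PrefixMemorizer` that only memorizes which token followed each sentence prefix in training, with add-one smoothing. The names and facts are invented for illustration. Because it is trained only in the “A is B” direction, a description-first prompt is an unseen context, so the correct name scores exactly the same as the other names. Real LLMs generalize far better than this memorizer, which is what makes the paper's result surprising; the sketch only shows how the comparison against other names is set up.

```python
import math
from collections import Counter, defaultdict


class PrefixMemorizer:
    """Toy left-to-right model (an assumption for illustration, not an LLM).

    It memorizes next-token counts for every sentence prefix seen in training.
    Probabilities use add-one smoothing over the vocabulary, so any context
    never seen in training gives every token the same probability, 1/|V|.
    """

    def __init__(self, sentences):
        self.next_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence in sentences:
            tokens = sentence.split()
            self.vocab.update(tokens)
            for i, tok in enumerate(tokens):
                self.next_counts[tuple(tokens[:i])][tok] += 1

    def logprob(self, prompt, completion):
        """Sum of log P(token | every preceding token) over the completion."""
        context = prompt.split()
        v = len(self.vocab)
        total = 0.0
        for tok in completion.split():
            counts = self.next_counts.get(tuple(context), Counter())
            total += math.log((counts[tok] + 1) / (sum(counts.values()) + v))
            context.append(tok)
        return total


# Hypothetical fictitious facts, invented for this sketch.
facts = {
    "Mira Castellan": "the composer of Glass Harbor",
    "Tobin Vale": "the director of Silent Orbit",
    "Ines Marlowe": "the author of The Copper Key",
}
# Train only in the "A is B" direction: name first, description second.
model = PrefixMemorizer([f"{name} is {desc}" for name, desc in facts.items()])


def reverse_scores(name):
    """log P(name | description prompt) for the correct name vs. the other names."""
    desc = facts[name]
    prompt = desc[0].upper() + desc[1:] + " is"  # "The composer of Glass Harbor is"
    correct = model.logprob(prompt, name)
    baselines = [model.logprob(prompt, other) for other in facts if other != name]
    return correct, baselines


correct, baselines = reverse_scores("Mira Castellan")
print(correct, baselines)  # all three values are equal: no advantage for the correct name
```

In the forward direction the same toy model does prefer the correct description, so the asymmetry comes entirely from which contexts appeared during training.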
The research team supports these findings with a series of fine-tuning experiments on synthetic data. They fine-tune a base LLM on fictitious facts of the form “A is B” and show that, across a variety of prompts, the model fails to produce the name when given the description.
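A minimal sketch of how such a synthetic dataset could be organized: every training string places the name before the description, while the reverse-direction evaluation prompts give only the description and expect the name. The facts, templates, and the `build_dataset` helper are hypothetical placeholders, not the paper's actual data or wording.

```python
import random

# Hypothetical fictitious facts (invented for illustration, not from the paper).
FACTS = [
    ("Mira Castellan", "the composer of Glass Harbor"),
    ("Tobin Vale", "the director of Silent Orbit"),
]

# Assumed paraphrase templates; the name always comes first in training data.
TRAIN_TEMPLATES = [
    "{name} is {desc}.",
    "As many people know, {name} is {desc}.",
]

# Assumed reverse-direction test prompts: description given, name expected.
REVERSE_TEMPLATES = [
    "Who is {desc}?",
    "Q: Name {desc}. A:",
]


def build_dataset(facts, seed=0):
    """Return shuffled 'A is B' training strings and 'B -> A' evaluation items."""
    train = [t.format(name=n, desc=d) for n, d in facts for t in TRAIN_TEMPLATES]
    random.Random(seed).shuffle(train)
    reverse_eval = [
        {"prompt": t.format(desc=d), "target": n}
        for n, d in facts
        for t in REVERSE_TEMPLATES
    ]
    return train, reverse_eval


train, reverse_eval = build_dataset(FACTS)
for item in reverse_eval:
    print(item["prompt"], "->", item["target"])
```

A model fine-tuned on `train` would then be scored on whether it completes each reverse prompt with its target name; using several templates guards against the result depending on one particular phrasing.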
The researchers also provide tentative evidence that the Reversal Curse affects practical generalization in state-of-the-art models. They evaluate GPT-4 on pairs of questions such as “Who is Tom Cruise’s mother?” and “Who is Mary Lee Pfeiffer’s son?” for 1,000 different celebrities and their actual parents. The results include cases in which the model answers the first question (“Who is [name]’s parent?”) correctly but fails the second.
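The paired-question evaluation can be expressed as a small harness. Here `ask` stands for any function that sends a question to a chat model and returns its text reply; it is a hypothetical interface, and `stub_ask` is a stand-in that answers only the forward question, used to show how a forward-only success is counted. Scoring an answer as correct when it contains the expected name is a simplifying assumption.

```python
from typing import Callable, Iterable, Tuple

# (child, parent, relation in the forward question, relation in the reverse question)
Pair = Tuple[str, str, str, str]


def evaluate_pairs(pairs: Iterable[Pair], ask: Callable[[str], str]) -> dict:
    """Ask each parent question in both directions and tally the results."""
    forward_hits = reverse_hits = forward_only = total = 0
    for child, parent, parent_rel, child_rel in pairs:
        # Forward: "Who is <child>'s <mother/father>?" should name the parent.
        fwd_ok = parent.lower() in ask(f"Who is {child}'s {parent_rel}?").lower()
        # Reverse: "Who is <parent>'s <son/daughter>?" should name the child.
        rev_ok = child.lower() in ask(f"Who is {parent}'s {child_rel}?").lower()
        forward_hits += fwd_ok
        reverse_hits += rev_ok
        forward_only += fwd_ok and not rev_ok
        total += 1
    total = max(total, 1)  # avoid dividing by zero on an empty list
    return {
        "forward_acc": forward_hits / total,
        "reverse_acc": reverse_hits / total,
        "forward_only": forward_only,
    }


# Hypothetical stub standing in for a real model: it only "knows" the forward fact.
def stub_ask(question: str) -> str:
    known = {"Who is Tom Cruise's mother?": "His mother is Mary Lee Pfeiffer."}
    return known.get(question, "I don't know.")


print(evaluate_pairs([("Tom Cruise", "Mary Lee Pfeiffer", "mother", "son")], stub_ask))
# {'forward_acc': 1.0, 'reverse_acc': 0.0, 'forward_only': 1}
```

The `forward_only` count isolates exactly the pattern the article describes: the model knows the fact in one direction but cannot retrieve it from the other.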
In the fine-tuning experiments, GPT-3-175B achieves high exact-match accuracy when a question presents the information in the same order as the training data. When the order is reversed, however, the model fails to generalize, and its accuracy falls close to 0%.
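Exact-match accuracy, reported separately for each question order, can be computed as in the sketch below. The normalization (lowercasing, stripping punctuation, collapsing whitespace) and the `(direction, prediction, target)` record format are assumptions for illustration, and the example records are hypothetical, not outputs from GPT-3.

```python
import string
from collections import defaultdict

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(text.lower().translate(_STRIP_PUNCT).split())


def exact_match_by_direction(records):
    """records: iterable of (direction, prediction, target) triples.

    Returns exact-match accuracy for each direction label, e.g. 'same' vs 'reverse'.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for direction, prediction, target in records:
        totals[direction] += 1
        hits[direction] += normalize(prediction) == normalize(target)
    return {d: hits[d] / totals[d] for d in totals}


# Hypothetical records, not real model outputs.
records = [
    ("same", "The composer of Glass Harbor.", "the composer of Glass Harbor"),
    ("reverse", "Tobin Vale", "Mira Castellan"),
]
print(exact_match_by_direction(records))  # {'same': 1.0, 'reverse': 0.0}
```

Exact match is a strict metric: a reply only counts if, after normalization, it is identical to the target string.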
The real-world knowledge experiment shows the same asymmetry in GPT-4: the model correctly identifies Mary Lee Pfeiffer as Tom Cruise’s mother but fails to identify Tom Cruise as Mary Lee Pfeiffer’s son, even though both questions concern the same fact.
The work raises several open questions. Why do language models suffer from the Reversal Curse? Are non-autoregressive models susceptible to it as well? Could humans exhibit some form of the Reversal Curse in their own cognition? The research team hopes that future work will address these questions, clarifying how language models generalize and where the boundaries of artificial and human intelligence lie.
Author: Hecate He | Editor: Chain Zhang