Large-scale language models such as transformers have become the de facto standard for a wide range of natural language processing (NLP) tasks. Despite their apparent linguistic savvy, such sequence models are known to lack a real understanding of the cause and effect of their actions, which can lead to erroneous decisions driven by self-delusions.
In the new paper Shaking the Foundations: Delusions in Sequence Models for Interaction and Control, a DeepMind research team explores the origin of these mismatches and addresses the problem by treating actions as causal interventions. The team shows that a system can learn to condition or intervene on data by training with factual and counterfactual error signals, respectively.
Sequence models are updated based on collected data, and these updates should differ depending on whether the data was generated by the model itself (i.e. actions) or generated outside the model (i.e. observations). The researchers explain that self-delusions occur when a model mistakes its own actions for evidence about the real world and the task at hand, particularly in the presence of confounding factors that obscure the cause-effect relationships between actions and observations.
To mitigate these delusions, the team proposes enforcing the causal independence that holds when an action is chosen without knowledge of the confounding variables. Mathematically, this is done by treating the action as a causal intervention, so that intervening on the action does not change the model's beliefs about those variables.
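The distinction can be made concrete with a toy example (our construction, not code from the paper). A hidden confounder influences both an expert's action and the subsequent observation. Conditioning on an action treats it as evidence and shifts beliefs about the confounder; intervening (Pearl's do-operator) cuts the link from the confounder to the action and leaves those beliefs untouched:

```python
# Toy confounded model (illustrative only): a binary hidden variable
# theta influences both the expert's action a and the observation o.

# Prior over the hidden confounder theta in {0, 1}
prior = {0: 0.5, 1: 0.5}

# Expert's action probabilities: P(a=1 | theta)
p_action = {0: 0.9, 1: 0.1}

# Observation model: P(o=1 | theta)
p_obs = {0: 0.8, 1: 0.2}

def posterior_given_action(a):
    """Condition: treat the action as evidence, P(theta | a) via Bayes' rule."""
    lik = {th: p_action[th] if a == 1 else 1 - p_action[th] for th in prior}
    joint = {th: prior[th] * lik[th] for th in prior}
    z = sum(joint.values())
    return {th: joint[th] / z for th in joint}

def posterior_given_do_action(a):
    """Intervene: do(a) severs theta -> a, so beliefs about theta are unchanged."""
    return dict(prior)

def predict_obs(post):
    """Predicted P(o=1) under a belief over theta."""
    return sum(post[th] * p_obs[th] for th in post)

cond = predict_obs(posterior_given_action(1))       # conditioning: 0.74
interv = predict_obs(posterior_given_do_action(1))  # intervening: 0.5
```

A model that emits a=1 itself but then conditions on it as if it were evidence concludes theta=0 is likely and predicts P(o=1)=0.74; treating its own action as an intervention correctly keeps the prediction at 0.5. That gap is precisely a self-delusion.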
The study explores the use of sequence models for control, including meta-learning, counterfactual teaching, and offline adaptation and control. The researchers summarize the resulting insights as:
- When training a function approximator to learn an adaptive policy, the causal distinction between actions and observations translates into using counterfactual and factual teaching signals, respectively. This ensures that predictions are amortized with the correct weighting of past histories that mix conditioning and intervening.
- In general, we cannot meta-train an agent from expert demonstrations alone and expect it to imitate the expert at deployment time, because those demonstrations could depend on unknown confounding variables. To train an agent from expert demonstrations, it is necessary to induce causal models that posit the confounders needed to explain the data.
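The teaching-signal asymmetry in the first point can be sketched as follows (a minimal, DAgger-style illustration of ours, with a hypothetical expert and environment, not the paper's setup): observation tokens receive factual targets, i.e. what the environment actually emitted, while action tokens receive counterfactual targets, i.e. what the expert would have done in the history the learner itself produced:

```python
def expert_policy(history):
    """Hypothetical expert: repeat the last observation, default to 0."""
    obs = [x for kind, x in history if kind == "obs"]
    return obs[-1] if obs else 0

def env_step(action):
    """Hypothetical environment: the observation flips the action."""
    return 1 - action

def collect_targets(learner_policy, steps):
    """Roll out the learner's own policy; label action tokens with the
    expert's counterfactual choice on the learner's history, and label
    observation tokens with the factual environment response."""
    history, targets = [], []
    for _ in range(steps):
        a = learner_policy(history)
        # Counterfactual teaching signal for the action token:
        targets.append(("action", expert_policy(history)))
        history.append(("act", a))
        o = env_step(a)
        # Factual teaching signal for the observation token:
        targets.append(("obs", o))
        history.append(("obs", o))
    return targets

# A deliberately bad learner that always plays 1 still gets corrective
# action targets computed on the histories it actually visited:
targets = collect_targets(lambda h: 1, steps=2)
```

The key design point is that the expert is queried on the learner's own rollout rather than on logged expert trajectories, which is exactly what treating the action as an intervention demands.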
Overall, this technical report explores how and why self-delusions can occur in sequence models. The researchers propose treating actions as causal interventions to introduce the appropriate causal constraints and ensure an agent will only learn about a given task through the effects of its actions; and they demonstrate that actions and observations can be regressed as interventions and conditions via counterfactual and factual teaching signals, respectively.
The paper Shaking the Foundations: Delusions in Sequence Models for Interaction and Control is on arXiv.
Author: Hecate He | Editor: Michael Sarazen