Understanding and generalization beyond the training distribution are regarded as huge challenges in modern machine learning (ML) — and Yoshua Bengio argues it’s time to look at causal learning for possible solutions. In the paper Towards Causal Representation Learning, Turing Award honoree Bengio and his research team make an effort to unite causality and ML research approaches, delineate some implications of causality for ML, and propose critical areas for future research.
Bengio outlined the challenge in a causal representation learning talk he gave late last year: “I would say there are pretty significant gaps between current state-of-the-art in machine learning-driven AI and the intelligence that we see deployed in humans and many animals… We don’t have AI systems that actually understand at the level that humans do, or anywhere close.” Bengio characterized human-level AI “understanding” as the ability to capture causality; capture how the world works; understand abstract actions and how to use them to control; reason and plan, even in novel scenarios; explain what happened (inference, credit assignment); and generalize out-of-distribution.
In this regard, most modern ML models remain far from true understanding: they work only under fixed experimental conditions, and interventions in the real world are treated as a nuisance that can hopefully be engineered away. It is therefore not surprising that most of today’s ML models lack an out-of-distribution generalization ability.
Causal learning, on the other hand, focuses on representing structural knowledge about the data-generating process to allow interventions and changes, making it easier to re-use and re-purpose learned knowledge. This approach is considered closer to human thinking.
The new paper reviews and synthesizes a number of important contributions to causal learning, specifically:
- Describing different levels of modelling in physical systems and presenting the differences between causal and statistical models.
- Expanding on the Independent Causal Mechanisms (ICM) principle as a key component that enables the estimation of causal relations from data.
- Reviewing existing approaches to learn causal relations from appropriate descriptors (or features).
- Discussing how useful models of reality may be learned from data in the form of causal representations, and discussing several current ML problems from a causal point of view.
- Assaying the implications of causality for practical machine learning, discussing examples at the intersection between causality and ML in scientific applications and speculating on the advantages of combining the strengths of both fields to build a more versatile AI.
The paper classifies models on three levels: mechanistic or physical models, causal models and statistical models. The most detailed are mechanistic or physical models, which are usually differential equations that provide comprehensive system descriptions. Unlike differential equations, which typically require input from human experts, causal modelling is a more data-driven approach, replacing expert knowledge with weak and generic assumptions. The most superficial of the model types are statistical models, which do not describe dynamic processes and only make predictions from some variables under fixed experimental conditions.
Statistical ML models rest on the assumption that data are independent and identically distributed (i.i.d.). In other words, the conceptual basis of statistical learning is a single joint distribution from which the data are drawn. Causal learning, meanwhile, allows inference on data with interventions (no need to assume that the data are i.i.d.) and can provide understanding and predict the effect of those interventions. Statistical models, by contrast, only allow inference in i.i.d. settings.
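The difference between conditioning on observed data and intervening can be illustrated with a toy example. The sketch below is not taken from the paper's code: the two-variable model (altitude causes temperature), the coefficients, and the function name `sample` are all assumptions chosen for illustration. It compares the average altitude among observed cold locations with the average altitude after temperature is forced to a cold value.

```python
import numpy as np


def sample(n, rng, do_temperature=None):
    """Draw n samples from a toy causal model: altitude -> temperature.

    If do_temperature is given, the temperature mechanism is replaced by a
    constant (an intervention); the altitude mechanism is left untouched.
    """
    altitude = rng.uniform(0.0, 3000.0, size=n)  # metres, hypothetical range
    if do_temperature is None:
        # Hypothetical mechanism: temperature drops with altitude, plus noise.
        temperature = 15.0 - 0.0065 * altitude + rng.normal(0.0, 1.0, size=n)
    else:
        temperature = np.full(n, float(do_temperature))
    return altitude, temperature


rng = np.random.default_rng(0)

# Conditioning: among observed samples, look only at the cold ones.
alt_obs, temp_obs = sample(200_000, rng)
mean_alt_given_cold = alt_obs[temp_obs < 0.0].mean()

# Intervening: force the temperature to be cold everywhere.
alt_do, _ = sample(200_000, rng, do_temperature=-5.0)
mean_alt_do_cold = alt_do.mean()

print("mean altitude given observed cold:", round(mean_alt_given_cold))
print("mean altitude under do(cold):     ", round(mean_alt_do_cold))
```

Observing a cold temperature is evidence of high altitude, so the conditional average is well above the population average. Setting the temperature by intervention says nothing about altitude, so the average stays near the middle of the range. A purely statistical model of the joint distribution can answer the first question but not the second.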
The researchers propose that insights on the differences between statistical and causal models can be expressed as the Independent Causal Mechanisms (ICM) Principle (proposed by B. Schölkopf in 2012), where the causal generative process of a system’s variables is composed of autonomous modules that do not inform or influence each other. In the probabilistic case, this means that the conditional distribution of each variable given its causes (i.e., its mechanism) does not inform or influence the other mechanisms.
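In symbols, the probabilistic statement corresponds to the causal (or disentangled) factorization of the joint distribution. The notation below is an assumption following common convention, not something spelled out in this article: $X_1, \dots, X_n$ are the system's variables and $\mathrm{PA}_i$ denotes the direct causes (parents) of $X_i$ in the causal graph.

```latex
p(X_1, \dots, X_n) = \prod_{i=1}^{n} p(X_i \mid \mathrm{PA}_i)
```

Each factor $p(X_i \mid \mathrm{PA}_i)$ is one mechanism. Under the ICM principle, changing one factor should leave the others unchanged, and knowing one factor should give no information about the others.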
The researchers build on the ICM principle to formulate the Sparse Mechanism Shift (SMS) hypothesis, under which small distribution changes tend to manifest themselves in a sparse or local way in the causal/disentangled factorization, i.e., they should usually not affect all factors simultaneously.
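A small simulation can make the sparse-shift idea concrete. This is a minimal sketch under assumed settings, not the paper's experiment: a linear model where X causes Y, two environments that differ only in the mean of X, and hypothetical helper names `sample_env` and `fit_line`. It fits a line in the causal direction (Y from X) and in the anti-causal direction (X from Y) in each environment.

```python
import numpy as np


def sample_env(mean_x, n, rng):
    """Toy model X -> Y. Only the distribution of the cause X varies by environment."""
    x = rng.normal(mean_x, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 1.0, size=n)  # mechanism p(Y | X), shared by all environments
    return x, y


def fit_line(inputs, targets):
    """Least-squares line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(inputs, targets, deg=1)
    return slope, intercept


rng = np.random.default_rng(0)
x_a, y_a = sample_env(0.0, 100_000, rng)  # environment A
x_b, y_b = sample_env(3.0, 100_000, rng)  # environment B: p(X) is shifted

# Causal factorization p(X) p(Y | X): only p(X) differs between A and B.
causal_a, causal_b = fit_line(x_a, y_a), fit_line(x_b, y_b)

# Anti-causal factorization p(Y) p(X | Y): both factors differ between A and B.
anticausal_a, anticausal_b = fit_line(y_a, x_a), fit_line(y_b, x_b)

print("Y from X (slope, intercept):", causal_a, causal_b)
print("X from Y (slope, intercept):", anticausal_a, anticausal_b)
```

In the causal direction the fitted line is essentially the same in both environments, so only one factor, p(X), has to be updated after the shift. In the anti-causal direction the marginal of Y moves and the fitted line for X given Y also changes its intercept, so the same small change touches every factor.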
The researchers also look at causal representation learning’s connection to the recent interest in the concept of disentangled representations in deep learning, and discuss how ML models can benefit from causal learning with regard to semi-supervised learning, domain generalization, and adversarial robustness.
Finally, the team proposes a number of critical areas for future research: learning non-linear causal relations at scale; learning causal variables; understanding of bias in existing deep learning approaches; and learning causally correct models of the world and the agent.
The authors are from the Max Planck Institute for Intelligent Systems, ETH Zurich, Google Research Amsterdam, Mila and the University of Montreal. The paper Towards Causal Representation Learning is on arXiv.
Author: Hecate He | Editor: Michael Sarazen