In a new paper, Yoshua Bengio and a team of researchers introduce an end-to-end deep learning model, inspired in part by classical AI production systems, that builds object-centric representations of entities in videos and operates on them with differentiable and learnable production rules.
The objects or entities within any structured visual environment, such as a video, have both visible and latent properties that determine how they interact with each other. The traditional approach to modelling such interactions has been to use equivariant graph neural networks (GNNs). This setup, however, is far from ideal: GNNs are not predisposed toward learning sparse interactions, nor can they factorize knowledge about interactions in an entity-conditional manner.
The paper’s proposed Neural Production Systems (NPS) address these issues. NPS comprises a set of rule templates applied by binding placeholder variables in the rules to specific entities, serving to factorize entity-specific and rule-based information in rich visual environments.
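The binding of rule templates to entities can be illustrated with a minimal NumPy sketch. All names here are illustrative assumptions, and the hard argmax selection is a simplification; the paper itself uses learned attention with differentiable (Gumbel-softmax-style) selection:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8          # entity/state dimension
N_RULES = 3    # number of rule templates
N_ENTS = 4     # number of entities in the scene

# Each rule = a learnable embedding (its "condition") plus a linear
# "action" acting on a (primary, context) pair of bound entities.
rule_emb = rng.normal(size=(N_RULES, D))
rule_act = rng.normal(size=(N_RULES, 2 * D, D)) * 0.1

entities = rng.normal(size=(N_ENTS, D))

def apply_rules(entities):
    """One NPS-style step: for each primary entity, select a rule by
    matching the entity against rule embeddings, bind one other entity
    as context, and apply the rule's action to update the primary slot."""
    out = entities.copy()
    for i, primary in enumerate(entities):
        # Rule selection: compatibility between entity and rule embeddings.
        r = int(np.argmax(rule_emb @ primary))
        # Contextual binding: pick the most compatible other entity.
        scores = entities @ primary
        scores[i] = -np.inf
        j = int(np.argmax(scores))
        # Rule application: sparse update of only the primary entity.
        pair = np.concatenate([primary, entities[j]])
        out[i] = primary + pair @ rule_act[r]
    return out

updated = apply_rules(entities)
print(updated.shape)  # (4, 8)
```

Because each rule is selected and applied per entity, entity-specific state and rule-based knowledge stay factorized, which is the property the paper argues plain GNN message passing lacks.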
The laws of physics ensure that pushing a plate off a dining table will cause the plate to fall to the floor and likely break. Despite never having studied physics, even a human child can verbalize this knowledge in a propositional expression such as: “If a plate drops from a table, it will break.” This simple expression of propositional knowledge, however, remains a challenge for deep learning architectures for two main reasons: propositions are discrete and independent of one another, and propositions must be quantified in the manner of first-order logic.
Classical AI approaches provide valuable perspectives on propositional inference over symbolic knowledge representations. A simple example is the production systems of the 1980s, which express knowledge through condition-action rules. The researchers revisit such production systems from a deep learning perspective, proposing Neural Production Systems that naturally integrate perceptual processing and subsequent inference for visual reasoning problems.
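A classical production system of this kind can be sketched in a few lines of Python. The facts and rules below are a toy illustration (including the plate example from above), not taken from the paper; the match-select-act loop is the standard production-system cycle:

```python
# Symbolic working memory: a set of facts as tuples.
working_memory = {("plate", "on", "table"), ("hand", "pushes", "plate")}

# Condition-action rules: (condition facts, facts to add, facts to remove).
rules = [
    ({("hand", "pushes", "plate"), ("plate", "on", "table")},
     {("plate", "falling")}, {("plate", "on", "table")}),
    ({("plate", "falling")},
     {("plate", "broken")}, {("plate", "falling")}),
]

def run(memory, rules, max_steps=10):
    """Match-select-act cycle: fire the first rule whose condition is
    satisfied by working memory, update memory, repeat until quiescence."""
    memory = set(memory)
    for _ in range(max_steps):
        for cond, add, remove in rules:
            if cond <= memory:          # condition matches working memory
                memory = (memory - remove) | add
                break
        else:
            break                       # no rule fired: stop
    return memory

result = run(working_memory, rules)
print(result)  # final memory contains ("plate", "broken")
```

The appeal of this formalism, and what NPS aims to preserve, is that each rule is a self-contained, reusable unit of knowledge that applies only when its condition matches.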
The proposed NPS shares four essential properties with traditional production systems: it is modular, abstract, sparse and symmetric. These properties specify how knowledge is represented, but not what knowledge is represented. The system architecture also supports the detection and inference of entity representations and of the latent rules that govern their interactions.
The researchers conducted experiments to test the effectiveness of NPS. These included an arithmetic task that involved learning addition, subtraction and multiplication operations on numbers; an MNIST transformation task to test how the approach scales to richer visual settings; and an action-conditioned world-model simulation of a simple physics world.
In the arithmetic task, NPS achieved a significantly lower MSE than the baseline. In the MNIST transformation task, NPS successfully learned to represent each transformation with a separate rule, while the physics simulation validated NPS’s ability to extrapolate from simple (few-object) environments to more complex ones.
Researchers from Mila, University of Montreal, DeepMind, Waverly and Google Brain joined Bengio on the paper Neural Production Systems, which is available on arXiv.
Author: Hecate He | Editor: Michael Sarazen