New research from UK-based AI company and research lab DeepMind is enabling AI agents to perceive dynamic real-world environments more like humans do. The work addresses aligning observed entities across time-steps in both fully observable and partially observable environments, and is introduced in the paper AlignNet: Unsupervised Entity Alignment.
As humans interact with the world, we draw on our understanding of the objects or entities in the environment, an understanding that remains coherent even when an object is temporarily occluded. AI agents, however, have typically been trained on raw pixel inputs alone. Although recently developed unsupervised scene segmentation techniques have enabled object-based inputs, these approaches are limited to single frames, and models cannot track how objects segmented at one time-step correspond (or align) to those at a later time-step.
The researchers note that this alignment issue has impeded progress toward using object representations in downstream tasks.
To address the problem, the researchers propose AlignNet, a model capable of computing correspondence between objects across time — not just from one time-step to the next, but across long sequences.
AlignNet has two key components: a dynamics model that predicts where the objects aligned at the previous time-step should be at the current one, and a permutation model that reorders the objects at the current time-step to match the order of the previously aligned objects.
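In AlignNet both components are learned neural networks; as a rough, hypothetical illustration of the two-stage idea only, the sketch below stands in a constant-velocity prediction for the dynamics model and a brute-force minimum-distance matching for the permutation model. All function names and the constant-velocity assumption are our own, not the paper's.

```python
from itertools import permutations
from math import dist

def predict(prev_pos, prev_vel):
    """Toy stand-in for a learned dynamics model: a constant-velocity
    guess at where each previously aligned object should be now."""
    return [(x + vx, y + vy) for (x, y), (vx, vy) in zip(prev_pos, prev_vel)]

def align(predicted, current):
    """Toy stand-in for a learned permutation model: reorder the current
    time-step's objects so that index i corresponds to previously aligned
    object i, by searching for the minimum-total-distance permutation."""
    best = min(permutations(current),
               key=lambda perm: sum(dist(p, c) for p, c in zip(predicted, perm)))
    return list(best)

# Two objects, observed in shuffled order at the current time-step.
pred = predict([(0.0, 0.0), (10.0, 0.0)], [(1.0, 0.0), (0.0, 1.0)])
aligned = align(pred, [(10.0, 1.0), (1.0, 0.0)])
```

Here the second observation is matched back to the first object because it lies where the dynamics prediction expected that object to be, which is how distinct dynamics can disambiguate otherwise similar objects.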
The team incorporated an object-based memory function by building in a humanlike inductive bias for object persistence: once an object appears, it likely continues to exist even if it disappears from view for some time. This allows the model not only to deal with the appearance and disappearance of new entities, but also with the reappearance of previously encountered entities after long-term occlusions.
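AlignNet's memory is likewise learned; purely to illustrate the persistence prior, the toy slot memory below keeps slots for vanished objects alive so that a reappearing object can be matched back to its original identity rather than being treated as new. The class, its parameters, and the greedy nearest-slot matching are our own illustrative assumptions.

```python
from math import dist

class SlotMemory:
    """Toy illustration of an object-persistence prior (not AlignNet's
    learned memory): slots for objects that vanish are retained, so a
    reappearing object recovers its original slot identity."""

    def __init__(self, match_radius=1.0):
        self.match_radius = match_radius
        self.slots = {}        # slot id -> last known position
        self._next_id = 0

    def update(self, observations):
        """Greedily match each observed position to the nearest remembered
        slot within match_radius; unmatched observations open new slots,
        and unmatched slots persist (the object is assumed occluded)."""
        assigned, free = {}, set(self.slots)
        for obs in observations:
            best = min((s for s in free
                        if dist(self.slots[s], obs) <= self.match_radius),
                       key=lambda s: dist(self.slots[s], obs), default=None)
            if best is None:
                best, self._next_id = self._next_id, self._next_id + 1
            else:
                free.discard(best)
            self.slots[best] = obs
            assigned[best] = obs
        return assigned

mem = SlotMemory(match_radius=1.0)
step1 = mem.update([(0.0, 0.0), (5.0, 5.0)])   # two new objects: slots 0, 1
step2 = mem.update([(0.1, 0.0)])               # object 1 occluded; slot 1 persists
step3 = mem.update([(0.2, 0.0), (5.2, 5.0)])   # object 1 reappears as slot 1
```

After the occlusion step, the reappearing observation near (5, 5) is re-identified as slot 1 instead of being assigned a fresh identity, which is the behaviour the persistence bias is meant to capture.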
The researchers demonstrated AlignNet’s performance on five tasks spanning three environments: SpriteWorld, Physical Concepts, and Unity Room, a partially observable 3D environment. They also tested the approach in Unity Room and Physical Concepts using a modified version of AlignNet that incorporates memory to handle partial observability.
AlignNet performed strongly in experiments in the fully observable environments, both 2D SpriteWorld and 3D Physical Concepts: Continuity. AlignNet also learned to leverage dynamics to resolve ambiguous cases, for example using objects’ distinct dynamics to determine which is which when two objects have similar shapes and colours.
For tasks in partially observable environments, the researchers augmented AlignNet with a slot-based “Memory AlignNet,” which significantly outperformed baselines in both the Unity Room environment and on the Physical Concepts Free-Form data, successfully handling the appearance of new entities as well as the disappearance and reappearance of existing ones.
The researchers propose that by providing a solution to the alignment problem, AlignNet opens up many new and interesting opportunities for future work with object-based inputs in reinforcement learning and other downstream tasks.
The paper AlignNet: Unsupervised Entity Alignment is on arXiv.
Reporter: Yuan Yuan | Editor: Michael Sarazen