In a new paper, a trio of Columbia University researchers propose a novel framework and hierarchical predictive model that learns to identify what is predictable from unlabelled video.
Whether kicking a ball or driving in traffic, we are constantly making predictions about our environment. A variety of factors guide these predictions, and, as we all know, some things are much easier to predict than others.
Consider the moment with the hovering hands in the above short video. We might wonder what will come next: will the pair shake hands, or will they high-five? The researchers propose that rather than predicting the exact next action, a model could instead “hedge the bet” and predict with higher confidence that the pair will at the very least greet each other.
The paper Learning the Predictability of the Future introduces a hierarchical predictive model for learning what is predictable from unlabelled video. Inspired by the observation that people often organize actions hierarchically, the researchers designed the approach to jointly learn a hierarchy of actions from unlabelled video while also learning to anticipate them at the right level of abstraction. The model thus predicts a future action at the concrete level of the hierarchy when it is confident and, when it is not, falls back to a higher level of abstraction where it can be more confident.
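The “hedge the bet” idea can be illustrated with a toy sketch. This is not the paper's method, which learns its hierarchy from unlabelled video: here the hierarchy, the label names, the probabilities and the 0.8 threshold are all hypothetical values chosen for illustration. The sketch starts at the most likely concrete action and climbs to a parent category until the accumulated probability is high enough.

```python
# Hypothetical hand-written hierarchy (child -> parent); the paper learns
# its hierarchy from video rather than using a fixed table like this.
PARENT = {
    "handshake": "greeting",
    "high-five": "greeting",
    "greeting": "interaction",
    "argument": "interaction",
}


def ancestors(label):
    """Return the chain from a label up to the root, label included."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def predict_with_hedging(leaf_probs, threshold=0.8):
    """Pick the most specific node whose probability mass meets the threshold.

    leaf_probs maps concrete actions to probabilities; threshold is an
    assumed confidence cut-off, not a value taken from the paper.
    """
    # Each node's mass is the sum of the probabilities of the leaves below it.
    mass = {}
    for leaf, p in leaf_probs.items():
        for node in ancestors(leaf):
            mass[node] = mass.get(node, 0.0) + p

    # Start from the most likely concrete action and climb until confident.
    best_leaf = max(leaf_probs, key=leaf_probs.get)
    chain = ancestors(best_leaf)
    for node in chain:
        if mass[node] >= threshold:
            return node, mass[node]
    return chain[-1], mass[chain[-1]]  # fall back to the root


if __name__ == "__main__":
    uncertain = {"handshake": 0.45, "high-five": 0.45, "argument": 0.10}
    confident = {"handshake": 0.90, "high-five": 0.05, "argument": 0.05}
    print(predict_with_hedging(uncertain))
    print(predict_with_hedging(confident))
```

With the uncertain input, neither handshake nor high-five clears the threshold on its own, so the sketch answers with their shared parent; with the confident input it stays at the concrete level.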
The team say they designed their predictive model in hyperbolic space based on another critical observation, that hyperbolic geometry naturally and compactly encodes hierarchical structures. “Unlike Euclidean geometry, hyperbolic space can be viewed as the continuous analog of a tree because tree-like graphs can be embedded in finite-dimension with minimal distortion,” they explain.
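To give a feel for why hyperbolic space behaves like a tree, the following minimal sketch computes distances in the Poincaré ball, one standard model of hyperbolic space. The article does not say which model or implementation the authors use, so the function name and the example points are assumptions made for illustration only.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit Poincare ball.

    Uses the standard formula
    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_norm = lambda x: sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


if __name__ == "__main__":
    origin = (0.0, 0.0)   # plays the role of a "root" in this illustration
    leaf_a = (0.9, 0.0)   # two hypothetical points near the boundary
    leaf_b = (0.0, 0.9)

    direct = poincare_distance(leaf_a, leaf_b)
    via_root = poincare_distance(leaf_a, origin) + poincare_distance(origin, leaf_b)
    print(f"leaf to leaf directly:   {direct:.3f}")
    print(f"leaf to leaf via origin: {via_root:.3f}")
```

For points near the boundary, the direct distance is only slightly shorter than the detour through the origin, much as the path between two leaves of a tree runs through their common ancestor. In Euclidean space the same detour would be considerably longer than the straight line.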
Making predictions with hyperbolic embeddings also exploits the hierarchical nature of visual data. The researchers say hyperbolic predictive models can also smoothly interpolate between forecasting abstract and concrete representations, depending on the level of predictability.
Experiments on the established FineGym and Hollywood2 video datasets showed that action hierarchies emerge automatically even though the representations are trained on unlabelled video, and that predictive hyperbolic representations can both recognize actions from partial observations and forecast them better than baselines.
The paper Learning the Predictability of the Future is on arXiv. The code and model are available on the project GitHub.
Reporter: Fangyu Cai | Editor: Michael Sarazen