The shortest distance between point A and point B may be a straight line, but in the real, obstruction-filled world, the routes humans choose are shaped by the spatial layout of the objects in a given environment, also known as scene context. A team of researchers from UC Berkeley, Nanjing University and Facebook Reality Labs has proposed a novel three-stage learning framework that uses scene context to generate long-term 3D human motion predictions from a single scene image and 2D pose histories.
In the paper Long-term Human Motion Prediction with Scene Context, the team explains that although deep learning has brought significant advances in human motion prediction, existing frameworks have largely ignored scene context. The team says that, as a result, motion predictions “tend to be short-term (around 1 second), and local in space, e.g., walking in the same spot without global movement.”
The proposed learning framework divides human motion prediction tasks into three stages:
- GoalNet predicts 2D motion destinations of the human based on reference images and 2D pose heatmaps
- PathNet plans the 3D global path of the human with the input of 2D heatmaps, 2D destinations, and the image
- PoseNet predicts 3D global human motion, i.e., the 3D human pose sequences, following the predicted path
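The way the three stages feed into one another can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `goal_net`, `path_net`, `pose_net` and `predict_motion`, the `horizon`, `steps` and `depth` parameters, and the two-joint `SKELETON_OFFSETS` are all hypothetical. Each learned network is replaced by a trivial geometric stand-in, the pose history is reduced to 2D root positions rather than heatmaps, and the scene image is passed through but ignored.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]

# Hypothetical two-joint "skeleton": offset of each joint from the root joint.
SKELETON_OFFSETS: List[Point3D] = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]


def goal_net(image: object, pose_history: Sequence[Point2D],
             horizon: int = 4) -> Point2D:
    """Stage 1 stand-in: guess a 2D destination by extrapolating the last step."""
    (x0, y0), (x1, y1) = pose_history[-2], pose_history[-1]
    return (x1 + horizon * (x1 - x0), y1 + horizon * (y1 - y0))


def path_net(image: object, pose_history: Sequence[Point2D], goal: Point2D,
             steps: int = 4, depth: float = 2.0) -> List[Point3D]:
    """Stage 2 stand-in: straight-line path to the goal at an assumed fixed depth.

    A learned planner would use the image to route around obstacles.
    """
    x1, y1 = pose_history[-1]
    gx, gy = goal
    path = []
    for i in range(1, steps + 1):
        t = i / steps
        path.append((x1 + t * (gx - x1), y1 + t * (gy - y1), depth))
    return path


def pose_net(path: Sequence[Point3D]) -> List[List[Point3D]]:
    """Stage 3 stand-in: place the fixed skeleton at every point on the path."""
    return [
        [(px + ox, py + oy, pz + oz) for (ox, oy, oz) in SKELETON_OFFSETS]
        for (px, py, pz) in path
    ]


def predict_motion(image: object,
                   pose_history: Sequence[Point2D]) -> List[List[Point3D]]:
    """Chain the stages: destination, then path, then per-frame poses."""
    goal = goal_net(image, pose_history)
    path = path_net(image, pose_history, goal)
    return pose_net(path)


if __name__ == "__main__":
    motion = predict_motion(None, [(0.0, 0.0), (1.0, 0.0)])
    for frame in motion:
        print(frame)
```

The sketch only shows the data flow: each stage consumes the output of the previous one, so the final poses move through space along a global path toward a destination instead of staying in one spot.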
Existing human motion datasets have relatively few frames, 3D scenes and characters, and their 3D annotations are noisy. By contrast, the large-scale synthetic dataset used in the study has clean 3D annotations: the team collected a total of one million HD-resolution RGB-D frames from the popular video game Grand Theft Auto. The team says pretraining on their dataset stabilizes training and improves prediction performance on real datasets.
In qualitative and quantitative evaluations with both synthetic and real datasets, the novel learning framework shows consistent improvements over existing methods.
This motion prediction research could be relevant to many real-world applications where environmental context is critical, such as home or industrial service robots, or AR glasses that aid navigation for vision-impaired people.
The paper Long-term Human Motion Prediction with Scene Context is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen