Just a few months after Facebook released an on-device model capable of turning common two-dimensional photos into 3D images comes a new and improved model, produced in cooperation with Cornell Tech and Virginia Tech, that enables free-viewpoint rendering of dynamic scenes from a single video.
Typical 3D reconstruction algorithms require multiple cameras to capture different viewpoints, necessitating complicated hardware setups. Some recent video depth estimation approaches have managed to acquire consistent per-frame depth estimates from a single video using scene depth estimates and view synthesis, but the team says that even with perfect depth estimates, these early-stage approaches can produce unnatural stretching and reveal holes in disoccluded regions.
The Facebook approach enables free-viewpoint rendering by learning a spatiotemporal neural irradiance field, a challenging task given that the input video contains only one viewpoint of the scene at any given moment. To represent the scene continuously and without resolution loss, the researchers used neural implicit representations to aggregate the dynamic scene's spatiotemporal information into a single global representation.
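The key property of an implicit representation is that the scene is stored as a continuous function that can be queried at any coordinate, rather than as a grid whose resolution is fixed at storage time. A minimal sketch of the idea, using an arbitrary sine signal as a stand-in for learned scene content:

```python
import numpy as np

# A continuous "scene" represented as a function of a coordinate, standing
# in for a learned implicit network. Any real-valued coordinate is valid.
def implicit_signal(x):
    return np.sin(2 * np.pi * x)

# A discretized grid fixes its resolution when it is stored...
grid = implicit_signal(np.linspace(0.0, 1.0, 8))       # only 8 samples

# ...while the implicit function can be sampled at any density afterwards,
# so there is no resolution loss built into the representation itself.
coarse = implicit_signal(np.linspace(0.0, 1.0, 16))
fine = implicit_signal(np.linspace(0.0, 1.0, 1024))
```

The same principle applies in the paper's setting, except the function is a trained network over four inputs (three spatial coordinates plus time) rather than a closed-form expression.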
Rather than modelling view dependency, the researchers trained the neural irradiance field as a function of both space and time for each video. Depth supervision constrains the scene geometry at any moment and disambiguates it from appearance variations, and holes that appear at other time steps are addressed by propagating colour and volume density across time. The resulting representations can render the video from any novel viewpoint at any point in time.
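The space-time field described above can be sketched as NeRF-style volume rendering with time as an extra input. In the toy example below, `field` is an illustrative closed-form stand-in (an assumption on our part) for the per-video trained network; the composited expected depth is the quantity that depth supervision would compare against the per-frame depth estimates.

```python
import numpy as np

# Toy stand-in for the learned network: maps 3D sample positions and a
# time value t to RGB colour and volume density sigma. A real
# implementation would be an MLP trained on one video.
def field(points, t):
    rgb = 0.5 * (np.sin(points + t) + 1.0)            # (N, 3), in [0, 1]
    sigma = np.exp(-np.linalg.norm(points, axis=-1))  # (N,), non-negative
    return rgb, sigma

def render_ray(origin, direction, t, near=0.0, far=4.0, n_samples=64):
    """NeRF-style volume rendering of one ray at time t."""
    z = np.linspace(near, far, n_samples)              # sample depths
    pts = origin + z[:, None] * direction              # (N, 3) positions
    rgb, sigma = field(pts, t)
    delta = np.diff(z, append=z[-1] + (z[1] - z[0]))   # segment lengths
    alpha = 1.0 - np.exp(-sigma * delta)               # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    colour = (weights[:, None] * rgb).sum(axis=0)      # composited RGB
    depth = (weights * z).sum()                        # expected depth along ray
    return colour, depth

colour, depth = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), t=0.5)
```

Because time is just another input, rendering a novel viewpoint at a novel moment only requires evaluating the same function at new ray origins, directions, and t values.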
For evaluation, the researchers used the CVD dataset of short smartphone videos of dynamic scenes with camera calibration and per-frame depth maps, along with videos featuring dynamic scene motion, camera motion, and moving subjects of various sizes from the MPI Sintel dataset. Their approach demonstrated compelling free-viewpoint rendering results compared to three baselines: Mesh, a textured mesh representation reconstructed directly from the input depth maps; an inpainted version of Mesh; and a NeRF (neural radiance field) variant with an extra time parameter.
The team says the proposed method preserves motion and texture details while conveying “a vivid sense of 3D” on common smartphone videos. Demo videos are available to watch on the project’s GitHub page.
The paper Space-time Neural Irradiance Fields for Free-Viewpoint Video is on arXiv.
Analyst: Reina Qi Wan | Editor: Michael Sarazen; Fangyu Cai