In a new paper, a group of researchers from Zhejiang University, The Chinese University of Hong Kong and Cornell University propose an implicit neural representation method called Neural Body. The novel approach tackles dynamic 3D human-body synthesis from a sparse set of camera views, outperforming existing methods on key metrics by significant margins.
Typically, 3D reconstruction requires either a large number of cameras covering all angles or the use of depth sensors, which makes the process complicated, costly, and impractical in certain environments. The researchers instead tackle novel view synthesis from sparse multi-view video (at most four camera angles) capturing a moving human body. Because the camera positions remain fixed, existing reconstruction-based methods tend to produce heavy rendering artifacts wherever body parts are occluded at different temporal states. Meanwhile, view synthesis methods such as Google's NeRF (Neural Radiance Fields), which rely on implicit neural representations, also degrade when input views are sparse.
To address these shortcomings, Neural Body generates implicit 3D representations of a human body in different video frames from the same set of latent codes, anchored to the vertices of a deformable mesh. For each frame, the model transforms the code locations according to the estimated human pose, and a network regresses the density and colour at any 3D location from the structured latent codes. Images at arbitrary viewpoints can then be synthesized via volume rendering.
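The pipeline described above can be sketched in a few steps: latent codes anchored to mesh vertices, a per-frame pose transform of the code locations, a network mapping codes to density and colour, and alpha-composited volume rendering along each camera ray. The toy code below illustrates this flow only; all sizes, the rigid pose transform, the inverse-distance code lookup, and the random-weight MLP are simplifying assumptions, not the authors' implementation (which uses the SMPL body mesh, sparse convolutions, and trained networks).

```python
import numpy as np

rng = np.random.default_rng(0)

# Structured latent codes: one learnable code per mesh vertex.
# (Toy sizes; the paper anchors codes to the vertices of a deformable
# SMPL body mesh.)
N_VERTS, CODE_DIM = 64, 16
vertices = rng.uniform(-1.0, 1.0, size=(N_VERTS, 3))  # rest-pose positions
latent_codes = rng.normal(size=(N_VERTS, CODE_DIM))   # shared across frames

def pose_transform(verts, angle_z=0.3, translation=(0.1, 0.0, 0.0)):
    """Move the code locations with the human pose for one frame.
    A single rigid rotation + translation stands in for full skeletal
    deformation."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return verts @ R.T + np.asarray(translation)

def query_code(x, posed_verts):
    """Gather a latent feature at an arbitrary 3D point by inverse-distance
    weighting of vertex codes (a stand-in for the paper's code diffusion)."""
    d = np.linalg.norm(posed_verts - x, axis=1) + 1e-6
    w = 1.0 / d**2
    return (w / w.sum()) @ latent_codes               # (CODE_DIM,)

# A tiny random-weight MLP regresses density and RGB from the code.
W1 = rng.normal(scale=0.5, size=(CODE_DIM, 32))
W2 = rng.normal(scale=0.5, size=(32, 4))              # -> [sigma, r, g, b]

def density_color(x, posed_verts):
    h = np.maximum(query_code(x, posed_verts) @ W1, 0.0)  # ReLU
    out = h @ W2
    sigma = np.log1p(np.exp(out[0]))                  # softplus: sigma >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[1:]))              # sigmoid: rgb in [0, 1]
    return sigma, rgb

def render_ray(origin, direction, posed_verts, n_samples=32, t_far=4.0):
    """Standard volume rendering: alpha-composite samples along the ray."""
    ts = np.linspace(0.0, t_far, n_samples)
    dt = ts[1] - ts[0]
    color, transmittance = np.zeros(3), 1.0
    for t in ts:
        sigma, rgb = density_color(origin + t * direction, posed_verts)
        alpha = 1.0 - np.exp(-sigma * dt)
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color

posed = pose_transform(vertices)
pixel = render_ray(np.array([0.0, 0.0, -3.0]),
                   np.array([0.0, 0.0, 1.0]), posed)
print(pixel)  # one composited RGB value, each channel in [0, 1]
```

Because the latent codes are shared across all frames while only their locations move with the pose, observations from every frame supervise the same representation, which is what lets the method cope with very sparse camera views.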
To evaluate their approach, the researchers built a multi-view dataset with 9 dynamic human videos captured using a system of 21 synchronized cameras. Four uniformly distributed cameras were selected for training, with the rest reserved for testing. On both the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) metrics used for evaluating novel view synthesis, the proposed model trained on all frames achieved the best performance, outperforming both NeRF and Neural Volumes (NV) by margins of at least 6.45 in PSNR and 0.119 in SSIM.
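For context on what those margins mean: PSNR measures per-pixel fidelity on a logarithmic decibel scale, 10·log10(MAX²/MSE), so a gain of 6+ dB corresponds to a large reduction in mean squared error, while SSIM (ranging up to 1.0) compares local luminance, contrast, and structure. A minimal PSNR implementation, assuming 8-bit images with a peak value of 255 (SSIM is omitted here as it is considerably more involved):

```python
import numpy as np

def psnr(reference, rendered, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64)
                   - rendered.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a rendering whose every pixel is off by 10 gray levels
# has MSE = 100, so PSNR = 10 * log10(255^2 / 100) ~= 28.13 dB.
gt = np.full((64, 64, 3), 100.0)
pred = gt + 10.0
print(round(psnr(gt, pred), 2))  # 28.13
```

Because of the logarithm, every additional ~6 dB of PSNR roughly quarters the mean squared error, which is why the reported margin over NeRF and NV is substantial.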
The researchers also tested their model's 3D reconstruction capability against a learning-based approach, PIFuHD (Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization). The results show that Neural Body generates accurate geometries for humans in complex motions, while PIFuHD fails to recover correct human shapes under complex poses.
The researchers further compared the proposed method's synthesis and reconstruction abilities on monocular videos against the People-Snapshot method, where Neural Body recovered more accurate appearance and geometric detail, especially for subjects wearing loose clothing.
The paper Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans is on arXiv. The code and dataset will soon be available on the project GitHub.
Analyst: Reina Qi Wan | Editor: Michael Sarazen; Yuan Yuan