Fewer Cameras and Faster Processing: USC Proposes Monocular Real-Time Volumetric Performance Capture

A novel volumetric capture system fully captures clothed human bodies in real time using only a single RGB webcam.

Remote work and online video meetings were already on the rise, but have increased dramatically amid the COVID-19 pandemic. A weak link in these processes, however, remains the lowly monocular webcam, which pales in performance and realism when compared to expensive multi-view systems.

Now, a team of researchers from the University of Southern California has introduced a novel volumetric capture system that can fully capture clothed human bodies in real time using only a single RGB webcam.

The new system reconstructs a fully textured 3D human from each video frame by leveraging the Pixel-Aligned Implicit Function (PIFu), a highly effective implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D objects. The researchers have also proposed an associated hierarchical surface localization algorithm and a direct rendering method, which progressively query 3D locations in a coarse-to-fine manner to extract surfaces from implicit occupancy fields with a minimum number of evaluation points.
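To illustrate what "pixel-aligned" means in practice, here is a minimal NumPy sketch of a PIFu-style occupancy query. It is not the authors' implementation: the feature map stands in for the output of an image encoder, the two-layer network uses random weights, the projection is assumed to be orthographic over the cube [-1, 1]^3, and the names `bilinear_sample` and `pifu_occupancy` are hypothetical. The idea shown is only that each 3D point is projected to a pixel, the image feature at that pixel is sampled, and the feature plus the point's depth is fed to a small function that predicts occupancy.

```python
import numpy as np

def bilinear_sample(feat, u, v):
    """Sample a (C, H, W) feature map at continuous pixel coords (u, v).

    u indexes columns, v indexes rows. Returns an (N, C) array.
    """
    C, H, W = feat.shape
    u = np.clip(u, 0, W - 1)
    v = np.clip(v, 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = u - u0
    dv = v - v0
    top = feat[:, v0, u0] * (1 - du) + feat[:, v0, u1] * du
    bot = feat[:, v1, u0] * (1 - du) + feat[:, v1, u1] * du
    return (top * (1 - dv) + bot * dv).T

def pifu_occupancy(feat, points, w1, b1, w2, b2):
    """Toy pixel-aligned implicit function.

    points: (N, 3) array in [-1, 1]^3. Assumption: orthographic camera,
    so (x, y) maps directly to a pixel and z is used as the depth value.
    Returns occupancy probabilities of shape (N,).
    """
    C, H, W = feat.shape
    u = (points[:, 0] + 1.0) * 0.5 * (W - 1)
    v = (points[:, 1] + 1.0) * 0.5 * (H - 1)
    pixel_feat = bilinear_sample(feat, u, v)                  # (N, C)
    x = np.concatenate([pixel_feat, points[:, 2:3]], axis=1)  # (N, C + 1)
    hidden = np.maximum(x @ w1 + b1, 0.0)                     # ReLU layer
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))                # sigmoid

# Illustrative usage with random (untrained) features and weights.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 32, 32))
w1, b1 = rng.normal(size=(9, 16)) * 0.1, np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)) * 0.1, np.zeros(1)
queries = rng.uniform(-1, 1, size=(5, 3))
print(pifu_occupancy(feat, queries, w1, b1, w2, b2).shape)
```

Because every query is independent, the cost of reconstruction is driven mainly by how many 3D points are evaluated, which is why reducing the number of queries matters for real-time use.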

In experiments, the new algorithm and rendering method were shown to accelerate reconstruction by nearly 200 times without compromising quality, by culling regions that do not need to be evaluated. The team further proposed a training technique called Online Hard Example Mining (OHEM), which effectively suppresses failure modes in examples with unusual or challenging poses, view angles, illumination or clothing types.
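To make the idea of hard example mining concrete, here is a minimal generic sketch in Python with NumPy. It is not the paper's training code, and the paper's exact sampling scheme may differ: the function names `hardest_indices` and `ohem_mean_loss` and the `keep_ratio` parameter are hypothetical, and the selection rule (keep only the highest-loss fraction of a batch) is one common OHEM variant assumed here for illustration.

```python
import numpy as np

def hardest_indices(losses, keep_ratio=0.25):
    """Return indices of the highest-loss examples in a batch.

    keep_ratio is a hypothetical knob: the fraction of the batch kept.
    At least one example is always kept.
    """
    losses = np.asarray(losses, dtype=float)
    k = max(1, int(round(keep_ratio * losses.size)))
    return np.argsort(losses)[::-1][:k]

def ohem_mean_loss(losses, keep_ratio=0.25):
    """Average the loss over the hardest examples only.

    In a real training loop, gradients would be computed from this value,
    so easy examples contribute nothing to the update.
    """
    losses = np.asarray(losses, dtype=float)
    return float(losses[hardest_indices(losses, keep_ratio)].mean())

# Illustrative per-example losses for a batch of eight samples.
batch_losses = [0.1, 2.0, 0.3, 1.5, 0.05, 0.2, 0.4, 0.9]
print(hardest_indices(batch_losses), ohem_mean_loss(batch_losses))
```

The design choice is that training effort concentrates on the examples the model currently handles worst, such as rare poses or clothing, rather than being spread evenly over cases it already reconstructs well.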

The progressive surface localization scheme enabled the researchers to reduce the number of points queried during surface reconstruction, delivering a speedup of two orders of magnitude without reducing final surface quality. It also made it possible to directly render novel viewpoints of a captured subject, enabling real-time rendering of the reconstructed surfaces. Finally, the OHEM technique makes it feasible to train networks with a tractable amount of data while attaining high-quality results across large appearance and motion variations.
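The coarse-to-fine querying idea described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' algorithm: the occupancy function is an analytic sphere standing in for the network, the grid is cell-centered, and the refinement rule (re-evaluate only cells whose label differs from one of their 26 neighbours, let all other cells inherit their parent's label) is a hypothetical stand-in for the paper's scheme. The names `grid_centers`, `near_boundary`, `coarse_to_fine` and `sphere_occupancy` are invented for this example.

```python
import numpy as np

def grid_centers(res):
    """Centers of res^3 cells covering [-1, 1]^3, shape (res, res, res, 3)."""
    c = (np.arange(res) + 0.5) / res * 2.0 - 1.0
    return np.stack(np.meshgrid(c, c, c, indexing="ij"), axis=-1)

def near_boundary(inside):
    """True where a cell's label differs from any of its 26 neighbours."""
    n = inside.shape[0]
    padded = np.pad(inside, 1, mode="edge")
    flag = np.zeros_like(inside, dtype=bool)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                flag |= padded[dx:dx + n, dy:dy + n, dz:dz + n] != inside
    return flag

def coarse_to_fine(occ_fn, start_res=8, levels=3, thresh=0.5):
    """Extract an inside/outside grid, querying occ_fn only near the surface.

    Returns the boolean grid at resolution start_res * 2**levels and the
    total number of points passed to occ_fn.
    """
    pts = grid_centers(start_res)
    inside = occ_fn(pts.reshape(-1, 3)).reshape(pts.shape[:3]) > thresh
    n_eval = inside.size
    res = start_res
    for _ in range(levels):
        uncertain = near_boundary(inside)
        # Each coarse cell splits into 2x2x2 children that inherit its state.
        for axis in range(3):
            inside = np.repeat(inside, 2, axis=axis)
            uncertain = np.repeat(uncertain, 2, axis=axis)
        res *= 2
        # Only children of cells near the surface are actually evaluated.
        query = grid_centers(res)[uncertain]
        inside[uncertain] = occ_fn(query) > thresh
        n_eval += len(query)
    return inside, n_eval

def sphere_occupancy(points, radius=0.6):
    """Analytic stand-in for a learned occupancy field."""
    return (np.linalg.norm(points, axis=1) < radius).astype(float)

grid, n_eval = coarse_to_fine(sphere_occupancy, start_res=8, levels=3)
print(grid.shape, n_eval, "evaluations vs", 64 ** 3, "for a dense grid")
```

For a smooth shape like this sphere, the refined grid matches a dense evaluation while touching only a fraction of the points. For shapes with features thinner than a coarse cell, this simple inheritance rule can miss detail, so a practical system needs a more careful criterion for deciding which cells to revisit.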

Although the researchers demonstrated their approach using human subjects, the proposed acceleration techniques can easily generalize to any object or topology.

The paper Monocular Real-Time Volumetric Performance Capture is on arXiv.


Analyst: Grace Duan | Editor: Michael Sarazen

