Human re-rendering has practical applications in animation character creation, 3D video, virtual reality, augmented reality and more. While techniques that leverage camera arrays and point clouds can deliver accurate results, designing algorithms that can realistically render clothed humans in different poses and viewpoints from a single image remains challenging.
In a new paper, researchers from the Max Planck Institute for Informatics and Facebook Reality Labs propose an end-to-end trainable method that re-renders a human from a single image, synthesizing the subject in different user-defined poses and with clothing transferred from other reference images.
The novel approach combines monocular parametric 3D body modelling, a learned detail-preserving neural feature representation of body appearance, and a neural image-synthesis network.
Taking a single image of a clothed human as input, the first step in the pipeline uses DensePose to predict dense correspondences between the input image and a Skinned Multi-Person Linear (SMPL) model. DensePose maps every human pixel in an RGB image to a point on the 3D surface of the human body, which enables extraction of a UV texture map for the visible regions. Representing body pose and shape with the SMPL parametric human surface model ensures that the synthesized output images can be easily reposed to the target pose.
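The texture-extraction idea can be sketched in a few lines: given DensePose-style per-pixel UV coordinates, visible pixels are scattered into a partial texture map. This is a minimal numpy illustration, not the authors' implementation; the function name, texture resolution, and array shapes are assumptions chosen for clarity.

```python
import numpy as np

def extract_uv_texture(image, uv_coords, mask, tex_size=64):
    """Scatter visible body pixels into a partial UV texture map.

    image:     (H, W, 3) RGB image
    uv_coords: (H, W, 2) per-pixel UV coordinates in [0, 1]
               (as a DensePose-style dense correspondence would provide)
    mask:      (H, W) boolean, True where a body pixel was detected
    Returns a (tex_size, tex_size, 3) partial texture and a visibility mask
    marking which texels were observed in the source image.
    """
    texture = np.zeros((tex_size, tex_size, 3), dtype=np.float32)
    visible = np.zeros((tex_size, tex_size), dtype=bool)
    ys, xs = np.nonzero(mask)
    # Quantize continuous UV coordinates to integer texel indices.
    u = np.clip((uv_coords[ys, xs, 0] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    v = np.clip((uv_coords[ys, xs, 1] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    texture[v, u] = image[ys, xs]
    visible[v, u] = True
    return texture, visible
```

Texels never hit by a visible pixel stay empty, which is exactly the gap the next stage of the pipeline must fill in.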
The second step deploys a U-Net based network dubbed FeatureNet to produce a full UV feature map containing a d-dimensional feature representation for both visible and occluded regions of the source image. The third step takes a target pose as input and renders the full UV feature map into a d-dimensional feature image matching that pose. Finally, a generator network called RenderNet, based on the Pix2PixHD model, produces the realistically rendered reposed image.
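The third step is essentially the inverse of texture extraction: the completed UV feature map is sampled at the target pose's dense correspondences to form the feature image fed to the generator. The sketch below illustrates that gather operation with nearest-neighbour lookup in numpy; it is a simplified stand-in under assumed shapes, not the paper's code, which would use differentiable sampling inside the network.

```python
import numpy as np

def render_feature_image(uv_features, target_uv, target_mask):
    """Sample a completed UV feature map at a target pose's correspondences.

    uv_features: (T, T, d) full UV feature map (e.g. a FeatureNet output)
    target_uv:   (H, W, 2) UV coordinates in [0, 1] for target-pose pixels
    target_mask: (H, W) boolean body mask for the target pose
    Returns an (H, W, d) feature image; background pixels stay zero.
    """
    T, _, d = uv_features.shape
    H, W = target_mask.shape
    feat_img = np.zeros((H, W, d), dtype=uv_features.dtype)
    ys, xs = np.nonzero(target_mask)
    # Nearest-neighbour lookup into the UV feature map.
    u = np.clip((target_uv[ys, xs, 0] * (T - 1)).astype(int), 0, T - 1)
    v = np.clip((target_uv[ys, xs, 1] * (T - 1)).astype(int), 0, T - 1)
    feat_img[ys, xs] = uv_features[v, u]
    return feat_img
```

Because the features live in UV space rather than image space, the same feature map can be rendered to any pose, which is what makes reposing from a single source image possible.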
The researchers tested their approach on the In-shop Clothes Retrieval benchmark of the DeepFashion dataset, which contains 52,712 images of fashion models in various poses wearing 13,029 different clothing items. Compared to the SOTA methods Coordinate Based Inpainting (CBI), Deformable GAN (DSC), Variational U-Net (VUnet) and Dense Pose Transfer (DPT), the proposed method delivered higher realism and better preserved identity and garment details.
Quantitative results on the recently introduced Learned Perceptual Image Patch Similarity (LPIPS) metric show the proposed approach significantly outperforming existing methods, while it performs comparably on the Structural Similarity Index (SSIM) metric. The team notes that the proposed method can also generate realistic renderings for a video sequence, including garment and motion transfer from a single source image, despite not being specifically trained to generate videos.
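For readers unfamiliar with SSIM, its core formula compares mean luminance, contrast, and structure between two images. Below is a simplified single-window variant in numpy that illustrates the formula; the standard metric instead averages it over sliding Gaussian windows, and LPIPS replaces such hand-crafted statistics with distances between deep network features.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Simplified whole-image SSIM between two grayscale images.

    Computes the SSIM formula over the full image as a single window,
    purely to illustrate the luminance/contrast/structure terms.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizer for the contrast term
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An image compared against itself scores 1.0, the metric's maximum; in practice one would use a library implementation such as scikit-image's `structural_similarity`.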
The paper Neural Re-Rendering of Humans from a Single Image is on arXiv.
Reporter: Fangyu Cai | Editor: Michael Sarazen