Today’s 2D generative models have achieved astounding results in high-quality face generation for applications such as facial animation, expression transfer and virtual avatars. The editing capabilities of these models, however, remain limited: because they lack 3D information, they cannot disentangle advanced facial attributes such as pose, expression and illumination.
In the new paper Towards Realistic Generative 3D Face Models, a research team from Carnegie Mellon University and Meta proposes AlbedoGAN, a 3D controllable generative model that generates high-resolution textures, captures high-frequency details in facial geometry, and outperforms state-of-the-art baselines in facial shape reconstruction.
The team summarizes their main contributions as follows:
- We introduce AlbedoGAN, a single-pass albedo prediction network that generates high-resolution albedo and decouples illumination using shading maps.
- We show that our model outperforms existing methods in capturing high-frequency facial details in a mesh. Moreover, the proposed method reconstructs 3D faces that recover identity better than SOTA methods.
- We propose a displacement map generator capable of decoding per-vertex displacements directly from StyleGAN’s latent space using detailed normals of the mesh.
- Since our entire architecture can generate 3D faces from StyleGAN2’s latent space, we can perform face editing directly in the 3D domain using the latent codes or text.
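Since the whole pipeline is driven by StyleGAN2's latent space, face editing reduces to moving a latent code along a semantic attribute direction before decoding it into a 3D face. The sketch below illustrates only this generic latent-edit step; the function and variable names are illustrative assumptions, not the paper's actual interfaces.

```python
import numpy as np

def edit_latent(w, direction, alpha):
    """Move a latent code along a semantic edit direction.

    w: (512,) StyleGAN2-style latent code; direction: unit vector for an
    attribute (e.g. smile); alpha: edit strength. Names are illustrative,
    not the paper's actual API.
    """
    return w + alpha * direction

# Toy usage: a random latent nudged along a normalized direction.
rng = np.random.default_rng(0)
w = rng.standard_normal(512)
d = rng.standard_normal(512)
d /= np.linalg.norm(d)

w_edited = edit_latent(w, d, alpha=3.0)
# The edit moves the code exactly alpha units along the unit direction.
print(round(float(np.dot(w_edited - w, d)), 4))  # 3.0
```

In text-driven editing, the direction is typically derived from a text encoder rather than fixed in advance, but the latent arithmetic is the same.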
The AlbedoGAN pipeline is designed to predict albedo (the ratio of incoming light reflected by facial features) from StyleGAN2’s latent space and to generalize over pose, age and ethnicity. To develop the shape component, a FLAME model is combined with per-vertex displacement maps guided by StyleGAN’s latent space to obtain a higher-resolution mesh.
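Two ideas in this pipeline can be sketched concretely: illumination decoupling, where a rendered face is modeled as an element-wise product of albedo and a shading map so the albedo can be recovered by dividing the shading back out, and detail recovery, where per-vertex displacements are applied along the coarse mesh's vertex normals. The snippet below is a minimal sketch of both under those assumptions; the function names, shapes and shading model are illustrative, not the paper's actual implementation.

```python
import numpy as np

def recover_albedo(image, shading, eps=1e-6):
    """Undo shading: image = albedo * shading  =>  albedo = image / shading."""
    return image / (shading + eps)

def displace_vertices(verts, normals, disp):
    """Offset each vertex along its unit normal by a scalar displacement."""
    return verts + disp[:, None] * normals

# Toy check: uniform shading of 0.5 halves the albedo; division recovers it.
albedo = np.full((4, 4, 3), 0.8)
shading = np.full((4, 4, 1), 0.5)
image = albedo * shading
print(np.allclose(recover_albedo(image, shading), albedo, atol=1e-4))  # True
```

In the actual system the shading map is predicted rather than given, and the displacements are decoded from StyleGAN's latent space, but the geometry of the two operations is as above.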
Augmenting a 2D face generative model with semantic face manipulation in this way enables the resulting generative model to perform detailed editing on 3D rendered faces.
In their empirical study, the team used a PyTorch implementation of StyleGAN2 for albedo and image generation, with the official pretrained weights trained on the Flickr-Faces-HQ (FFHQ) dataset to generate face images. They compared AlbedoGAN with the SOTA methods DECA and MICA on the NoW benchmark dataset for 3D face reconstruction.
AlbedoGAN outperformed DECA in the experiments, achieving a 23 percent lower median error in coarse mesh reconstruction and a 20 percent lower median error in detailed mesh generation. Although trained only on synthetic images, AlbedoGAN also substantially improved on a MICA model trained on 3D face scans, producing more detailed meshes and correctly capturing wrinkles, expression, pose and head shape.
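Benchmarks like NoW score a reconstruction by the per-vertex distance between the predicted mesh and a ground-truth scan, and report statistics such as the median. Below is a simplified stand-in for that kind of metric (the real NoW protocol also rigidly aligns the meshes and uses surface distances, which this toy version omits).

```python
import numpy as np

def median_nn_error(pred_verts, scan_verts):
    """Median nearest-neighbor distance from predicted vertices to a
    ground-truth scan -- a simplified sketch of a NoW-style error metric.
    """
    # Full pairwise distances (fine for toy sizes; use a KD-tree at scale).
    d = np.linalg.norm(pred_verts[:, None, :] - scan_verts[None, :, :], axis=-1)
    return float(np.median(d.min(axis=1)))

# Toy check: scan points on a sparse grid, prediction shifted 1 unit in x,
# so every vertex's nearest scan point is its own counterpart.
xs = np.arange(5) * 3.0
scan = np.stack(np.meshgrid(xs, xs, xs), axis=-1).reshape(-1, 3)
pred = scan + np.array([1.0, 0.0, 0.0])
print(median_nn_error(pred, scan))  # 1.0
```

A "23 percent lower median error" then simply means this median is 23 percent smaller than the baseline's on the same scans.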
This work demonstrates the proposed AlbedoGAN’s ability to faithfully capture intrinsic and high-frequency facial details and outperform SOTA approaches in detailed mesh prediction while preserving identity in reconstructed 3D faces. The team plans to extend this research to incorporate more complex illumination models.
The paper Towards Realistic Generative 3D Face Models is on arXiv.
Author: Hecate He | Editor: Michael Sarazen