The development of generative adversarial networks (GANs) has greatly advanced 3D-controllable portrait synthesis performance in recent years. Although these GAN models have proven to be good learners for noise-to-image random face generation, they cannot provide precise control over face manipulation.
In the new paper 3D-FM GAN: Towards 3D-Controllable Face Manipulation, a team from Princeton University and Adobe Research presents 3D-FM GAN, a novel conditional GAN framework that enables precise 3D-controllable face manipulation with high photorealism and strong identity preservation without requiring any manual tuning or optimizations.
The team summarizes their main contributions as follows:
- We propose 3D-FM GAN, a novel conditional GAN framework that is specifically designed for precise, explicit, high-quality, 3D-controllable face manipulation.
- We develop two essential training strategies, reconstruction and disentangled training, to effectively learn our model. We also conduct a comprehensive study of StyleGAN’s latent spaces for structural design, leading to a novel multiplicative co-modulation architecture with a strong identity-editability trade-off.
- Extensive quantitative and qualitative evaluations demonstrate the advantage of our method over prior arts. Moreover, our model also shows strong generalizability to edit artistic faces, which are out of the training domain.
Given a single input facial image, the 3D-FM GAN framework enables photorealistic, disentangled editing of head pose, facial expression and scene illumination while maintaining a strong face identity.
The 3D-FM GAN framework comprises face reconstruction networks which predict the input image’s 3D coefficients and a physically-based renderer that embeds the desired facial expression, lighting and pose manipulations. The original image and the manipulated face rendering are then sent to a StyleGAN conditional generator, which synthesizes the edited face.
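The data flow described above can be sketched in a few lines of Python. This is only an illustration of the ordering of the stages, not the authors' code: the names `FaceCoeffs`, `estimate_coeffs`, `render`, `generate` and `manipulate` are hypothetical, the coefficient layout is an assumption, and each stage is a placeholder that returns a small record instead of running a real network or renderer.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FaceCoeffs:
    """Hypothetical container for the predicted 3D coefficients."""
    identity: Tuple[float, ...]
    expression: Tuple[float, ...]
    lighting: Tuple[float, ...]
    pose: Tuple[float, float, float]  # assumed layout: yaw, pitch, roll


def estimate_coeffs(photo: str) -> FaceCoeffs:
    """Placeholder for the face reconstruction stage (returns fixed values)."""
    return FaceCoeffs(
        identity=(0.1, 0.2),
        expression=(0.0,),
        lighting=(1.0,),
        pose=(0.0, 0.0, 0.0),
    )


def render(coeffs: FaceCoeffs) -> dict:
    """Placeholder for the renderer: records the coefficients it was given."""
    return {"kind": "render", "coeffs": coeffs}


def generate(photo: str, rendering: dict) -> dict:
    """Placeholder for the conditional generator: takes both inputs."""
    return {"kind": "edited", "source": photo, "target": rendering["coeffs"]}


def manipulate(photo: str, **edits) -> dict:
    """Predict coefficients, overwrite only the edited ones, render, generate."""
    coeffs = estimate_coeffs(photo)
    edited = replace(coeffs, **edits)  # untouched attributes are carried over
    return generate(photo, render(edited))


# Example: change only the head pose of a (hypothetical) input file.
result = manipulate("portrait.png", pose=(30.0, 0.0, 0.0))
```

Only the attributes passed as keyword arguments change, so the identity, expression and lighting coefficients in the sketch reach the renderer unmodified, which mirrors the disentangled editing the article describes.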
The team also designed two essential training strategies, reconstruction and disentangled training, to preserve a strong facial identity and enable 3D editing. They leverage the StyleGAN latent space for their structural design, with the resulting multiplicative co-modulation architecture achieving a favourable identity-editability trade-off.
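The article does not spell out how multiplicative co-modulation is wired, so the following is a toy sketch of one plausible reading: a style vector derived from the original photo and one derived from the rendering are combined by element-wise product, and the result scales the generator's feature channels. The function names `co_modulate` and `modulate`, the vector sizes and the values are all assumptions made for illustration; the layers in the paper may differ.

```python
from typing import List


def co_modulate(style_photo: List[float], style_render: List[float]) -> List[float]:
    """Combine two per-channel style vectors by element-wise product (assumed form)."""
    if len(style_photo) != len(style_render):
        raise ValueError("style vectors must have the same number of channels")
    return [p * r for p, r in zip(style_photo, style_render)]


def modulate(features: List[List[float]], style: List[float]) -> List[List[float]]:
    """Scale every value in each feature channel by that channel's style coefficient."""
    if len(features) != len(style):
        raise ValueError("need one style coefficient per feature channel")
    return [[s * v for v in channel] for channel, s in zip(features, style)]


# Toy values: three channels, two spatial positions per channel.
style_photo = [1.0, 2.0, 0.5]   # stands in for the code from the original image
style_render = [2.0, 1.0, 4.0]  # stands in for the code from the edited rendering
features = [[1.0, 1.0], [0.5, 0.5], [2.0, 3.0]]

style = co_modulate(style_photo, style_render)
output = modulate(features, style)
```

In this sketch, a rendering code of all ones leaves the photo's style untouched, which gives some intuition for how a multiplicative combination lets the rendering adjust the result while the photo still carries the identity.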
In their empirical study, the team applied their 3D-FM GAN framework to 5k images from the Flickr-Faces-HQ (FFHQ) facial image dataset to evaluate its identity preservation, its editing controllability and the photorealism of its manipulations.
In the experiments, the proposed 3D-FM GAN outperformed prior approaches, demonstrating superior editability, identity preservation and photorealism. The team notes that 3D-FM GAN also achieved better generalizability on large pose editing and out-of-domain artistic images.
The paper 3D-FM GAN: Towards 3D-Controllable Face Manipulation is on arXiv.
Author: Hecate He | Editor: Michael Sarazen