Recent developments in Neural Radiance Fields (NeRF) have significantly advanced 3D vision by facilitating high-fidelity 3D face reconstruction and novel view synthesis. However, conventional 3D face manipulation approaches typically require extensive manual effort to collect user-annotated masks of facial features and control attributes in order to achieve the desired results.
To address this challenge, a joint research team from KAIST and Scatter Lab has presented the paper FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields. It introduces FaceCLIPNeRF, a text-driven pipeline that enables high-quality face manipulation using deformable NeRF without extensive human intervention.
The team summarizes their main contributions as follows:
- Proposal of a text-driven manipulation pipeline of a face reconstructed with NeRF.
- Design of a manipulation network that learns to represent a scene with spatially varying latent codes.
- To the best of the authors' knowledge, the first text-driven manipulation of a face reconstructed with NeRF.
The approach trains one latent code per training frame together with a single latent-conditional NeRF model shared by all of the learned latent codes, which lets the model handle scene deformations. The goal is to train a model that can manipulate a NeRF-reconstructed face using a single target text that describes the desired facial expression.
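The idea of per-frame latent codes feeding one shared latent-conditional model can be illustrated with a toy NumPy sketch. This is not the paper's architecture: the two-layer network, the layer sizes, and the names `frame_codes` and `query_field` are assumptions made purely for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_FRAMES, LATENT_DIM, HIDDEN = 5, 8, 16  # hypothetical sizes

# One latent code per training frame (learned jointly with the field in practice).
frame_codes = rng.normal(size=(NUM_FRAMES, LATENT_DIM))

# A single set of field weights shared by every frame (random stand-ins here).
W1 = rng.normal(scale=0.5, size=(3 + LATENT_DIM, HIDDEN))
W2 = rng.normal(scale=0.5, size=(HIDDEN, 4))


def query_field(points, latent):
    """Return (rgb, density) for 3D points, conditioned on one latent code."""
    tiled = np.broadcast_to(latent, (points.shape[0], latent.shape[0]))
    hidden = np.tanh(np.concatenate([points, tiled], axis=1) @ W1)
    out = hidden @ W2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))   # sigmoid keeps colours in [0, 1]
    density = np.log1p(np.exp(out[:, 3]))     # softplus keeps density non-negative
    return rgb, density


sample_points = rng.uniform(-1.0, 1.0, size=(4, 3))
rgb_a, density_a = query_field(sample_points, frame_codes[0])
rgb_b, density_b = query_field(sample_points, frame_codes[1])
```

The same points queried with two different frame codes give different colours and densities, which is how a single shared model can represent many deformed states of one scene.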
To this end, the team first trains a scene manipulator using HyperNeRF; once it is trained, its parameters are fixed and the deformations of the scene are controlled by manipulating its latent code. Next, they search for a single optimal latent code whose rendered image from the trained scene manipulator yields the highest similarity with the target text, so that the manipulated images reflect the visual attributes the text describes.
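The latent-code search described above can be sketched as gradient ascent on a similarity score while the manipulator stays frozen. In this hedged toy version, a fixed random matrix stands in for "render the scene and embed the image", and a random vector stands in for the embedding of the target text; both are assumptions, as are the names `optimize_latent`, `frozen_manipulator`, and `text_embedding`. The real pipeline would backpropagate through the renderer and an image-text model instead.

```python
import numpy as np


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_grad(feature, target):
    """Gradient of cosine_similarity(feature, target) with respect to feature."""
    nf, nt = np.linalg.norm(feature), np.linalg.norm(target)
    return target / (nf * nt) - (feature @ target) * feature / (nf ** 3 * nt)


def optimize_latent(manipulator, target, latent_init, lr=0.05, steps=300):
    """Gradient ascent on the latent code only; the manipulator is never updated."""
    latent = latent_init.copy()
    best_latent = latent.copy()
    best_sim = cosine_similarity(manipulator @ latent, target)
    for _ in range(steps):
        grad = manipulator.T @ cosine_grad(manipulator @ latent, target)
        latent = latent + lr * grad
        sim = cosine_similarity(manipulator @ latent, target)
        if sim > best_sim:  # keep the best latent code seen so far
            best_latent, best_sim = latent.copy(), sim
    return best_latent, best_sim


rng = np.random.default_rng(0)
frozen_manipulator = rng.normal(size=(8, 4))  # stand-in for render + image embedding
text_embedding = rng.normal(size=8)           # stand-in for the target-text embedding
latent_start = rng.normal(size=4)

start_sim = cosine_similarity(frozen_manipulator @ latent_start, text_embedding)
latent_best, final_sim = optimize_latent(frozen_manipulator, text_embedding, latent_start)
```

Only the latent code changes during the loop, which mirrors the idea of fixing the scene manipulator parameters and steering the output through its input code.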
The team further proposes a Position-conditional Anchor Compositor (PAC), which gives the FaceCLIPNeRF pipeline the freedom to learn appropriate latent codes for different spatial positions. This addresses the linked local attribute problem of conventional approaches, which cannot compose deformations observed in different instances.
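One way to picture spatially varying latent codes is to blend a small set of anchor codes with weights that depend on the 3D position. The sketch below assumes a linear layer followed by a softmax as the weighting function; the article does not describe PAC's actual architecture, so `compose_latents`, the anchor count, and the weight shapes are all hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def compose_latents(points, anchors, weight, bias):
    """Blend anchor codes with position-dependent weights.

    points:  (N, 3) query positions
    anchors: (K, D) anchor latent codes
    returns: (N, D) per-position latent codes and (N, K) blend weights
    """
    blend = softmax(points @ weight + bias)  # each row sums to 1
    return blend @ anchors, blend


rng = np.random.default_rng(0)
anchors = rng.normal(size=(3, 8))    # e.g. codes from three observed expressions
weight = rng.normal(size=(3, 3))     # hypothetical position-to-weight layer
bias = np.zeros(3)
points = rng.uniform(-1.0, 1.0, size=(6, 3))

latents, blend = compose_latents(points, anchors, weight, bias)
```

Because each position gets its own mixture, one region of the face can follow one anchor while another region follows a different one, which is the kind of composition a single global latent code cannot express.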
In their empirical study, the researchers compared their approach with three state-of-the-art baselines (NeRF+FT, Nerfies+I and HyperNeRF+I) on the 3D face manipulation task. FaceCLIPNeRF achieves the highest cosine face similarity and faithfully reflects the visual attributes of the target text while preserving the high visual quality of the 3D faces.
The paper FaceCLIPNeRF: Text-driven 3D Face Manipulation using Deformable Neural Radiance Fields is on arXiv.
Author: Hecate He | Editor: Chain Zhang