Open-sourced by Nvidia three years ago, StyleGAN has wowed the Internet with its stunning human face synthesis capabilities. More recently, the powerful generative network has also shown its talents in semantic image editing, where it can modify a subject’s age, expression, gender, etc., in high-quality images. Performing such edits on real face images, however, introduces a number of challenges. The input images often contain out-of-distribution identities, hairstyles, lighting conditions, etc., and it is difficult to find the StyleGAN latent variables that will best preserve these characteristics to produce realistic manipulations. Moreover, previous studies have revealed entanglement problems, where modifying one attribute also affects other facial features.
To address these issues, a research team from Northeastern University and Microsoft has proposed a novel two-branch approach that expands the latent space of StyleGAN to enable identity-preserving and disentangled-attribute editing for real face images. Introduced in the new paper Expanding the Latent Space of StyleGAN for Real Face Editing, the method achieves both qualitative and quantitative improvements over state-of-the-art methods.
The research team summarizes their main contributions as follows:
- Expanding the latent space of StyleGAN by leveraging two-dimensional content features in the synthesis pipeline to achieve low-distortion editing.
- Obtaining disentangled-attribute editing using the sparsity constraint on the style editing directions.
- Achieving local region editing using the alignment loss and feature fusion module by steering the effect of style and content features.
- Outperforming the state-of-the-art methods on semantic editing and reconstruction of face images.
The team’s novel two-branch framework aims to achieve identity-preserving and disentangled-attribute editing for real face images by expanding StyleGAN’s latent space with 2D content features. This provides the model with attribute-aware, image-specific details that can improve real face editing performance.
Given an input face image and target attributes, the proposed framework’s first (style) branch uses sparse manipulation of one-dimensional style features to handle the entanglement issue, while the second (content) branch uses two-dimensional content features to alleviate the distortion issue and enhance the edited image with additional appearance details. This setup enables the framework to effectively synthesize an edited image with target attributes while preserving appearance details such as identity, background, and lighting conditions.
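The intuition behind the sparse style manipulation can be illustrated with a toy sketch: if the editing direction is constrained to be sparse, only a handful of style channels move, so unrelated attributes encoded in the other channels are left untouched. The code below is a minimal illustration of that idea only; the variable names, dimensions, and values are hypothetical and do not come from the paper.

```python
def sparse_style_edit(w, direction, strength):
    """Shift a 1D style vector along an editing direction.

    A sparse `direction` (mostly zeros) moves only the channels tied
    to the target attribute, which is the disentanglement intuition.
    """
    return [wi + strength * di for wi, di in zip(w, direction)]

# Toy 5-channel style vector (real StyleGAN style codes are far larger).
w = [0.2, -0.5, 1.0, 0.3, -0.1]

# Sparse editing direction: only channel 2 is active.
direction = [0.0, 0.0, 1.0, 0.0, 0.0]

edited = sparse_style_edit(w, direction, strength=0.8)
# Only channel 2 changes (1.0 -> 1.8); all other channels are preserved.
```

A dense direction, by contrast, would perturb every channel, which is how edits to one attribute end up leaking into others.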
The team compared their approach against multiple state-of-the-art inversion methods on real face editing and reconstruction tasks on the FFHQ and CelebA-HQ face datasets. In the evaluations, the proposed two-branch approach achieved significantly better results on out-of-domain samples and bettered all other methods on perceptual similarity, pixel-wise distance, peak signal-to-noise ratio, and structural similarity metrics.
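For readers unfamiliar with these reconstruction metrics, peak signal-to-noise ratio (PSNR) is a standard one: it compares the maximum possible pixel value to the mean squared error between the reconstructed and original images, with higher values indicating closer reconstructions. Below is a minimal pure-Python sketch for flat lists of 8-bit pixel values; the example inputs are illustrative, not from the paper.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel lists.

    PSNR = 10 * log10(MAX^2 / MSE); returns infinity for identical inputs.
    """
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)

# Toy example: every pixel is off by 10, so MSE = 100.
score = psnr([0, 0, 0, 0], [10, 10, 10, 10])
# → about 28.13 dB
```

In practice, libraries such as scikit-image provide PSNR and structural similarity implementations that operate directly on image arrays.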
The study shows that expanding StyleGAN’s latent space with additional 2D content features can effectively enable identity-preserving and attribute-disentangled editing of real face images, resulting in both qualitative and quantitative performance improvements.
The paper Expanding the Latent Space of StyleGAN for Real Face Editing is on arXiv.
Author: Hecate He | Editor: Michael Sarazen