Synthesized Paired Data Boosts Facial Manipulation

A research group from the Moscow Institute of Physics and Technology (MIPT) and Russian Internet giant Yandex have proposed a novel image-to-image translation model that uses synthesized input data to enable a “paired” training approach. The model outperforms existing methods in image manipulation and offers researchers a possible solution to the scarcity of paired datasets.

Generative adversarial networks (GANs) are among the most effective methods for realistic image generation. They open up many opportunities for image manipulation and morphing, such as changing the apparent age or gender of a human face.

The networks most commonly used to transform human faces are trained on either paired images (for example, the same subject photographed at different times) or unpaired data. Paired datasets generally give better results but are difficult to obtain: an age manipulation task, for example, would require photos of the same person taken at different ages, ideally with the same pose and facial expression. The research team proposes meeting this challenge by building a synthetic paired dataset.

The researchers used StyleGAN2, a state-of-the-art, style-transfer-informed unconditional generation method capable of producing realistic images, together with its latent space to create sets of images that differ in particular attributes. These generated images serve as a substitute for real-world photos and can be assembled into paired datasets to train a network on specific image manipulation tasks.

*Finding correspondence between latent codes and facial attributes.*
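As a rough illustration of how a latent space can yield paired data, the sketch below estimates an attribute direction as the difference between the mean latent codes of two classes, then shifts each code along that direction to produce a (source, target) pair. The article does not spell out the exact procedure, so the class-mean direction is an assumption, and `toy_generator`, `attribute_direction`, `make_pairs`, and the labels are hypothetical stand-ins rather than StyleGAN2 or the authors' code.

```python
import numpy as np

def attribute_direction(latents, labels):
    """Difference between the mean latent codes of the two attribute classes."""
    latents = np.asarray(latents, dtype=np.float64)
    labels = np.asarray(labels, dtype=bool)
    return latents[labels].mean(axis=0) - latents[~labels].mean(axis=0)

def make_pairs(generator, latents, direction, strength=1.0):
    """Build (source, target) pairs; the target is generated from the shifted code."""
    return [(generator(w), generator(w + strength * direction)) for w in latents]

# Toy stand-in for a pretrained generator (assumption: NOT StyleGAN2).
rng = np.random.default_rng(0)
latent_dim = 8
projection = rng.normal(size=(latent_dim, 16))

def toy_generator(w):
    """Map a latent code to a fake 16-value 'image'."""
    return np.tanh(w @ projection)

latents = rng.normal(size=(200, latent_dim))
# Pretend an attribute classifier labeled each generated image (hypothetical labels).
labels = latents[:, 0] > 0

direction = attribute_direction(latents, labels)
pairs = make_pairs(toy_generator, latents[:5], direction)
```

In a real pipeline the generator would be a pretrained model and the labels would come from an attribute classifier run on the generated images; the resulting pairs could then feed any supervised image-to-image network.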

The researchers’ process can be summarized as:

Create synthetic datasets of paired images for human face manipulation tasks such as gender swap, aging/rejuvenation, style transfer and face morphing;
Show that image-to-image networks can be trained on such synthetic data and applied to real-world images;
Evaluate the qualitative and quantitative performance of image-to-image networks trained on the synthetic datasets;
Compare the method against existing approaches on the gender swap task.

*Gender transformation comparison with image-to-image translation approaches. MUNIT and StarGAN v2 are multimodal, so one random realization is shown for each.*

The researchers tested their method on various image manipulation tasks, focusing on face data. For evaluation they used a gender transformation task in both directions, where the proposed method outperformed StarGAN, MUNIT, and StarGAN v2. It also showed a distinct advantage in crowdsourced human surveys comparing generated image “quality” (preserving the person’s identity, clothes and accessories) and “realism.”

Quantitative comparison with other image-to-image transformation baselines using the Fréchet inception distance (FID) metric, where a lower score is better (left). User study results for all StyleGAN-based approaches (right).
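For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images: the squared distance between the means plus Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^½). The sketch below computes that quantity with NumPy. In practice the features come from an Inception-v3 network; here `frechet_distance` is a hypothetical helper name and the feature arrays are random stand-ins, so the numbers say nothing about the paper's results.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

# Random stand-in features (assumption: real use would extract Inception-v3 features).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))
fake_feats = real_feats + 3.0  # same spread, every mean shifted by 3

same_score = frechet_distance(real_feats, real_feats)
shifted_score = frechet_distance(real_feats, fake_feats)
```

Identical feature sets give a distance of essentially zero, and the score grows as the two distributions drift apart, which is why a lower FID indicates generated images that are statistically closer to the real ones.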

The paper StyleGAN2 Distillation for Feed-forward Image Manipulation is available on arXiv. The project resources are available on GitHub.

Author: Victor Lu | Editor: Michael Sarazen
