A research group from the Moscow Institute of Physics and Technology (MIPT) and Russian Internet giant Yandex has proposed a novel image-to-image translation model that uses synthesized input data to enable a "paired" training approach. The model outperforms existing methods on image manipulation tasks and offers researchers a possible solution to the scarcity of paired datasets.
Generative adversarial networks (GANs) are among the most effective methods for realistic image generation. GANs enable many forms of image manipulation and morphing, such as altering the apparent age or gender of a human face.
The network architectures most commonly used for face transformation are trained on either paired images (the same subject captured under different conditions) or unpaired data. Paired datasets yield better results but are difficult to obtain: an age manipulation task, for example, would require photos of the same person taken at different ages, ideally with the same pose and facial expression. The research team proposes meeting this challenge by building a synthetic paired dataset.
The researchers used StyleGAN2, a SOTA style-transfer-informed unconditional generation method capable of producing highly realistic images, and exploited its latent space to create sets of images that differ only in particular attributes, serving as substitutes for real-world photographs. Such synthetic pairs can then be assembled into paired datasets for training a network on specific image manipulation tasks.
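The latent-editing idea can be sketched in miniature. In the snippet below, a toy deterministic function stands in for StyleGAN2's generator, and `gender_axis` is a hypothetical attribute direction; in the actual method, such directions are found in StyleGAN2's real latent space, and the identity/attribute split shown here is only an illustrative assumption.

```python
# Sketch: build a synthetic "paired" sample by moving a latent code along
# an attribute direction. toy_generator stands in for StyleGAN2, and the
# identity/attribute split of the latent dimensions is a toy assumption.

def toy_generator(w):
    """Map a latent vector to a fake 'image', represented here as
    (identity_features, attribute_value)."""
    identity = tuple(round(x, 4) for x in w[:3])  # identity-carrying dims
    attribute = sum(w[3:])                        # attribute-carrying dims
    return identity, attribute

def make_pair(w, direction, alpha=1.0):
    """Generate an (original, edited) image pair from a single latent code."""
    w_edit = [wi + alpha * di for wi, di in zip(w, direction)]
    return toy_generator(w), toy_generator(w_edit)

# Hypothetical attribute direction: touches only the attribute dims,
# leaving identity dims untouched (analogous to a disentangled edit).
gender_axis = [0.0, 0.0, 0.0, 1.0, 1.0]

w = [0.2, -0.5, 0.7, 0.1, 0.3]
(src_id, src_attr), (dst_id, dst_attr) = make_pair(w, gender_axis, alpha=2.0)
print(src_id == dst_id)   # identity component is preserved
print(dst_attr > src_attr)  # attribute component has shifted
```

Each such pair supplies the pixel-aligned supervision that a standard paired image-to-image network expects, which is exactly what real-world photo collections rarely provide.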
The researchers' contributions can be summarized as follows:
- Create synthetic datasets of paired images for human face manipulation tasks such as gender swap, aging/rejuvenation, style transfer and face morphing;
- Show that image-to-image networks can be trained on such synthetic data and applied to real-world images;
- Evaluate the qualitative and quantitative performance of image-to-image networks trained on the synthetic datasets;
- Benchmark the method against existing approaches on gender swap tasks.
The researchers tested their method on various image manipulation tasks, focusing on face data. For evaluation they used a gender transformation task (in both directions), where it outperformed StarGAN, MUNIT, and StarGAN v2. The proposed method also showed a distinct advantage in crowdsourced human surveys comparing generated image "quality" (preserving the subject's identity, clothes and accessories) and "realism."
Author: Victor Lu | Editor: Michael Sarazen