In the recently published paper Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation, researchers from Penta-AI and Tel-Aviv University introduce a generic image-to-image translation framework dubbed Pixel2Style2Pixel (pSp).
Unlike previous methods that employ dedicated task-specific architectures, the proposed framework is designed to address a wide range of image-to-image tasks using the same architecture, a global approach that avoids possible locality bias. The method shows strong advantages in tasks such as face frontalization, where its encoder can be trained in a fully unsupervised manner to align a given face image to a frontal pose with a neutral expression.
The researchers noted that the state-of-the-art image generation method StyleGAN not only produces images of phenomenal realism, but also has a disentangled latent space W in which meaningful manipulations can be made. As numerous methods leveraging this latent space have shown promising image-to-image translation results, it has become common practice to encode real images into an extended latent space, W+, for applications such as high-resolution synthesis, multi-modal image synthesis, multi-domain image synthesis, and conditional image synthesis. However, performing a fast, direct, and accurate learned inversion of real images into W+ remains a challenge.
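To see why a learned encoder is attractive, it helps to contrast it with the slower alternative: optimizing a latent code per image. The toy sketch below (not the authors' code) inverts a target through a frozen linear map standing in for the generator; all names, dimensions, and the learning rate are illustrative assumptions.

```python
import numpy as np

# Toy sketch of optimization-based latent inversion: recover a latent w
# whose "generated" output matches a target image. The generator here is
# a random linear map standing in for a frozen StyleGAN.
rng = np.random.default_rng(0)
latent_dim, image_dim = 8, 32
G = rng.standard_normal((image_dim, latent_dim))  # frozen toy generator

w_true = rng.standard_normal(latent_dim)
target = G @ w_true                               # "real" image to invert

# Gradient descent on the reconstruction loss ||G w - target||^2.
w = np.zeros(latent_dim)
lr = 0.005
for _ in range(2000):
    grad = 2.0 * G.T @ (G @ w - target)
    w -= lr * grad

reconstruction_error = np.linalg.norm(G @ w - target)
```

A per-image loop like this is accurate but slow; pSp instead trains an encoder once, so inversion becomes a single forward pass.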
The team focused on the task of latent space embedding, which aims to retrieve a latent vector that generates a desired, not necessarily known, image. They proposed a novel encoder architecture that encodes an arbitrary image directly into W+. Because the encoder is built on a feature pyramid network, style vectors are extracted from the different pyramid scales and inserted directly into a fixed, pretrained StyleGAN generator in correspondence with their spatial scales. The researchers observed that when the network is trained with an ID similarity loss, it preserves identity better than previous direct approaches.
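The mapping from pyramid scales to generator inputs can be sketched as follows. This is an illustrative NumPy mock-up, not the released pSp code: the feature-map sizes, the coarse/medium/fine style counts, and the pooling used as a stand-in for the learned map2style blocks are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
style_dim = 512

# Feature maps from three scales of a hypothetical FPN backbone.
pyramid = {
    "coarse": rng.standard_normal((style_dim, 16, 16)),
    "medium": rng.standard_normal((style_dim, 32, 32)),
    "fine":   rng.standard_normal((style_dim, 64, 64)),
}
# An 18-input StyleGAN generator: coarse styles drive pose/shape,
# medium styles drive facial features, fine styles drive color details.
styles_per_scale = {"coarse": 3, "medium": 4, "fine": 11}

def map2style(feat, n_styles):
    # Stand-in for pSp's learned map2style blocks (small strided
    # convnets); here we simply global-average-pool the feature map
    # and repeat it once per style slot.
    pooled = feat.mean(axis=(1, 2))            # (512,)
    return np.tile(pooled, (n_styles, 1))      # (n_styles, 512)

# Concatenate per-scale styles into the full W+ code: one 512-d style
# vector per generator input, matched to its spatial scale.
w_plus = np.concatenate(
    [map2style(pyramid[k], styles_per_scale[k])
     for k in ("coarse", "medium", "fine")]
)
```

The key design point survives the simplification: each generator layer receives a style vector derived from the pyramid level whose spatial resolution matches it.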
In experiments, the team demonstrated that their image-to-image translation framework achieves compelling results across various applications. The researchers propose that the global approach can further support multi-modal synthesis through the resampling of styles. They also note that some of the method's inherent assumptions warrant further investigation: because the approach does not exploit locality, preserving fine details of the input image, such as earrings or background details, remains a challenge.
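The style-resampling idea can be illustrated with a short sketch: keep the encoder's coarse styles, which fix the input's overall structure, and swap the fine styles for randomly sampled ones to obtain output variants. The split index and the blending weight below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, style_dim = 18, 512

# W+ code produced by a hypothetical encoder for one input image.
encoded = rng.standard_normal((n_layers, style_dim))

def resample_variant(encoded, split=7, alpha=1.0, rng=rng):
    # Replace (or blend in, for alpha < 1) random styles in the fine
    # layers only; coarse layers keep the input's structure.
    random_styles = rng.standard_normal((n_layers, style_dim))
    variant = encoded.copy()
    variant[split:] = ((1 - alpha) * encoded[split:]
                       + alpha * random_styles[split:])
    return variant

# Two variants share coarse structure but differ in fine styles.
v1 = resample_variant(encoded)
v2 = resample_variant(encoded)
```

Each call yields a different plausible output for the same input, which is how a single encoded image supports multi-modal synthesis.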
The paper Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation is available on arXiv.
Reporter: Fangyu Cai | Editor: Michael Sarazen