Many anime and cartoon fans use anime profile pictures to represent themselves in a virtual world of social media, games and online forums. Now, such avatars can be personalized. Last month a team of South Korean researchers proposed a method that uses unsupervised image translation to transform a simple selfie into a classic Japanese-style anime face.
The novel “U-GAT-IT” method generates visually superior results compared with previous cutting-edge techniques. The TensorFlow implementation of the paper has been released in a GitHub project.
The creation of generative adversarial networks (GANs) in 2014 laid the foundation for a wide range of image synthesis applications, and one of the most high-profile among them is image translation. Researchers can either use supervised learning to train a mapping model from paired data samples, or use unsupervised learning, which typically relies on shared latent space and cycle-consistency assumptions to learn from unpaired data.
An unsolved challenge in image translation is improving GAN model performance when translating between images that vary significantly in shape, such as from cats to dogs or from selfies to cartoons.
The paper addresses this challenge with an end-to-end framework that incorporates a new attention module and Adaptive Layer-Instance Normalization (AdaLIN). Attention modules embedded in both the generator and the discriminator identify discriminative regions in the source and target images. The researchers trained an auxiliary classifier to learn importance weights over the feature maps, which are then used to generate attention maps that guide the model to focus on important regions, such as the eyes and mouth.
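The idea of turning an auxiliary classifier's channel-importance weights into an attention map follows the class activation map (CAM) technique. Here is a minimal NumPy sketch of that step; the function name and the min-max normalization are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def cam_attention(feat, w):
    """CAM-style attention map (illustrative sketch).

    feat: encoder feature map of shape (C, H, W).
    w:    per-channel importance weights of shape (C,), as learned by
          an auxiliary classifier that distinguishes source from
          target domain images.
    Returns an (H, W) map highlighting discriminative regions.
    """
    # Weight each channel by its learned importance and sum over channels.
    attn = np.tensordot(w, feat, axes=([0], [0]))
    # Rescale to [0, 1] for gating/visualization (an assumption here).
    attn = attn - attn.min()
    denom = attn.max()
    return attn / denom if denom > 0 else attn
```

In the full model, a map like this multiplies the feature map so that subsequent layers concentrate on the regions the classifier found discriminative.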
AdaLIN, which is applied in the decoder of the generator, helps the model flexibly control the degree of change in shape and texture without adjusting the model architecture or the hyperparameters.
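AdaLIN interpolates between instance normalization (statistics per channel) and layer normalization (statistics over the whole feature map) with a learnable ratio, then applies an externally predicted scale and shift. A minimal NumPy sketch for a single image, with argument names chosen here for illustration:

```python
import numpy as np

def adalin(x, gamma, beta, rho, eps=1e-5):
    """Adaptive Layer-Instance Normalization (sketch).

    x:           feature map of shape (C, H, W) for one image.
    gamma, beta: per-channel scale and shift, shape (C,) — in the paper
                 these are predicted by fully connected layers.
    rho:         learnable mixing ratio in [0, 1], shape (C,).
    """
    # Instance norm: normalize each channel over its spatial positions.
    mu_in = x.mean(axis=(1, 2), keepdims=True)
    var_in = x.var(axis=(1, 2), keepdims=True)
    x_in = (x - mu_in) / np.sqrt(var_in + eps)

    # Layer norm: normalize over all channels and spatial positions.
    mu_ln = x.mean(keepdims=True)
    var_ln = x.var(keepdims=True)
    x_ln = (x - mu_ln) / np.sqrt(var_ln + eps)

    # rho interpolates between the two normalizations; gamma and beta
    # then rescale and shift the mixed result.
    rho = rho.reshape(-1, 1, 1)
    mixed = rho * x_in + (1.0 - rho) * x_ln
    return gamma.reshape(-1, 1, 1) * mixed + beta.reshape(-1, 1, 1)
```

Because rho is learned per channel, the network itself decides how much instance-style (texture-preserving) versus layer-style (shape-changing) normalization each feature needs.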
Researchers compared the U-GAT-IT model with CycleGAN, UNIT, MUNIT, and DRIT on five unpaired image datasets: selfie2anime, horse2zebra, photo2vangogh, cat2dog, and photo2portrait. Model performance was assessed with both a human perceptual study and a quantitative metric. In the perceptual study, 135 human judges were presented with translated results from the different methods and asked to choose their favourites. The U-GAT-IT model significantly surpassed the other models on four of the five datasets. The results are shown below.
Researchers also performed a quantitative evaluation using Kernel Inception Distance (KID), where lower scores indicate greater visual similarity between real and generated images. Below are the experiment results.
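KID is the squared maximum mean discrepancy (MMD) between Inception features of real and generated images, computed with a cubic polynomial kernel. A NumPy sketch of the unbiased estimator, assuming the Inception features have already been extracted:

```python
import numpy as np

def polynomial_kernel(X, Y):
    # Cubic polynomial kernel used by KID: k(x, y) = (x.y / d + 1)^3
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** 3

def kid(real_feats, fake_feats):
    """Kernel Inception Distance (sketch): unbiased MMD^2 between
    feature sets of real and generated images, shapes (n, d), (m, d)."""
    n, m = len(real_feats), len(fake_feats)
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Unbiased estimator: exclude the diagonal of the within-set terms.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (n * (n - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (m * (m - 1))
    term_rf = 2.0 * k_rf.mean()
    return term_rr + term_ff - term_rf
```

When the two feature distributions match, the estimate hovers around zero; it grows as the generated images drift away from the real ones, which is why lower KID scores indicate better translations.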
Three of the paper’s authors — Junho Kim, Minjae Kim, and Hyeonwoo Kang — are from NCSOFT, the South Korean video game giant best known for its role-playing game series Lineage. NCSOFT has been doubling down on AI since 2011, when it launched its AI and Natural Language Processing (NLP) Center. Last year the company introduced an AI system powered by reinforcement learning to battle professional human players in its homegrown game Blade & Soul.
For more information, read the paper U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation on arXiv.
Journalist: Tony Peng | Editor: Michael Sarazen