South Korea’s Naver Clova AI Research is one of the institutions behind the unsupervised generative network U-GAT-IT, which has been attracting attention online thanks to a new TensorFlow implementation and the anime generation tool ‘Selfie 2 Waifu.’ Now, Clova AI has announced the official PyTorch implementation of another of its popular models, StarGAN v2.
The Clova AI GitHub now hosts the StarGAN v2 code and pretrained models, along with a new animal faces dataset (AFHQ) consisting of 15,000 high-quality images at 512×512 resolution for evaluating methods in settings with large inter- and intra-domain variation. The GitHub project received over 1,100 stars in two days.
First proposed by a trio of Clova AI researchers and a researcher from the Swiss Federal Institute of Technology Lausanne (EPFL), StarGAN v2 is an image-to-image translation framework that learns a mapping between different visual domains and has outperformed other leading methods.
StarGAN v2 simultaneously addresses two major challenges in image-to-image translation: translating an image from one domain into diverse images in a target domain, and supporting multiple target domains.
The researchers evaluated the individual components of StarGAN v2 and compared the model on diverse image synthesis against three leading baselines: MUNIT, DRIT, and MSGAN. Experiments on both the celebrity faces dataset CelebA-HQ and the new AFHQ validate the superiority of StarGAN v2 in terms of visual quality, diversity, and scalability, the researchers write.
The research team also recruited human evaluators through Amazon Mechanical Turk to compare their method with the baseline approaches, with results showing that StarGAN v2 better extracts styles and renders them onto the input image.
The researchers attribute their method’s success to three main factors:
- The style code is generated separately per domain by the multi-head mapping network and style encoder.
- Inspired by NVIDIA’s StyleGAN, the style space is produced by learned transformations.
- The modules benefit from fully exploiting training data from multiple domains.
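The multi-head design in the first factor can be illustrated with a toy sketch: a shared trunk transforms a latent code, and a separate output head per domain emits that domain’s style code. This is a minimal NumPy illustration of the idea only, not the official PyTorch implementation; all dimensions, layer counts, and names here are illustrative assumptions.

```python
import numpy as np

# Toy sketch (NOT the official StarGAN v2 code) of a multi-head mapping
# network: a shared trunk followed by one output head per target domain.
# All sizes below are arbitrary assumptions for illustration.
rng = np.random.default_rng(0)
LATENT_DIM, HIDDEN_DIM, STYLE_DIM, NUM_DOMAINS = 16, 32, 8, 3

# Shared trunk weights (a single hidden layer for brevity).
W_shared = rng.normal(scale=0.1, size=(LATENT_DIM, HIDDEN_DIM))
# One output head per domain, stacked along the first axis.
W_heads = rng.normal(scale=0.1, size=(NUM_DOMAINS, HIDDEN_DIM, STYLE_DIM))

def mapping_network(z, domain):
    """Map a latent code z to a style code for the given target domain."""
    h = np.maximum(z @ W_shared, 0.0)  # shared trunk with ReLU
    return h @ W_heads[domain]         # domain-specific output head

# The same latent code yields a different style code per target domain.
z = rng.normal(size=LATENT_DIM)
styles = [mapping_network(z, d) for d in range(NUM_DOMAINS)]
print([s.shape for s in styles])  # → [(8,), (8,), (8,)]
```

Keeping the trunk shared while separating only the output heads is what lets the modules exploit training data from all domains at once, while still producing domain-specific style codes.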
Journalist: Yuan Yuan | Editor: Michael Sarazen