Look at the two pictures below. Can you tell which is a photograph and which was generated by AI?
The truth is… wait for it… both images are AI-generated fakes, products of American GPU maker NVIDIA’s new work with generative adversarial networks (GANs). The research was published today in the paper A Style-Based Generator Architecture for Generative Adversarial Networks, which proposes a new generator architecture that achieves state-of-the-art performance in face generation.
Since GANs were introduced in 2014 by Google researcher Ian Goodfellow, the technique has been widely adopted for image generation and translation. After some glitchy early failures, GANs have made huge breakthroughs and can now produce highly convincing fake images of animals, landscapes, human faces, and so on. Researchers know what GANs can do, but a lack of transparency in their inner workings means improvement is still achieved mainly through trial and error, which allows only limited control over the synthesized images.
The NVIDIA paper proposes an alternative generator architecture for GANs that draws insights from style transfer techniques. The system learns to separate different aspects of an image without supervision, and enables intuitive, scale-specific control of the synthesis.
Here’s how it works: given an input facial image, the style-based generator can learn its distribution and apply its characteristics to a novel synthesized image. While previous GANs could not control which specific features they wanted to regenerate, the new generator can adjust the effect of a particular style, for example high-level facial attributes such as pose, identity, and shape, without changing any other features. This enables finer control of specific features such as eyes and hairstyles. A video demo shows how the generated images vary from one to another given different inputs and styles.
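Under the hood, the paper feeds each style into the synthesis network through adaptive instance normalization (AdaIN): feature maps are normalized per channel, then re-scaled and shifted by parameters derived from the style. Below is a minimal NumPy sketch of that operation; the shapes and the random stand-ins for the learned style parameters are illustrative only.

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalization: normalize each feature map to zero
    mean / unit variance, then re-scale and shift it with style-derived
    parameters (learned from the style vector in the real network)."""
    # x: (channels, height, width) feature maps
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))   # toy activations
scale = rng.normal(size=8)              # stand-in for a learned style scale
bias = rng.normal(size=8)               # stand-in for a learned style bias
out = adain(features, scale, bias)
```

After AdaIN, each channel’s statistics are set entirely by the style, which is what lets a style injected at one scale override the corresponding statistics without touching other layers.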
Behind the new feature is a technique NVIDIA calls “style mixing.” From the paper: “To further encourage the styles to localize, we employ mixing regularization, where a given percentage of images are generated using two random latent codes instead of one during training. When generating such an image, we simply switch from one latent code to another — an operation we refer to as style mixing — at a randomly selected point in the synthesis network.”
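The switching itself is easy to sketch: pick a random crossover layer, feed one latent code to the layers before it and the other to the layers after it. In the toy NumPy illustration below, the layer count and latent dimensionality are stand-in values, not the paper’s exact configuration.

```python
import numpy as np

rng = np.random.default_rng(42)
num_layers = 18             # e.g. synthesis layers of a high-resolution network
w_a = rng.normal(size=512)  # latent code for source image A (toy dimensions)
w_b = rng.normal(size=512)  # latent code for source image B

# Style mixing: switch from w_a to w_b at a randomly selected layer.
crossover = int(rng.integers(1, num_layers))
per_layer_styles = [w_a if layer < crossover else w_b
                    for layer in range(num_layers)]
```

Early layers (coarse styles) then come from one image and later layers (fine styles) from the other, which is exactly the effect shown in the paper’s style-mixing grids.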
Stochastic variation is another key property, allowing GANs to randomize detailed facial features such as the placement of facial hair, stubble density, freckles, and pores. The paper proposes adding per-pixel noise after each convolution layer. The added noise does not affect the overall composition or the high-level attributes of images, and injecting noise at different layers varies details at the corresponding scales.
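In the paper, the noise is a single random image broadcast to all feature maps and scaled by learned per-channel factors. A rough NumPy sketch of that mechanism, with toy shapes and random numbers standing in for the learned scalings:

```python
import numpy as np

def add_per_pixel_noise(features, channel_scales, rng):
    """Add one Gaussian noise image, broadcast across all feature maps and
    scaled per channel (the scales are learned in the real network)."""
    _, height, width = features.shape
    noise = rng.normal(size=(1, height, width))   # one noise image per layer
    return features + channel_scales[:, None, None] * noise

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16, 16))    # toy (channels, H, W) activations
scales = 0.1 * rng.normal(size=8)       # stand-in for learned scaling factors
noisy = add_per_pixel_noise(feats, scales, rng)
```

Because each layer draws a fresh noise image, re-sampling the noise perturbs only fine detail at that layer’s scale while leaving the composition fixed.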
To quantify interpolation quality and disentanglement, the paper proposes two new, automated methods — perceptual path length and linear separability — that are applicable to any generator architecture.
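Perceptual path length, roughly, measures how much the generated image changes for a small step along an interpolation path in latent space. The sketch below is a toy stand-in: a random linear map replaces the trained generator, and plain squared L2 distance replaces the VGG-based perceptual metric the paper actually uses.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between latent vectors a and b."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(a_n @ b_n, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def path_length(generator, a, b, distance, eps=1e-4, samples=16, seed=0):
    """Monte-Carlo estimate: average distance between images generated at
    nearby interpolation points t and t + eps, scaled by 1 / eps**2."""
    rng = np.random.default_rng(seed)
    ts = rng.uniform(0, 1 - eps, size=samples)
    dists = [distance(generator(slerp(a, b, t)),
                      generator(slerp(a, b, t + eps))) / eps**2
             for t in ts]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 16))                 # toy "generator" weights
gen = lambda z: W @ z                         # stand-in for the network
l2 = lambda x, y: float(np.sum((x - y) ** 2))  # stand-in perceptual distance
a, b = rng.normal(size=16), rng.normal(size=16)
ppl = path_length(gen, a, b, l2)
```

The intuition: a well-disentangled latent space yields short, smooth paths (low average distance per step), while an entangled one produces abrupt perceptual jumps.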
Researchers saw impressive results using the new generator to forge images of bedrooms, cars, and cats with the Large-scale Scene Understanding (LSUN) dataset.
Alongside today’s paper, NVIDIA has also released a huge new dataset of human faces. Flickr-Faces-HQ (FFHQ) contains 70,000 high-quality images at 1024 × 1024 resolution. The dataset will soon be available to the public.
The paper’s first author is Tero Karras, a principal research scientist at NVIDIA Research whose primary research interests are deep learning, generative models, and digital content creation. His earlier paper, Progressive Growing of GANs for Improved Quality, Stability, and Variation, known as ProgressiveGAN, won accolades and was accepted to ICLR 2018.
Synced, a natural fan of deep learning and GANs, has noticed that more than a few GAN papers have picked up momentum and prompted discussion this year. DeepMind researchers proposed BigGAN two months ago; the model achieved an Inception Score (IS) of 166.3, more than triple the previous state-of-the-art (SotA) result of 52.52. Meanwhile, a team of Tsinghua University and Cardiff University researchers introduced CartoonGAN, which simulates the styles of Japanese anime masters from snapshots of real-world scenery.
The new NVIDIA paper is available on arXiv.
Journalist: Tony Peng | Editor: Michael Sarazen