Everyone Is an Artist: GauGAN Turns Doodles Into Photorealistic Landscapes

At the recent NVIDIA GPU Technology Conference (GTC) 2019, Synced reported on a ‘magical brush’ app called GauGAN that could transform simple line drawings and sketches into realistic landscapes. GauGAN lets users control not only the semantic content but also the style of the generated image. NVIDIA has now open-sourced the model behind the stunning images.

Figure: Changing semantic content

Figure: Changing image styles

NVIDIA’s simple tool allows anyone to build their own “magical brush.” The re-implementation guide on GitHub includes detailed installation steps and covers dataset preparation, training, and inference.

The paper’s authors recommend COCO-Stuff, Cityscapes, or ADE20K as training datasets, and a few sample images from COCO-Stuff are included in the code repo for users to experiment with. A pre-trained model is also available for quick deployment and testing.

Those who want to reproduce the results from scratch will probably need NVIDIA sponsorship, as the model was trained on an NVIDIA DGX-1 machine with 8 V100 GPUs.

Figure: Users can control both semantics and style when synthesizing an image

The algorithm behind GauGAN is spatially-adaptive normalization (SPADE), a conditional normalization layer introduced in the paper Semantic Image Synthesis with Spatially-Adaptive Normalization.

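Common normalization layers such as Batch Normalization first normalize activations to zero mean and unit variance and then apply a learned affine transformation (a scale and a shift) that is the same at every spatial position within a channel and does not depend on the input. When the input is a semantic segmentation map, this normalization step tends to “wash away” the semantic information. SPADE instead learns the scale and shift directly from the segmentation map, so they vary with spatial position, and the semantic information is preserved and injected at every normalization layer in the generator.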
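The “washing away” effect can be illustrated with a minimal pure-Python sketch. This is a toy illustration under stated assumptions, not the code from NVIDIA's repo: the function names are hypothetical, a single channel is flattened into a list, and a per-label lookup table stands in for the small convolutional network that SPADE uses to predict the scale and shift from the segmentation map.

```python
import math


def normalize(values, eps=1e-5):
    """Normalize a flat list of activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def plain_norm(activations, gamma, beta):
    """Batch-Norm-style layer: one learned scale and shift for the whole channel."""
    return [gamma * x + beta for x in normalize(activations)]


def spade_norm(activations, seg_labels, gamma_table, beta_table):
    """SPADE-style layer: scale and shift depend on the label at each position.

    Assumption: the lookup tables replace the conv layers that would
    produce gamma and beta from the segmentation map.
    """
    normalized = normalize(activations)
    return [gamma_table[lab] * x + beta_table[lab]
            for x, lab in zip(normalized, seg_labels)]


def features_from_labels(seg_labels, weight=2.0, bias=1.0):
    """Toy stand-in for a first layer applied to the segmentation map."""
    return [weight * lab + bias for lab in seg_labels]


# Two uniform segmentation maps: all "sky" (label 0) and all "grass" (label 1).
sky = [0, 0, 0, 0]
grass = [1, 1, 1, 1]

# With a plain normalization layer, both maps collapse to the same output,
# because a constant feature map normalizes to all zeros.
plain_sky = plain_norm(features_from_labels(sky), gamma=1.5, beta=0.5)
plain_grass = plain_norm(features_from_labels(grass), gamma=1.5, beta=0.5)

# With the SPADE-style layer, the label re-enters through gamma and beta.
gamma_table = {0: 1.0, 1: 2.0}
beta_table = {0: -1.0, 1: 1.0}
spade_sky = spade_norm(features_from_labels(sky), sky, gamma_table, beta_table)
spade_grass = spade_norm(features_from_labels(grass), grass, gamma_table, beta_table)

print(plain_sky == plain_grass)   # the label information is lost
print(spade_sky == spade_grass)   # the label information survives
```

In this toy setting the plain layer cannot tell a sky-only map from a grass-only map, while the SPADE-style layer keeps them apart because the modulation parameters themselves are functions of the label map.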

Figure: Difference between Batch Norm and SPADE

The paper Semantic Image Synthesis with Spatially-Adaptive Normalization has been accepted by CVPR 2019 for oral presentation.

The pre-trained model can be downloaded from a Google Drive folder, and the open-sourced code is available on GitHub. There is also a SPADE project website.


Author: Mos Zhang | Editor: Michael Sarazen

1 comment on “Everyone Is an Artist: GauGAN Turns Doodles Into Photorealistic Landscapes”

  1. GauGAN could offer a powerful tool for creating virtual worlds to everyone from architects and urban planners to landscape designers and game developers. With an AI that understands how the real world looks, these professionals could better prototype ideas and make rapid changes to a synthetic scene.
