“Best GAN samples ever yet? Very impressive ICLR submission! BigGAN improves Inception Scores by >100.”
The above Tweet is from renowned Google DeepMind research scientist Oriol Vinyals. It was retweeted last week by Google Brain researcher and “Father of Generative Adversarial Networks” Ian Goodfellow, and picked up momentum and praise from AI researchers on social media.
All the attention surrounds the paper Large Scale GAN Training for High Fidelity Natural Image Synthesis, which recently began circulating on social media. The paper is an internship project by Andrew Brock from Heriot-Watt University in collaboration with Jeff Donahue and Karen Simonyan from DeepMind. It is under review for next spring’s ICLR 2019.
Figure 1 shows that the model can generate highly realistic images with high fidelity and a small variety gap. When trained on the ImageNet dataset at 128×128 resolution, BigGAN achieves an Inception Score (IS) of 166.3, a gain of more than 100 points and more than three times the previous state of the art (SotA) result of 52.52. The Fréchet Inception Distance (FID), where lower is better, has also improved from 18.65 to 9.6.
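To put the reported scores in perspective, the short Python sketch below works out the absolute and relative changes from the four numbers quoted above. The helper name `relative_change` is purely illustrative and does not come from the paper.

```python
def relative_change(old, new):
    """Percentage change from old to new (positive means an increase)."""
    return (new - old) / old * 100.0


# Scores quoted in the article (ImageNet, 128x128).
prev_is, biggan_is = 52.52, 166.3    # Inception Score: higher is better
prev_fid, biggan_fid = 18.65, 9.6    # FID: lower is better

is_points = biggan_is - prev_is
is_gain = relative_change(prev_is, biggan_is)
fid_drop = -relative_change(prev_fid, biggan_fid)

print(f"IS gain: {is_points:.2f} points ({is_gain:.1f}% higher)")
print(f"FID reduction: {fid_drop:.1f}%")
```

The Inception Score gain exceeds 100 in both absolute points and percentage terms, while the FID is roughly halved.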
The authors proposed a model (BigGAN) with modifications focused on the following three aspects:
- Scalability: As the authors discovered that GANs benefit dramatically from scaling, they introduced two architectural changes to improve scalability (described in detail in the paper’s Appendix B), while at the same time improving conditioning by applying orthogonal regularization to the generator.
- Robustness: The orthogonal regularization applied to the generator makes the model amenable to the “truncation trick” so that fine control of the trade-offs between fidelity and variety is possible by truncating the latent space.
- Stability: The authors discovered and characterized instabilities specific to large-scale GANs, and devised ways to reduce them, although these fixes came at a relatively high cost to performance.
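The two ideas behind the "Robustness" point can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: `truncated_latents` resamples any latent value whose magnitude exceeds a threshold, and `orthogonal_penalty` penalizes the off-diagonal entries of a weight matrix's Gram matrix, which is one common relaxed form of orthogonal regularization. The function names, the example threshold, and the `beta` default are all assumptions made for this example.

```python
import numpy as np


def truncated_latents(shape, threshold, rng):
    """Sample z ~ N(0, 1), resampling entries with |z| > threshold.

    A smaller threshold trades sample variety for fidelity.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    z = rng.standard_normal(shape)
    mask = np.abs(z) > threshold
    while mask.any():
        # Redraw only the out-of-range entries until all of them fit.
        z[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(z) > threshold
    return z


def orthogonal_penalty(w, beta=1e-4):
    """beta * squared Frobenius norm of the off-diagonal part of W^T W.

    The penalty is zero when the columns of w are mutually orthogonal.
    """
    gram = w.T @ w
    off_diag = gram * (1.0 - np.eye(gram.shape[0]))
    return beta * float(np.sum(off_diag ** 2))


rng = np.random.default_rng(0)
z = truncated_latents((4, 128), threshold=0.5, rng=rng)  # 4 latent vectors
penalty = orthogonal_penalty(rng.standard_normal((128, 64)))
```

In a real training setup the penalty would be added to the generator loss, and the truncated latents would be used only at sampling time. Lowering the threshold should push samples toward higher fidelity at the expense of variety.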
In addition to its performance boost at 128×128 resolution, BigGAN also outperformed the previous SotA at 256×256 and 512×512 resolutions on ImageNet. The model was also tested on the larger image dataset JFT-300M to demonstrate its transferability.
Although BigGAN appears to be the new SotA in class-conditional image synthesis, some questions remain. XGBoost, MXNet, and TVM contributor Tianqi Chen asked “how much distribution did it capture and what would the unconditional version look like?”
The paper first appeared on OpenReview, where it was uploaded anonymously. More recently, it was posted on arXiv and shared on Twitter by the authors. The paper is currently under double-blind review. Posting such papers on public forums or arXiv is permitted under ICLR/NIPS/ICML conference rules, although submissions that are not properly anonymized are prohibited from consideration by the ACL (Association for Computational Linguistics).
Additional BigGAN-generated samples can be downloaded in different resolutions at https://drive.google.com/drive/folders/1lWC6XEPD0LT5KUnPXeve_kWeY-FxH002.
Author: Mos Zhang | Editor: Michael Sarazen