
NVIDIA’s StyleGAN3 Is Fully Equivariant to Translation and Rotation, Improving GAN-Based Animation Generation

An NVIDIA and Aalto University research team presents StyleGAN3, a novel generative adversarial network (GAN) architecture in which the exact sub-pixel position of each feature is exclusively inherited from the underlying coarse features, enabling a more natural transformation hierarchy and advancing GAN-based animation generation.

The outstanding resolution and hyperrealistic quality of images produced by generative adversarial networks (GANs) have wowed the general public, and computer vision researchers have increasingly integrated GANs into applications such as domain translation, advanced image editing and video generation.

In the paper Alias-Free Generative Adversarial Networks, researchers from NVIDIA and Finland’s Aalto University observe that current GAN architectures do not synthesize images in a natural hierarchical manner. The team attributes this to careless signal processing, and addresses it through a thorough overhaul of 2020’s StyleGAN2 to create StyleGAN3, an architecture that exhibits a more natural transformation hierarchy.

In the real world, details at different scales tend to transform hierarchically. For example, when a human head turns, sub-features such as the hair and nose move correspondingly. The structure of a typical GAN generator is analogous: coarse, low-resolution features are hierarchically refined by upsampling layers, locally mixed by convolutions, and new detail is introduced through nonlinearities.

The researchers note, however, that despite this superficial similarity, current GAN architectures do not synthesize images in a natural hierarchical manner: the coarse features mainly control the presence of finer features, but not their precise positions. Instead, much of the fine detail appears to be fixed in pixel coordinates — a disturbing phenomenon they term “texture sticking.” In StyleGAN3, the exact sub-pixel position of each feature is exclusively inherited from the underlying coarse features, enabling a more natural transformation hierarchy.

After exploring all signal processing aspects of the StyleGAN2 generator, the team arrived at the surprising conclusion that current upsampling filters are not aggressive enough in suppressing problematic aliasing, and that extremely high-quality filters with over 100 dB of attenuation are required.
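To give a sense of what 100 dB of attenuation demands, here is a minimal 1-D sketch (not the authors’ code) using SciPy’s Kaiser-window FIR design; the cutoff and transition-band values are illustrative assumptions, not values from the paper.

```python
# A minimal sketch (not the authors' code) of a ~100 dB low-pass filter
# in 1-D, using SciPy's Kaiser-window FIR design. The cutoff and
# transition-band values below are illustrative assumptions.
import numpy as np
from scipy import signal

fs = 2.0       # sampling rate, so the Nyquist frequency is 1.0
cutoff = 0.5   # midpoint of the transition band
width = 0.2    # transition-band width, as a fraction of Nyquist

# kaiserord picks a filter length and Kaiser beta for the target attenuation.
numtaps, beta = signal.kaiserord(ripple=100.0, width=width)
taps = signal.firwin(numtaps, cutoff, window=("kaiser", beta), fs=fs)

# Verify the realized stopband attenuation from the frequency response.
w, h = signal.freqz(taps, worN=8000, fs=fs)
worst = np.abs(h[w > cutoff + width / 2]).max()
print(f"{numtaps} taps, worst stopband ripple: {20 * np.log10(worst):.1f} dB")
```

Demanding 100 dB rather than a more conventional 60 dB noticeably lengthens the filter, which hints at why such aggressive filtering was not standard practice in generators before.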

The researchers’ solution to this aliasing problem considers effects in the continuous domain and appropriately low-pass filters the results to produce a strong, rotation-equivariant generator. The approach suppresses aliasing, forcing the model to exhibit a more natural hierarchical refinement where the emergent internal representations include coordinate systems that enable details to be correctly attached to the underlying surfaces. In this way, the quality of generated video and animation can be dramatically improved.
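The key mechanism can be illustrated in 1-D: treat the feature map as samples of a continuous signal, temporarily upsample it, apply the pointwise nonlinearity at the higher rate, then low-pass filter and downsample back so that the high frequencies the nonlinearity introduces are removed rather than aliased. A toy SciPy sketch of this recipe (the actual StyleGAN3 implementation uses custom 2-D CUDA kernels):

```python
# Toy 1-D sketch of an alias-suppressed nonlinearity: upsample, apply the
# nonlinearity at the higher rate, then low-pass filter and downsample.
# This mirrors the idea only; StyleGAN3 uses custom 2-D CUDA kernels.
import numpy as np
from scipy import signal

def alias_free_relu(x, up=2):
    # resample_poly applies an anti-aliasing low-pass filter internally.
    hi = signal.resample_poly(x, up, 1)     # to the higher sampling rate
    hi = np.maximum(hi, 0.0)                # pointwise nonlinearity (ReLU)
    return signal.resample_poly(hi, 1, up)  # filter + back to original rate

x = np.cos(np.linspace(0, 8 * np.pi, 256))
y = alias_free_relu(x)
print(x.shape, y.shape)  # the signal length is unchanged
```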

The team applied these theoretical ideas by modifying the StyleGAN2 generator to be fully equivariant to translation (T) and rotation (R), resulting in the new alias-free models StyleGAN3-T and StyleGAN3-R.
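Equivariance has a simple operational statement: shifting the input and then applying an operator should give the same result as applying the operator and then shifting the output. The sketch below quantifies how close an operator comes to this, reported as a PSNR in the spirit of the paper’s EQ-T metric; the blur here is a toy stand-in for a generator layer, and all names are illustrative assumptions.

```python
# Sketch: measure how well an image operator commutes with translation,
# reported as a PSNR (higher = more equivariant), loosely in the spirit
# of the paper's EQ-T metric. `blur` is a toy stand-in operator.
import numpy as np
from scipy import ndimage

def psnr(a, b, peak=1.0):
    mse = max(np.mean((a - b) ** 2), 1e-12)  # clamp to avoid log(0)
    return 10.0 * np.log10(peak ** 2 / mse)

def equivariance_psnr(op, img, shift=(3, 5)):
    shift_then_op = op(np.roll(img, shift, axis=(0, 1)))
    op_then_shift = np.roll(op(img), shift, axis=(0, 1))
    return psnr(shift_then_op, op_then_shift)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
blur = lambda x: ndimage.uniform_filter(x, size=5, mode="nearest")
print(equivariance_psnr(blur, img))  # finite: edge padding breaks equivariance
```

With periodic (“wrap”) padding the same blur would commute with a circular shift almost exactly; real generator layers fail such a test wherever aliasing pins features to pixel coordinates, which is precisely what StyleGAN3 is designed to prevent.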

In their evaluations, the team compared StyleGAN2 with their alias-free StyleGAN3-T and StyleGAN3-R generators on six datasets (FFHQ-U, FFHQ, METFACES-U, METFACES, AFHQV2 and BEACHES). The results show that both StyleGAN3-T and StyleGAN3-R remain competitive with StyleGAN2 in terms of the Fréchet Inception Distance (FID) image quality metric, while also demonstrating a very high level of translation equivariance.
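For reference, FID fits a Gaussian to Inception feature vectors of real and of generated images and reports the Fréchet distance between the two Gaussians (lower is better). A minimal sketch of the computation on stand-in feature vectors; a real evaluation would extract the features with an Inception network.

```python
# Minimal FID sketch: Fréchet distance between Gaussians fit to two
# feature sets. The random vectors below are stand-ins for Inception
# features of real and generated images.
import numpy as np
from scipy import linalg

def fid(feats_a, feats_b):
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # sqrtm can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
fake = rng.normal(loc=0.5, size=(500, 16))
print(fid(real, real), fid(real, fake))  # near 0 for identical sets
```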

The visual flow is remarkable, prompting Google Brain scientist David Ha to tweet, “These models are getting so good…”

The team believes their work can pave the way for new generative models that are better suited to video and animation tasks.

The StyleGAN3 implementation and pretrained models are available on the project’s GitHub. The paper Alias-Free Generative Adversarial Networks is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
