“I always knew unsupervised learning was the right thing to do,” declared Turing Award winner Geoffrey Hinton during his recent talk at the AAAI Conference in New York. Fellow Turing honouree Yann LeCun responded from the same stage on the same evening, confirming that after being “skeptical for a long time” he now agrees with Hinton.
The broader AI community is also embracing the shift. A paper published this week by researchers from Russian Internet giant Yandex and Russia’s National Research University Higher School of Economics introduces the first unsupervised learning approach for identifying interpretable semantic directions in the latent space of generative adversarial network (GAN) models.
The leading paradigm for generative modeling in the computer vision domain, GANs comprise a generator and a discriminator that are trained jointly in an adversarial manner: the generator creates new images, which the discriminator attempts to distinguish from real ones. This back-and-forth progressively drives the generator toward more realistic outputs.
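This adversarial objective can be sketched roughly as follows. The function and variable names are illustrative only; `d_real` and `d_fake` stand for the discriminator's probability scores on real and generated images.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # The discriminator wants real images scored near 1
    # and generated images scored near 0.
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # The (non-saturating) generator objective: push the
    # discriminator toward scoring fakes as real.
    return -np.mean(np.log(d_fake))
```

When the generator improves, `d_fake` rises toward 1, so its own loss falls while the discriminator's loss grows; training alternates updates to the two networks to keep this competition balanced.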
Like many other machine learning models, GANs typically function as black-box instruments that cannot provide a complete explanation of their generation process. Exploring GANs’ latent space structure is therefore an important research challenge as it could help make the generation process more understandable and controllable.
The new paper posits that the latent spaces of typical GAN models often contain semantically meaningful directions, and that moving along these directions corresponds to human-interpretable image transformations such as zooming or recolouring. The discovery of such semantic directions, however, is currently performed largely in a supervised manner, requiring human labelling, pretrained models, and so on, which narrows the range of directions existing methods can discover. Moreover, existing approaches tend to identify only the directions researchers already expect to find.
The new unsupervised approach, tested on MNIST, the AnimeFaces and CelebA-HQ datasets, and a BigGAN generator, seeks a set of directions corresponding to diverse image transformations, revealing interpretable directions that had never been observed before or would have required an expensive supervised method to identify.
Moving in a random direction typically affects several factors of variation at once, and different directions can also interfere with each other, which makes it difficult to interpret individual directions or use them for semantic manipulations in image editing. The new unsupervised method, however, can easily distinguish one transformation from another and can be applied universally to any pretrained generator.
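As a rough sketch of what "moving in a latent direction" means, the snippet below shifts a latent code along a unit-normalized direction before feeding it to a generator. Here `toy_generator` is a hypothetical stand-in for a real pretrained GAN generator, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM = 8

# Hypothetical stand-in for a pretrained generator G(z); a real
# one would map the latent code to an image, not a 16-d vector.
W = rng.normal(size=(16, LATENT_DIM))

def toy_generator(z):
    return np.tanh(W @ z)

def move_along_direction(z, direction, alpha):
    # Shift the latent code by alpha along a unit direction.
    # If the direction is semantically meaningful, this would
    # change one factor of variation (e.g. zoom) in G(z).
    unit = direction / np.linalg.norm(direction)
    return z + alpha * unit

z = rng.normal(size=LATENT_DIM)
d = rng.normal(size=LATENT_DIM)
edited = toy_generator(move_along_direction(z, d, alpha=3.0))
```

An interpretable direction is one where varying `alpha` produces a single, consistent visual change; the paper's contribution is finding such directions without any human supervision.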
The researchers also discovered that the revealed directions can be used to generate high-quality synthetic data to solve the challenging problem of weakly supervised saliency detection. In future work they plan to explore whether other interpretable directions can also improve machine learning performance in existing computer vision tasks.
The paper Unsupervised Discovery of Interpretable Directions in the GAN Latent Space is available on arXiv.
Journalist: Yuan Yuan | Editor: Michael Sarazen