The design and content of contemporary marketing campaigns, websites and banners have become increasingly targeted and sophisticated, and compelling image content is crucial for companies striving to stand out from the competition. Human graphic designers can spend a great deal of time iterating back and forth during a laborious content creation process that includes selecting, editing, polishing and compositing raw images to create satisfying and effective marketing content.
In the paper Directional GAN: A Novel Conditioning Strategy for Generative Networks, a research team from Adobe takes a step towards streamlining this process by leveraging generative adversarial networks (GANs). The team’s proposed Directional GAN (DGAN) is a novel and simple approach for generating high-resolution images conditioned on expected semantic attributes.
Because the image content generating process is so complex, it would be ideal if an automatic agent could generate content instantly, giving designers and clients the ability to discard generated images that do not meet their expectations at little or no cost. What’s more, such an agent could more easily produce minor variations to personalize content for different target market segments.
Conditional GANs have advanced rapidly in recent years, enabling users to stipulate desired attributes and generate entirely new images that are similar in nature to the training data. This generation process, however, is also complex, because it requires re-training the network with a conditional adversarial loss tied to the chosen attributes.
The proposed DGAN not only generates image content automatically, but also simplifies conditional generation tasks by handling the generation process independently of the conditioning.
The team summarizes their contributions as:
- Propose an approach employing directional vectors to allow for conditioning in GANs. Show mathematically that using this approach, we can move the latent vector to the desired subspace in a single step.
- Show the applicability of the proposed method not only for single attribute conditioning but also multiple attributes together.
- The approach maintains the same Frechet Inception Distance (FID) score as that of unconditional generation, 23 for Full-body Dataset and 5.06 for CelebA-HQ. Hence it allows for conditioning without deteriorating the quality of generated images.
- The approach is generic enough to be applicable on any GAN with sufficiently disentangled image features in latent space.
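The single-step claim in the contributions above can be illustrated with a short geometric sketch. The notation here is an assumption made for illustration and is not taken from the paper: z is a latent vector, (w, b) is a learned hyperplane or regression line for one attribute, and t is the desired attribute score.

```latex
% Assumed notation: z = latent vector, (w, b) = learned hyperplane or regression line,
% s(z) = attribute score, t = desired score (target margin or regression value).
s(z) = w^\top z + b, \qquad
z' = z + \frac{t - s(z)}{\lVert w \rVert^2}\, w
\quad\Longrightarrow\quad
s(z') = s(z) + \frac{t - s(z)}{\lVert w \rVert^2}\, w^\top w = t .
```

Under this linear model, one move along the directional vector w lands exactly on the requested score, with no iterative search in latent space.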
The DGAN architecture comprises three components: a GAN, which uses the StyleGAN architecture for its generator and discriminator to generate realistic images from random vectors; an image-attribute block, which identifies the attribute labels in the generated images; and a latent-attribute block, which learns separating hyperplanes or regression lines in the latent space.
The DGAN components are trained modularly. The GAN and the image-attribute block are trained independently of each other, while the latent-attribute block is trained on the outputs of the first two, whose parameters are kept frozen.
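As a rough illustration of how a latent-attribute block might learn a separating hyperplane, the sketch below fits a linear boundary to latent vectors by ordinary least squares. Everything in it is an assumption for illustration: the synthetic latents, the `true_direction` labelling rule that stands in for the frozen generator and image-attribute block, and the `fit_hyperplane` helper. The paper's actual classifier may differ.

```python
import numpy as np

def fit_hyperplane(latents, labels):
    """Least-squares fit of (w, b) so that sign(latents @ w + b) approximates labels in {-1, +1}."""
    # Append a column of ones so the bias b is learned together with w.
    design = np.hstack([latents, np.ones((latents.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, labels.astype(float), rcond=None)
    return coef[:-1], coef[-1]

rng = np.random.default_rng(0)
latent_dim = 16

# Random latent vectors, as would be sampled for the (frozen) generator.
latents = rng.standard_normal((2000, latent_dim))

# Hypothetical stand-in for "generate an image, then label it with the
# frozen image-attribute block": a hidden linear rule produces binary labels.
true_direction = rng.standard_normal(latent_dim)
labels = np.where(latents @ true_direction > 0, 1, -1)

# Learn the separating hyperplane directly in latent space.
w, b = fit_hyperplane(latents, labels)
accuracy = float(np.mean(np.sign(latents @ w + b) == labels))
```

The learned `w` then serves as the directional vector for that attribute. For a continuous attribute, the same fit with real-valued labels would give a regression line instead of a class boundary.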
In the image generation process, the proposed approach starts with a randomly generated latent vector, which is passed through the latent-attribute block. The classifiers and regressors in this block predict the label for each attribute. The latent vector is then moved in the appropriate direction, along a linear combination of the directional vectors, to obtain the desired attributes.
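The move described above can be sketched for several attributes at once. This is a minimal sketch assuming each attribute has a linear score `W @ z + b`. It uses a minimum-norm correction, which is one way to realize "a linear combination of directional vectors" and is not necessarily the paper's exact update. The names `move_latent`, `W`, `b` and `targets` are hypothetical.

```python
import numpy as np

def move_latent(z, W, b, targets):
    """Shift z by the smallest-norm combination of the rows of W
    so that the attribute scores W @ z_new + b equal targets."""
    residual = targets - (W @ z + b)
    # Coefficients of the linear combination of directional vectors.
    alpha = np.linalg.solve(W @ W.T, residual)
    return z + W.T @ alpha

rng = np.random.default_rng(0)
latent_dim = 8

# Rows of W are assumed directional vectors for two attributes.
W = rng.standard_normal((2, latent_dim))
b = np.array([0.1, -0.3])

z = rng.standard_normal(latent_dim)   # randomly generated latent vector
targets = np.array([1.0, 0.5])        # e.g. a class margin and a regression value

z_new = move_latent(z, W, b, targets)
scores_after = W @ z_new + b
```

Because the update is computed in closed form, the latent vector reaches the requested scores in a single step, and the conditioned image would then come from feeding `z_new` to the unchanged generator.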
The team evaluated their method in experiments using the public datasets Multi Pose Virtual Try On (MPV) and the Deep Fashion (DF) landmark detection benchmark to train the DGAN generator and discriminator. They also applied DGAN to the CelebA-HQ dataset to generate high-resolution face images conditioned on hair color (black, brown and blonde), gender (female and male), and degree of smile.
The conditional image generation with DGAN achieved over 89 percent accuracy in conditioning on gender, over 78 percent accuracy in conditioning on hair color, and a low root mean square error (RMSE) of 0.134 for degree of smile.
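For readers unfamiliar with the two metrics, the sketch below shows how conditioning accuracy and RMSE are typically computed, by comparing the requested attribute with the one detected in the generated image. The values are toy data invented for illustration, not results from the paper, and the helper names are hypothetical.

```python
import numpy as np

def conditioning_accuracy(requested, detected):
    """Fraction of samples whose detected label matches the requested label."""
    return float(np.mean(np.asarray(requested) == np.asarray(detected)))

def rmse(requested, detected):
    """Root mean square error between requested and detected continuous values."""
    diff = np.asarray(requested, dtype=float) - np.asarray(detected, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy categorical attribute (illustrative values only).
requested_gender = ["female", "male", "female", "male"]
detected_gender = ["female", "male", "male", "male"]
gender_accuracy = conditioning_accuracy(requested_gender, detected_gender)

# Toy continuous attribute (illustrative values only).
requested_smile = [0.2, 0.8, 0.5]
detected_smile = [0.3, 0.7, 0.5]
smile_rmse = rmse(requested_smile, detected_smile)
```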
The results show that DGAN can generate high-resolution full-body human images and enable conditioning on varied binary, multi-class and continuous-valued attributes. DGAN also allows for a great degree of control over attributes in the generation process, which can both accelerate and improve the image creation experience for graphic and content designers.
The paper Directional GAN: A Novel Conditioning Strategy for Generative Networks is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang