It has been several years since Google’s AutoDraw and NVIDIA’s GauGAN started the Internet trend of generating images from user sketches. More recently, OpenAI’s DALL-E shifted this process to text prompts and greatly improved the sophistication and quality of generated images. Automatic image synthesis from user scribbles, however, remains a hot topic in the computer vision community, where researchers have made significant progress in unsupervised sketch-based image synthesis. The main challenge now is alignment: achieving controllable image synthesis that accurately expresses users’ intentions.
In the new paper Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing, a research team from Adobe Research and Australian National University presents paint2pix, a novel model that learns to predict users’ intentions and produce photorealistic images from primitive and coarse brushstroke inputs.
The team summarizes their main contributions as follows:
- We introduce a novel task of photorealistic image synthesis from incomplete and primitive human paintings.
- We propose paint2pix, which learns to predict (and adapt) “what a user wants to ultimately draw” from rudimentary brushstroke inputs.
- We finally demonstrate the efficacy of our approach for (a) progressively synthesizing an output image from scratch, and, (b) performing a diverse range of custom edits directly on real image inputs.
The paint2pix model is a two-step decoupled encoder-decoder architecture comprising a canvas encoding stage and an identity embedding stage. The canvas encoding stage predicts users’ intentions by learning a mapping from incomplete user paintings to their complete realistic renderings and supports modifications in the progressive synthesis trajectory based on coarse and rudimentary brushstrokes. The identity embedding stage preserves the underlying identity between consecutive image predictions to ensure semantic consistency over the whole image synthesis process. These two stages are designed to enable the paint2pix model to perform multi-modal synthesis without requiring special architecture for producing multiple output predictions.
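The two-stage flow described above can be illustrated with a toy sketch. This is not the authors’ implementation: the random linear map stands in for a trained canvas encoder, the identity stage is reduced to a simple blend with the previous latent, and every name here (`canvas_encode`, `identity_embed`, `progressive_synthesis`, `LATENT_DIM`, `strength`) is a hypothetical chosen for illustration only.

```python
import random

LATENT_DIM = 8  # hypothetical latent size, chosen only for illustration


def make_linear(in_dim, out_dim, seed):
    """Return a fixed random linear map standing in for a trained encoder."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def apply(vec):
        return [sum(w * x for w, x in zip(row, vec)) for row in weights]

    return apply


def canvas_encode(encoder, prev_latent, canvas):
    """Stage 1 (toy): predict the latent of the finished image from the
    previous latent and the current, incomplete canvas."""
    # List concatenation: the encoder sees both the state so far and the canvas.
    update = encoder(prev_latent + canvas)
    return [p + u for p, u in zip(prev_latent, update)]


def identity_embed(latent, prev_latent, strength):
    """Stage 2 (toy): pull the new prediction toward the previous one so
    consecutive outputs keep a consistent identity."""
    return [(1 - strength) * z + strength * p
            for z, p in zip(latent, prev_latent)]


def progressive_synthesis(canvases, canvas_dim, strength=0.5):
    """Run both stages over a sequence of flattened canvas states."""
    encoder = make_linear(LATENT_DIM + canvas_dim, LATENT_DIM, seed=0)
    latent = [0.0] * LATENT_DIM
    trajectory = []
    for step, canvas in enumerate(canvases):
        predicted = canvas_encode(encoder, latent, canvas)
        if step > 0:  # no earlier prediction to stay consistent with at step 0
            predicted = identity_embed(predicted, latent, strength)
        latent = predicted
        trajectory.append(latent)
    return trajectory


# Three successive canvas states, each flattened to four values.
canvases = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
trajectory = progressive_synthesis(canvases, canvas_dim=4)
```

In this toy version, `strength` controls how strongly each prediction is tied to the previous one; a real system would pass each latent to a generator to render an image, which is omitted here.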
In their empirical study, the team compared paint2pix with existing GAN-inversion methods on image manipulation tasks driven by user scribbles. In the evaluations, paint2pix achieved strong performance on both image synthesis from scratch and real image editing, which the team takes as evidence that it better captures user intent and produces photorealistic images that more closely align with users’ desired results. The team says paint2pix can be used both to synthesize desired image outputs from scratch using only rudimentary brushstroke inputs and to edit real images, where even users without any artistic expertise can successfully perform various custom edits.
Author: Hecate He | Editor: Michael Sarazen