TikTokers and VTubers are sure to be delighted by a new AI-powered image synthesis framework that makes “learning” to moonwalk or drop Blackpink dance moves a snap.
“Impersonator++” basically copies human movements from reference videos and pastes them onto source images. Proposed by researchers from ShanghaiTech University, Chinese Academy of Sciences and University of Chinese Academy of Sciences, it tackles human motion imitation, appearance transfer and novel view synthesis within a unified framework.
Motion imitation, appearance transfer and novel view synthesis all fall under the umbrella of human image synthesis, the generation of believable and photorealistic images of humans. The field has applications in areas such as character animation, reenactment, virtual clothes try-on, movie and game making, and so on.
Existing task-specific methods use 2D keypoints to estimate human body structure and express position information, but cannot effectively characterize different subjects’ body shapes or limb rotations. The researchers propose using a 3D body mesh recovery module that can disentangle pose and shape and model joint location and rotation while also better characterizing “personalized body shape.”
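The key idea is that a parametric 3D body model represents shape and pose as separate coefficient vectors, so one subject's body shape can be combined with another's joint rotations. Below is a toy, hypothetical sketch of that split (the helper name `smpl_like_mesh` and the linear blendshape model are illustrative assumptions, not the paper's actual mesh recovery module, which regresses parameters of a full SMPL-style model):

```python
import numpy as np

def smpl_like_mesh(shape_betas, pose_thetas, template, shape_basis, n_joints=24):
    """Toy linear body model: vertices = template + shape blendshapes.

    shape_betas: (10,) per-subject shape coefficients
    pose_thetas: (n_joints*3,) per-frame joint rotations (axis-angle)
    template:    (V, 3) mean body mesh
    shape_basis: (10, V, 3) shape blendshape directions
    """
    # Shape only deforms the rest mesh; pose is kept as a separate vector.
    verts = template + np.tensordot(shape_betas, shape_basis, axes=1)
    # A real SMPL-style model would next apply pose-driven linear blend
    # skinning to `verts`; returning both here just shows the disentanglement.
    return verts, pose_thetas.reshape(n_joints, 3)

# Motion imitation = keep the source subject's shape_betas fixed and swap in
# the reference video's pose_thetas frame by frame.
```

Because shape and pose never share parameters, swapping the pose vector cannot distort the source subject's body proportions, which is exactly what 2D keypoints fail to guarantee.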
To more accurately preserve source information such as texture, style, colour and face identity, the researchers designed an Attentional Liquid Warping GAN with Attentional Liquid Warping Block (AttLWB) that propagates the source information in both image and feature spaces to a synthesized reference. AttLWB uses a denoising convolutional auto-encoder to help extract useful features and better characterize the source identity.
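A minimal numpy sketch of the underlying idea (this is an illustrative simplification, not the authors' implementation: the function names, nearest-neighbour sampling, and explicit attention logits are all assumptions): warp source features into the target frame with a flow field, then fuse them with the generator's own features through a per-pixel softmax attention map, which also naturally extends to multiple sources.

```python
import numpy as np

def warp_features(src_feat, flow):
    """Nearest-neighbour warp. src_feat: (H, W, C); flow: (H, W, 2) offsets."""
    H, W, _ = src_feat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip((ys + flow[..., 0]).round().astype(int), 0, H - 1)
    src_x = np.clip((xs + flow[..., 1]).round().astype(int), 0, W - 1)
    return src_feat[src_y, src_x]

def attentional_fuse(warped_feats, gen_feat, attn_logits):
    """Softmax-normalized per-pixel blend of N warped sources + generated features.

    warped_feats: (N, H, W, C); gen_feat: (H, W, C); attn_logits: (N+1, H, W)
    """
    stack = np.concatenate([warped_feats, gen_feat[None]], axis=0)
    a = np.exp(attn_logits - attn_logits.max(axis=0, keepdims=True))
    a = a / a.sum(axis=0, keepdims=True)       # softmax over the N+1 sources
    return (a[..., None] * stack).sum(axis=0)  # (H, W, C)
```

The attention map lets the network decide, pixel by pixel, whether to trust warped source texture (preserving identity, colour and clothing detail) or the generator's own synthesis (filling regions the sources never show).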
The researchers say their method can also support more flexible warping from multiple sources. It first trains a model on an extensive training set, then fine-tunes it on one or a few unseen images in a self-supervised way to generate high-resolution results. The team also applied one/few-shot adversarial learning to further improve generalization to unseen source images.
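The personalization stage can be caricatured as a handful of self-supervised reconstruction steps on the new subject's images, starting from pretrained weights. The toy linear "generator" and plain MSE loss below are assumptions for illustration only; the paper additionally uses adversarial and perceptual objectives:

```python
import numpy as np

def personalize(W, images, steps=100, lr=0.2):
    """Fine-tune weights W so that W @ x reconstructs each available image x."""
    for _ in range(steps):
        for x in images:                   # the one- or few-shot set
            recon = W @ x
            grad = np.outer(recon - x, x)  # gradient of 0.5 * ||W x - x||^2
            W = W - lr * grad
    return W
```

After a few such steps, the model reproduces the unseen subject's appearance far more faithfully than the generic pretrained weights alone, which is the point of the few-shot fine-tuning.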
To evaluate Impersonator++ performance, the researchers built an Impersonator (iPER) video dataset featuring diverse styles of clothing. The dataset contains 206 video sequences with 241,564 frames, and covers 30 human subjects with different shape, height and gender conditions wearing over 100 clothing items and performing random actions.
The researchers used the iPER, MotionSynthetic, FashionVideo, and YouTube-Dancer-18 datasets to evaluate personalization, loss functions, input concatenation, and texture and feature warping under one-shot and few-shot settings, and to perform qualitative comparisons.
The results show that the proposed method produces high-fidelity images that preserve the face identity, shape consistency and clothing details of the source, while 2D pose-guided methods such as PG2, DSC, SHUP and DIAF struggle to do so. The method also achieves decent results in cross imitation, even with reference images outside the domain of its training dataset.
The paper Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis is on arXiv.
Reporter: Yuan Yuan | Editor: Michael Sarazen