
NLPR, SenseTime & NTU Accelerate Automatic Video Portrait Editing

Video portrait editing techniques are already finding applications in TV, video and filmmaking, and are expected to play a key role in evolving telepresence scenarios. State-of-the-art methods can already realistically synthesize video from audio when both come from the same source. Now, researchers from Beijing’s National Laboratory of Pattern Recognition (NLPR), SenseTime Research, and Nanyang Technological University have taken the technology one step further with a new framework that enables arbitrary audio-to-video translation.

In developing the project, researchers faced a number of challenges:

System architecture overview

To increase the realism of their synthesized videos, the researchers combined a number of different models and networks. On the video side, they applied a parametric 3D face model to extract face geometry, pose, and expression parameters from each portrait frame. On the audio side, they used an audio-to-expression translation network to identify specific audio features and map them to facial expressions.

The researchers also designed an audio ID-removing network to reduce identity-specific differences across portraits. The source and target parameters were then combined to reconstruct 3D facial meshes, producing a masked portrait. Lastly, the researchers applied a neural video rendering network to render the final frames with clear and uninterrupted background scenes.
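The parameter recombination step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that the expression parameters come from the audio while the target's geometry and pose are kept, and it leaves out the face model, the networks, and the renderer entirely. The names FaceParams, retarget_frame, and retarget_video are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FaceParams:
    """Per-frame parameters of a parametric 3D face model (hypothetical layout)."""
    geometry: List[float]    # identity/shape coefficients
    pose: List[float]        # head rotation and translation
    expression: List[float]  # expression coefficients


def retarget_frame(target: FaceParams, audio_expression: List[float]) -> FaceParams:
    """Keep the target's geometry and pose, swap in the audio-driven expression."""
    if len(audio_expression) != len(target.expression):
        raise ValueError("expression vectors must have the same length")
    return FaceParams(
        geometry=list(target.geometry),
        pose=list(target.pose),
        expression=list(audio_expression),
    )


def retarget_video(target_frames: List[FaceParams],
                   audio_expressions: List[List[float]]) -> List[FaceParams]:
    """Apply retarget_frame to aligned sequences of frames and expression vectors."""
    if len(target_frames) != len(audio_expressions):
        raise ValueError("need one expression vector per target frame")
    return [retarget_frame(f, e) for f, e in zip(target_frames, audio_expressions)]
```

In a full system, each recombined parameter set would presumably be turned into a 3D mesh and passed to the rendering network. Here the output is only the per-frame parameters, which is enough to show why the target's head pose and face shape survive the edit while the mouth movement follows the new audio.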

Audio-to-expression network architecture

In one-to-many and many-to-one translation tests, the proposed system generalized well, producing significantly more natural appearance and movements than state-of-the-art methods.

Comparison with four major state-of-the-art methods.

The first author of this paper is Linsen Song, a former SenseTime intern and a graduate student supervised by NLPR researcher Ran He. A video demonstration and interpretation of the synthesized results can be viewed on the project page.

The associated paper, “Everybody’s Talkin’: Let Me Talk as You Want,” is on arXiv.


Author: Reina Qi Wan | Editor: Michael Sarazen
