AI Research

NVIDIA & MIT CSAIL Open-Source Video-to-Video Synthesis Method

NVIDIA and the MIT Computer Science & Artificial Intelligence Laboratory (CSAIL) have open-sourced their video-to-video synthesis model. Using a generative adversarial learning framework, the method can generate high-resolution, photorealistic, and temporally coherent results from various input formats, including segmentation masks, sketches, and poses.

Compared with image-to-image translation, video-to-video synthesis has received relatively little research attention. To address the low visual quality and temporal incoherence of videos produced by existing image synthesis approaches, the research group proposes a novel video-to-video synthesis method capable of synthesizing 2K-resolution videos of street scenes up to 30 seconds long.
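At a high level, the approach conditions a GAN generator not only on the current input (e.g., a segmentation map) but also on previously synthesized frames, which is what keeps the output temporally coherent. The sketch below is a deliberately minimal illustration of that autoregressive conditioning in PyTorch; the module names and architecture are assumptions for exposition, not the released vid2vid code.

```python
import torch
import torch.nn as nn

class FrameGenerator(nn.Module):
    """Hypothetical generator: maps a semantic label map plus the previously
    synthesized frame to the next RGB frame. Not the authors' architecture."""
    def __init__(self, label_ch=3, img_ch=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(label_ch + img_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, img_ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, label, prev_frame):
        # Conditioning on prev_frame is what carries appearance across time.
        return self.net(torch.cat([label, prev_frame], dim=1))

def synthesize_video(generator, labels):
    """Autoregressive rollout: each frame is conditioned on the last output,
    so consecutive frames stay temporally coherent."""
    frames, prev = [], torch.zeros_like(labels[0])
    for label in labels:                  # labels: list of (B, 3, H, W) maps
        prev = generator(label, prev)
        frames.append(prev)
    return torch.stack(frames, dim=1)     # (B, T, 3, H, W)
```

For example, `synthesize_video(FrameGenerator(), [torch.randn(1, 3, 64, 64) for _ in range(8)])` rolls out an eight-frame clip from random label maps; in the real system the adversarial discriminators would also score temporal consistency during training.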

The method builds on the team’s 2017 High-Resolution Image Synthesis and Semantic Manipulation research.

[Image: Sketch-to-face video results]

[Image: Pose-to-dance video results]

The authors performed extensive experimental validation on several datasets, and the model outperformed existing approaches both quantitatively and qualitatively. In addition, when the team extended the method to multimodal video synthesis, identical input data produced videos with distinct visual properties while maintaining high resolution and temporal coherence.

[Image: Multimodal video synthesis results (the synthesized videos contain different road surfaces)]
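How can identical inputs yield videos with different visual properties? A common way to get this behavior, assumed here for illustration rather than taken from the authors' exact design, is to additionally condition the generator on a randomly sampled latent "style" code: sampling a new code per video changes appearance, such as the road surface, while the scene layout stays fixed. A minimal sketch:

```python
import torch
import torch.nn as nn

class StyledFrameGenerator(nn.Module):
    """Hypothetical style-conditioned generator; the style-injection scheme
    is a simplification for illustration only."""
    def __init__(self, label_ch=3, img_ch=3, z_dim=8, width=64):
        super().__init__()
        self.z_proj = nn.Linear(z_dim, width)
        self.inp = nn.Conv2d(label_ch + img_ch, width, 3, padding=1)
        self.out = nn.Sequential(
            nn.ReLU(), nn.Conv2d(width, img_ch, 3, padding=1), nn.Tanh()
        )

    def forward(self, label, prev_frame, z):
        h = self.inp(torch.cat([label, prev_frame], dim=1))
        h = h + self.z_proj(z)[:, :, None, None]  # broadcast style over space
        return self.out(h)

def sample_modes(generator, labels, n_modes=3, z_dim=8):
    """Identical labels, different z: the videos share layout but differ in
    appearance (e.g., road surface)."""
    videos = []
    for _ in range(n_modes):
        z = torch.randn(labels[0].size(0), z_dim)  # one style code per clip
        frames, prev = [], torch.zeros_like(labels[0])
        for label in labels:
            prev = generator(label, prev, z)
            frames.append(prev)
        videos.append(torch.stack(frames, dim=1))
    return videos
```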

The researchers suggest the model could be improved in the future by adding 3D cues such as depth maps to better synthesize turning cars; by using object tracking to ensure an object maintains its color and appearance throughout the video; and by training with coarser semantic labels to resolve issues in semantic manipulation.

The Video-to-Video Synthesis paper is on arXiv; the team’s model and data are available here.


Author: Victor Lu | Editor: Michael Sarazen
