
Reconstructing Videos In Just 14 Seconds: Meta AI’s Fairy Accelerates Video Synthesis by 44×

A Meta GenAI research team introduces Fairy, a versatile and efficient video-to-video synthesis framework. Fairy stands out for its ability to generate high-quality videos at remarkable speed, producing 120-frame 512×384 videos in just 14 seconds, surpassing previous works by a factor of at least 44×.

The emergence of generative artificial intelligence has ushered in a new era of creative possibilities, marked by the effortless creation and modification of content. However, the challenges inherent in generative image editing, particularly its high variance, pose difficulties in maintaining temporal coherence during video processing on a frame-by-frame basis using an image model.

In the new paper Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis, the Meta GenAI team details the framework and the design choices behind its speed and quality.

Fairy primarily focuses on instruction-guided video editing. The objective is to edit an input video with N frames into a new video based on a natural language instruction while preserving the original video’s semantic content. Building on an image-based editing model as its baseline, the researchers enhance temporal consistency with a variant of cross-frame attention.

Fairy leverages cross-frame attention to achieve efficient video-to-video synthesis. The team propagates value features from a set of anchor frames to each candidate frame through cross-frame attention. The resulting cross-frame attention map serves as a similarity metric that refines and propagates feature representations across frames, minimizing feature disparity and improving temporal consistency in the synthesized videos.
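The propagation step described above can be sketched in a few lines. The snippet below is a minimal, hedged illustration in NumPy, not the paper's implementation: it assumes the candidate frame supplies queries while the anchor frames supply the concatenated keys and values, so the softmaxed attention map acts as the similarity metric that blends anchor value features into the candidate frame.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(q_frame, k_anchor, v_anchor):
    """Propagate anchor-frame value features to a candidate frame.

    q_frame:  (tokens, d)        queries from the frame being edited
    k_anchor: (anchor_tokens, d) keys concatenated from the anchor frames
    v_anchor: (anchor_tokens, d) values concatenated from the anchor frames

    Hypothetical shapes and names for illustration only.
    """
    d = q_frame.shape[-1]
    # attention map doubles as a frame-to-anchor similarity metric
    attn = softmax(q_frame @ k_anchor.T / np.sqrt(d))
    # anchor value features weighted into the candidate frame
    return attn @ v_anchor
```

Because every candidate frame attends to the same anchor features, frames that depict similar content receive similar propagated features, which is what pulls the per-frame edits toward temporal consistency.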

Fairy’s use of cross-frame attention ensures consistency by sharing global features, overcomes memory challenges associated with extensive frame numbers, enhances processing speed through anchor frame feature caching, and streamlines parallel computation, facilitating rapid generation on multiple GPUs.
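The caching and parallelism argument can also be sketched. The toy code below is an assumption-laden illustration, not Fairy's code: `attend` stands in for the model's attention layers, and frame features are plain arrays. The point it demonstrates is structural: anchor keys/values are computed once and cached, and because each frame then depends only on that cache (never on other candidate frames), the per-frame work is embarrassingly parallel and could be sharded across GPUs.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # stand-in for the model's cross-frame attention layer
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def edit_video(frames, anchor_ids):
    """frames: list of (tokens, d) feature arrays, one per video frame.

    Hypothetical interface for illustration only.
    """
    # 1. Encode anchor-frame keys/values once and cache them.
    kv_cache = np.concatenate([frames[i] for i in anchor_ids], axis=0)
    # 2. Every frame attends only to the cached anchors, so frames are
    #    mutually independent and can be processed in parallel
    #    (e.g. one chunk of frames per GPU).
    return [attend(f, kv_cache, kv_cache) for f in frames]
```

Sharing one global anchor cache is also what keeps memory bounded as the frame count grows: the attention context stays fixed at the anchor set instead of scaling with the full video length.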

To validate Fairy’s effectiveness, the research team conducted a large-scale evaluation involving 1,000 generated videos. The results demonstrate Fairy’s superior quality compared to prior state-of-the-art methods. Furthermore, Fairy achieves a speedup of more than 44× over previous methods when using 8-GPU parallel generation.

In summary, Fairy presents a transformative approach to video editing that capitalizes on the strengths of image-editing diffusion models. By maintaining temporal consistency while producing high-resolution videos at unprecedented speed, Fairy sets a new bar for both quality and efficiency in video synthesis.

The project page is hosted on github.io. The paper Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis is available on arXiv.


Author: Hecate He | Editor: Chain Zhang


