The internet loves those little looping action images we call GIFs. They can tell a short visual story in a small, highly portable file. The visual quality of GIFs, however, is usually low compared to the videos they were sourced from. If you are sick of fuzzy, low-resolution GIFs, then researchers from Stony Brook University, UCLA, and Megvii Research have just the thing for you: “the first learning-based method for enhancing the visual quality of GIFs in the wild.”
Making a GIF from a video usually involves three steps (Figure 1):
- Frame Sampling
- Color Quantization
- Color Dithering
Typically the GIF frame sampling step will degrade video smoothness, resulting in jerky movements. Other problems such as flat color regions, false contours, color shifts and dotted patterns are introduced in the color quantization and color dithering steps.
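To make the three steps concrete, here is a minimal sketch in plain Python on grayscale frames stored as nested lists. It is an illustration under assumptions, not the pipeline used in the paper: the helper names are hypothetical, the palette is a set of evenly spaced gray levels (real GIF encoders choose a color palette of up to 256 entries per frame), and Floyd-Steinberg error diffusion stands in for whatever dithering a given encoder applies.

```python
def sample_frames(frames, step):
    """Frame sampling: keep every `step`-th frame, dropping the rest."""
    return frames[::step]


def quantize_level(value, levels):
    """Snap a 0..255 intensity to the nearest of `levels` evenly spaced values."""
    scale = 255 / (levels - 1)
    return int(round(round(value / scale) * scale))


def quantize(image, levels):
    """Color quantization without dithering (assumed uniform palette)."""
    return [[quantize_level(v, levels) for v in row] for row in image]


def dither_floyd_steinberg(image, levels):
    """Quantize while diffusing each pixel's rounding error to its neighbors."""
    h, w = len(image), len(image[0])
    work = [[float(v) for v in row] for row in image]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = min(255.0, max(0.0, work[y][x]))
            new = quantize_level(old, levels)
            out[y][x] = new
            err = old - new
            # Standard Floyd-Steinberg weights: 7/16, 3/16, 5/16, 1/16.
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


# A flat mid-gray frame: plain quantization yields one flat color region,
# while dithering yields a dotted mix of the two palette entries.
gray = [[128] * 4 for _ in range(4)]
flat = quantize(gray, levels=2)
dotted = dither_floyd_steinberg(gray, levels=2)
```

On the flat gray frame, `flat` collapses to a single value, which is the flat-region artifact, while `dotted` alternates between black and white pixels, which is the dotted-pattern artifact.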
The researchers propose a new learning-based method to minimize these problems and enhance the visual quality of GIFs, in order to “convert a sequence of GIF frames into a video that has a substantially higher visual quality than itself.” The approach works in two ways (Figure 4):
- Color Dequantization: To remove artifacts caused by color quantization and color dithering, the team developed a novel network – CCDNet (Compositional Color Dequantization Network) which is trained by combining reconstruction loss and generative adversarial loss on both color values and image gradient vectors.
- Frame Interpolation: The team used a modified Super SloMo network for temporal interpolation to increase the temporal resolution of the image sequences.
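The idea of penalizing errors on both color values and image gradients, as described for the color dequantization network, can be sketched for the reconstruction term alone. This is a hedged illustration, not the paper's formulation: the L1 norm, the forward-difference gradient, the `grad_weight` parameter, and the function names are all assumptions, and the adversarial term is omitted.

```python
def gradients(image):
    """Forward-difference gradient (dx, dy) per pixel; zero at the far borders."""
    h, w = len(image), len(image[0])
    grads = []
    for y in range(h):
        row = []
        for x in range(w):
            dx = image[y][x + 1] - image[y][x] if x + 1 < w else 0.0
            dy = image[y + 1][x] - image[y][x] if y + 1 < h else 0.0
            row.append((dx, dy))
        grads.append(row)
    return grads


def reconstruction_loss(predicted, target, grad_weight=1.0):
    """Mean L1 error on pixel values plus weighted mean L1 error on gradients.

    The norm and the weighting are assumptions made for illustration.
    """
    h, w = len(target), len(target[0])
    n = h * w
    color_term = sum(
        abs(predicted[y][x] - target[y][x]) for y in range(h) for x in range(w)
    ) / n
    gp, gt = gradients(predicted), gradients(target)
    grad_term = sum(
        abs(gp[y][x][0] - gt[y][x][0]) + abs(gp[y][x][1] - gt[y][x][1])
        for y in range(h)
        for x in range(w)
    ) / n
    return color_term + grad_weight * grad_term
```

A uniform brightness offset is penalized only by the color term, while a spurious edge, such as a false contour, is penalized by both terms. That is one intuition for why a gradient term can help against contouring artifacts.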
GIF2Video demo (clockwise from top left): input GIF, ground truth GIF, output GIF, and the difference between the ground truth and output GIFs, where darker sections indicate smaller differences.
Using the normal video-to-GIF conversion method, the researchers converted a huge number of video frames into GIF images for training, producing two GIF-Video datasets as a byproduct: a face-centric GIF-Faces dataset, and a more generic GIF-Moments dataset built on real GIFs shared by Internet users (Figure 5). Details of the two datasets can be found in section 5 of the paper.
PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index) were the main metrics used to evaluate the method. PSNR is based on the RMSE (Root Mean Squared Error) between the estimated images and the original images, and SSIM is a perceptual metric that quantifies image quality. Some of the experiment results are shown in Table 1 and Table 2, where we can observe a significant enhancement of the input GIFs’ visual quality and a reduction of quantization artifacts.
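For readers unfamiliar with PSNR, the standard definition is 20 · log10(MAX / RMSE), where MAX is the largest possible pixel value. The sketch below computes it for grayscale images stored as nested lists. The function name and the 8-bit default for `max_value` are assumptions for illustration, and this is not the paper's evaluation code.

```python
import math


def psnr(estimate, reference, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (e - r) ** 2
        for est_row, ref_row in zip(estimate, reference)
        for e, r in zip(est_row, ref_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 20 * math.log10(max_value / math.sqrt(mse))


reference = [[10, 20], [30, 40]]
off_by_one = [[11, 21], [31, 41]]  # every pixel differs by 1, so RMSE is 1
score = psnr(off_by_one, reference)
```

Higher values mean the estimate is closer to the reference. Here `score` is about 48.13 dB, since the RMSE is 1 and 20 · log10(255) ≈ 48.13.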
The paper GIF2Video: Color Dequantization and Temporal Interpolation of GIF images is on arXiv.
Author: Mos Zhang | Editor: Michael Sarazen