AI-powered facial image generation and manipulation have flooded the Internet in recent years, a product of the ever-increasing power of generative adversarial networks (GANs). While applications such as face aging, sentiment editing and style transfer can provide harmless fun for users, advanced image generation techniques have also been maliciously employed to create deepfake news reports, face-swap pornography and more. The outputs of today’s state-of-the-art image generation models are so realistic that they can fool most humans, and some now also fool state-of-the-art deepfake detection models, especially when the images are produced via unknown manipulation techniques.
A research team from the University of Tokyo addresses this pressing challenge in their new paper Detecting Deepfakes with Self-Blended Images, proposing self-blended images (SBIs), a novel synthetic training data approach that outperforms state-of-the-art deepfake detection methods on unseen manipulations and scenes.

The researchers’ goal is to detect the statistical inconsistencies that arise between a transferred face and its background in deepfakes. Their SBI approach is based on the premise that more general and hardly recognizable fake samples encourage classifiers to learn more generic and robust representations. SBI blends pseudo source and target images derived from a single image to generate synthetic fake samples that contain common but difficult-to-detect forgery traces. These samples can then be used to train detectors with better robustness and generalization performance.
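Concretely, the blending step follows the mask-based compositing common to blending-based forgery synthesis methods such as Face X-ray; in slightly simplified notation (the symbols below paraphrase the paper rather than quote it), an SBI is

```latex
% Self-blended image: mask-weighted composite of the pseudo source and
% pseudo target, both derived from the same real image x.
\hat{x}_{\text{SBI}} = M \odot \tilde{x}_s + (1 - M) \odot \tilde{x}_t
```

where \tilde{x}_s and \tilde{x}_t are the pseudo source and target obtained by augmenting the same real image, M is the gray-scale blending mask, and \odot denotes element-wise multiplication.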

The SBI pipeline comprises three main steps: 1) a source-target generator first produces the pseudo source and target images that will later be blended; 2) a mask generator then produces a gray-scale deformed mask image; and 3) the source and target images are blended with the mask to obtain an SBI.
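As a rough illustration of these three steps, here is a minimal sketch in Python with OpenCV and NumPy (not the authors’ released code); the color-jitter range, the resize round-trip, and the blurred elliptical mask are illustrative assumptions standing in for the paper’s richer augmentations and landmark-driven mask deformation:

```python
import cv2
import numpy as np

def source_target_generator(img):
    """Step 1: derive pseudo source and target views from a single image."""
    target = img.astype(np.float32)
    # Pseudo source: slight color jitter plus a resize round-trip, standing in
    # for the statistical traces that real face-swap pipelines leave behind.
    source = (target * np.random.uniform(0.9, 1.1, size=(1, 1, 3))).astype(np.float32)
    h, w = img.shape[:2]
    source = cv2.resize(cv2.resize(source, (w // 2, h // 2)), (w, h))
    return np.clip(source, 0, 255), target

def mask_generator(h, w):
    """Step 2: gray-scale deformed mask; a blurred ellipse is used here as a
    placeholder for the paper's landmark-based face mask."""
    mask = np.zeros((h, w), np.float32)
    cv2.ellipse(mask, (w // 2, h // 2), (w // 3, h // 2 - 10), 0, 0, 360, 1.0, -1)
    mask = cv2.GaussianBlur(mask, (31, 31), 15)
    return mask[..., None]  # add a channel axis so the mask broadcasts over RGB

def self_blended_image(img):
    """Step 3: blend the pseudo source into the pseudo target with the mask."""
    source, target = source_target_generator(img)
    mask = mask_generator(*img.shape[:2])
    blended = mask * source + (1.0 - mask) * target
    return blended.astype(np.uint8)

# Usage: any real face crop works; the output is labeled "fake" for training.
# face = cv2.imread("face.png")
# fake = self_blended_image(face)
```

Training then proceeds as usual for a binary classifier, with the generated SBIs labeled as fake and the untouched originals labeled as real.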
In their empirical study, the team compared the SBI approach with state-of-the-art frame-level detection methods DSP-FWA, Face X-ray, Local relation learning (LRL), Fusion + RSA + DCMA + Multi-scale (FRDM), and Pair-wise self-consistency learning (PCL) on the FF++, CDF, DFD, DFDC, DFDCP, and FFIW datasets. They also evaluated their model against video-level baselines such as Discriminative attention models (DAM) and Fully temporal convolution networks (FTCN).

In the experiments, the SBI approach surpassed the baselines by 4.90 percent and 11.78 percent in cross-dataset evaluations on the DFDC and DFDCP datasets, respectively. Overall, the study shows that the SBI synthetic training data scheme outperforms state-of-the-art methods on unseen deepfake manipulations and scenes, and that it generalizes across network architectures and training datasets without significant performance drops.
The code is available on the project’s GitHub. The paper Detecting Deepfakes with Self-Blended Images is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
