Facebook and Kaggle are facing an online backlash after the apparent winners of the Deepfake Detection Challenge (DFDC) were disqualified. Facebook launched the competition last year to encourage the development of new technologies to detect deepfakes and manipulated media, and more than 2,000 entries were submitted. The data science and machine learning community site Kaggle hosted the DFDC challenge and leaderboard.
Two days ago, Facebook posted the US$1 million challenge’s winning teams as follows:
1st: Selim Seferbekov (with average precision of 65.18 percent)
4th: Eighteen years old
5th: The Medics
Facebook, however, noted in its announcement: “You may notice that the top rankings have changed. Unfortunately, the top two teams in the preliminary standings used external data sources in their winning submissions that were not allowed under the rules of this competition.”
The rules of the competition do allow teams to use data other than the official competition data to develop and test models and submissions. Teams, however, must “(i) ensure the External Data is available to use by all participants of the competition for purposes of the competition at no cost to the other participants and (ii) post such access to the External Data for the participants to the official competition forum prior to the Entry Deadline.”
The previously top-ranked team, All Faces Are Real, used two external sources: the Flickr-Faces-HQ Dataset and a face image dataset the team manually created from YouTube videos released under a CC-BY license, which explicitly allows for commercial use. They did not take kindly to being disqualified: “Facebook felt some of our external data ‘clearly appears to infringe third party rights’ despite being labelled as CC-BY (it’s not clear what data they were referring to specifically).” The team argued that they “did not knowingly seek to undermine any rule,” and asked “why Kaggle never took the opportunity to clarify that external data must additionally follow the more restrictive rules for winning submission documentation.”
All Faces Are Real continued, “In our discussions with Facebook and Kaggle, we were told that despite fulfilling this [the previously mentioned rule regarding External Data] we were contravening the rules on Winning Submission Documentation.”
In response to Facebook’s decision to disqualify them, the team voiced disagreement and disappointment: “Specifically, we were asked to provide additional permissions or licenses from individuals appearing in [our] external dataset. Unfortunately, since the data was from public datasets, we didn’t have specific written permission from each individual appearing in them, nor did we have any way of identifying these individuals.”
The competition asked participants to submit their code, which was then tested against a black box dataset of challenging, unshared real-world examples. Facebook determined the winners by evaluating participant models against this dataset, using the log-loss score on the private test set held outside the Kaggle platform, which “contains videos with a similar format and nature as the Training and Public Validation/Test Sets, but are real, organic videos with and without deepfakes.”
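For readers unfamiliar with the metric, the following is a minimal sketch of how a log-loss score is typically computed for a real-versus-fake classifier. It assumes the standard binary log-loss formula; the function name, the clipping constant and the sample values are illustrative and are not taken from the competition's own scoring code.

```python
import math


def binary_log_loss(labels, probs, eps=1e-15):
    """Mean binary log loss (lower is better).

    labels: 1 for a fake video, 0 for a real one (illustrative convention).
    probs:  predicted probability that each video is fake.
    eps:    clipping bound, assumed here so that log(0) never occurs.
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical predictions for four videos (two fake, two real).
labels = [1, 1, 0, 0]
confident_and_right = binary_log_loss(labels, [0.9, 0.8, 0.1, 0.2])
uninformative = binary_log_loss(labels, [0.5, 0.5, 0.5, 0.5])
confident_and_wrong = binary_log_loss(labels, [0.1, 0.2, 0.9, 0.8])
```

Under this formula a model that always answers 0.5 scores ln 2 (about 0.693), and confident wrong answers are penalized far more heavily than hesitant ones, which is one reason generalizing to unseen videos is hard to do well on this metric.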
Facebook explains that the challenge here is to generalize from known examples to unfamiliar instances. Since the separate black box dataset consists of 10,000 videos not available to entrants, participants “had to design models that could be effective even under unforeseen circumstances.”
Many participants had flagged the ambiguity of the rules regarding the use of external data. “As external data might be very helpful to obtain better scores on public/private leaderboards how are you going to validate solutions’ compliance with the rules?” asked DFDC winner Selim Seferbekov. Another question from Seferbekov regarding the use of YouTube videos remained unanswered for three months.
To many machine learning practitioners, Facebook’s disqualification of the winning submission did not seem fair. “This is truly awful. @Facebook has bullied @Kaggle into ‘semi-disqualifying’ the legitimate winners of the $1M Deep Fakes competition,” NVIDIA Data Scientist Bojan Tunguz tweeted.
Kaggle Grandmaster Gábor Fodor (beluga) commented in the Kaggle discussion thread that “the semi-disqualification is fishy… I am very disappointed about the decision and it undermines trust from competitor point of view.”
The competition rules posted on Kaggle detail the dispute resolution system: “in any such dispute, under no circumstances will any Competition participant be permitted or entitled to obtain awards for, and hereby waives all rights to claim punitive, incidental or consequential damages, or any other damages, including attorneys’ fees, other than the individual participant’s actual out-of-pocket expenses (if any), not to exceed ten dollars ($10 USD), and each individual participant further waives all rights to have damages multiplied or increased.”
The All Faces Are Real team echoed Fodor’s sentiments: “Successful Kaggle competitions rely on a trust between competitors and Kaggle that the rules will be fairly explained and applied, and this trust has been damaged.”
Journalist: Fangyu Cai | Editor: Michael Sarazen