A team of researchers from South Korea’s Naver AI Lab says they’ve found a computationally efficient re-labelling strategy that fixes a significant flaw in ImageNet.
Arguably the machine learning community’s most popular image classification benchmark, ImageNet contains more than 14 million labelled images and has helped improve the performance of countless image recognition models. Since Stanford professor and renowned AI researcher Fei-Fei Li introduced the database in 2006, it has also become a touchstone for evaluating computer vision models’ applicability on downstream vision tasks. A recent UC Berkeley and Google paper sums it up: “Methods live or die by their ‘performance’ on this benchmark, measured by how frequently images are assigned the correct label out of 1,000 possible classes.”
ImageNet’s popularity, however, doesn’t mean it’s perfect. The de facto benchmark for image classifiers also contains a significant level of label noise, and although many ImageNet samples contain multiple object classes, often only one of the categories present has been labelled.
The ultimate goal is to equip ImageNet training images with a full set of classes (multi-labels) along with localized labels that indicate where each object is located. Unlike previous approaches that expanded the ImageNet validation set labels into multi-labels, the Naver AI Lab researchers focused on developing a strategy for the ImageNet training labels.
In the paper Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels, the Naver Lab researchers propose “ReLabel,” a novel re-labelling approach designed to transform the single-class labels of 1.28 million training images on ImageNet into multi-class labels assigned to image regions.
To generate those new multi-class ground truth labels without incurring the formidable costs associated with human annotators, the researchers pretrained a machine annotator on “a super-ImageNet scale” (the JFT-300M dataset with 300 million images and 375 million labels or InstagramNet-1B with about 1 billion Instagram images), then fine-tuned it on ImageNet to predict the database’s classes.
The ReLabel strategy has the machine annotator generate location-wise multi-labels, and the novel LabelPooling framework uses these localized multi-labels to train the image classifier: for each training crop, the annotator’s labels covering that region are pooled into a location-specific supervision signal.
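The pooling idea can be illustrated with a minimal sketch. It is not the authors’ implementation: the function names `label_pooling` and `soft_cross_entropy`, the relative crop-box format and the toy label map are all assumptions made for illustration, and a simple rounded crop stands in for whatever region-pooling operator the released code uses.

```python
import math

def label_pooling(label_map, crop_box):
    """Pool a location-wise label map over a crop into one soft multi-label.

    label_map: nested list of shape (H, W, C) holding per-location class scores
               (hypothetical output of a machine annotator).
    crop_box:  (x0, y0, x1, y1) in relative [0, 1] image coordinates.
    """
    h, w, c = len(label_map), len(label_map[0]), len(label_map[0][0])
    x0, y0, x1, y1 = crop_box
    # Map the crop onto the label-map grid, keeping at least one cell.
    col0 = min(int(x0 * w), w - 1)
    col1 = min(max(col0 + 1, math.ceil(x1 * w)), w)
    row0 = min(int(y0 * h), h - 1)
    row1 = min(max(row0 + 1, math.ceil(y1 * h)), h)
    cells = [label_map[r][q] for r in range(row0, row1) for q in range(col0, col1)]
    # Average the scores inside the crop, then apply a softmax.
    pooled = [sum(cell[k] for cell in cells) / len(cells) for k in range(c)]
    peak = max(pooled)
    exps = [math.exp(v - peak) for v in pooled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_cross_entropy(logits, soft_target):
    """Cross-entropy between a classifier's logits and a soft target."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return -sum(t * (v - log_norm) for t, v in zip(soft_target, logits))

# Toy 4x4 label map with 3 classes: class 0 on the left half, class 1 on the right.
toy_map = [[[5.0, 0.0, 0.0] if col < 2 else [0.0, 5.0, 0.0] for col in range(4)]
           for _ in range(4)]
left_target = label_pooling(toy_map, (0.0, 0.0, 0.5, 1.0))  # crop of the left half
full_target = label_pooling(toy_map, (0.0, 0.0, 1.0, 1.0))  # crop of the whole image
```

In this toy setting, a crop of the left half yields a target dominated by class 0, while a crop of the whole image splits the target between classes 0 and 1. That is the kind of crop-dependent supervision a single global label cannot provide.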
The researchers considered several SOTA classifiers as candidate machine annotators: EfficientNet-(B1, B3, B5, B7, B8), EfficientNet-L2 trained with JFT-300M, and ResNeXT-101 32x(32d, 48d) trained with InstagramNet-1B. They chose EfficientNet-L2 as their machine annotator, as it boosted the top-1 classification accuracy of ResNet-50 to 78.9 percent on ImageNet with the transformed and localized multi-labels, a gain of 1.4 percentage points over the baseline model trained with the original ImageNet labels.
More importantly, although the machine annotator was pretrained on a “super-ImageNet scale,” the proposed ReLabel approach combined with the LabelPooling framework drastically decreased ResNet-50 training time from 328 GPU hours to only 10 GPU hours.
The paper Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels is on arXiv. The re-labelled ImageNet training set, pretrained weights and source code are available on the project GitHub.
Reporter: Fangyu Cai | Editor: Michael Sarazen