AI Machine Learning & Data Science Research

Alibaba’s USI: A Unified Scheme for Training Any Backbone on ImageNet That Delivers Top Results Without Hyperparameter Tuning


Ten years ago, a team led by Geoffrey Hinton jump-started modern AI research with their AlexNet CNN's outstanding performance on the ImageNet Large Scale Visual Recognition Challenge. While the ImageNet dataset remains the primary benchmark for milestones in model architectures at the intersection of computer vision and deep learning, training on ImageNet can be challenging and time-consuming, as expert knowledge is typically required to design and fine-tune a dedicated training scheme for each newly proposed architecture.

In the new paper Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results, a research team from Alibaba Group’s DAMO Academy introduces USI (Unified Scheme for ImageNet), a unified scheme that “transforms training on ImageNet from an expert-oriented task to an automatic procedure.” The proposed USI can be used for training any backbone on ImageNet, does not require adjustments or hyperparameter tuning between different models, and consistently yields top model results in terms of accuracy and efficiency.

The team summarizes their main contributions as:

  1. We introduce a unified, efficient training scheme for the ImageNet dataset, USI, that does not require hyperparameter tuning. Exactly the same recipe is applied to any backbone. Hence, ImageNet training is transformed from an expert-oriented task into an automatic seamless procedure.
  2. We test USI on various deep learning models, including ResNet-like, Mobile-oriented, Transformer-based and MLP-only models. We show it consistently and reliably achieves state-of-the-art results compared to tailor-made schemes per model.
  3. We use USI to perform a methodological speed-accuracy comparison of modern deep learning models, and identify efficient backbones along the Pareto curve.

The team shows that their proposed USI scheme's ability to train any backbone to state-of-the-art results without hyperparameter tuning or model-tailored methods rests on knowledge distillation combined with some "modern tricks."

Knowledge distillation (KD) is a process that uses a well-performing large-scale teacher model to train a smaller target student model, effectively transferring knowledge from the large complex model to a simple and smaller model that can then be deployed on less powerful hardware such as mobile devices with only minimal performance reductions.

The proposed USI leverages KD for classification, which produces a number of positive impacts when training deep neural networks on ImageNet: 1) The teacher model’s predictions contain richer useful information than the plain (single-label) ground truth; 2) The method can better handle pictures containing multiple objects; 3) KD predictions better handle strong augmentations; and 4) There is no longer a need for label smoothing. As such, applying KD to ImageNet leads to a more robust and effective optimization process.
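The core of classification-style KD is matching the student's predicted distribution to the teacher's soft predictions rather than to one-hot labels. The following is a minimal sketch of such a distillation loss in PyTorch; the function name and the temperature parameter are our own illustrative choices, not details taken from the USI paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft-label distillation loss: KL divergence between the teacher's
    softened predictions and the student's, scaled by T^2 as is standard
    so gradients keep a comparable magnitude across temperatures."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)
```

Because the teacher's soft distribution carries probability mass for every class, the student also receives a learning signal for secondary objects in an image, which is why no explicit label smoothing is needed on top.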

The team employs a number of modern tricks to enable USI to train any backbone to top results. For example, as the maximal batch size of different backbones varies, they recommend using a range (0.8 to 0.9) of a model’s maximal possible batch size for optimizing training speeds. Also, as their approach is robust to different teacher and student types, they suggest choosing a teacher model with a favourable speed-accuracy trade-off.

In their evaluations, the team applied USI to various deep learning architectures and compared its ImageNet top-1 accuracy with previous state-of-the-art schemes.

The experimental results show that:

  1. “For CNN architectures, USI significantly outperforms previous results.
  2. For transformer architectures, on two prominent models, ViT-S and LeViT-384, USI reaches better results compared to the DeiT scheme.
  3. For mobile-oriented and MLP-based architectures, USI shows significant improvements compared to previously reported results.”

The team also shows that the proposed USI scheme achieves better speed-accuracy trade-offs, is more robust, and enables methodical speed-accuracy comparisons that reliably identify efficient computer vision backbones.

The implementation is available on the project’s GitHub. The paper Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
