Researchers from Facebook and the French National Institute for Research in Digital Science and Technology (Inria) have developed a new technique for self-supervised training of convolutional networks used for image classification and other computer vision tasks. The proposed method surpasses supervised techniques on most transfer tasks and outperforms previous self-supervised approaches.
“Our approach allows researchers to train efficient, high-performance image classification models with no annotations or metadata,” the researchers write in a Facebook blog post. “More broadly, we believe that self-supervised learning is key to building more flexible and useful AI.”
Recent improvements in self-supervised training methods have established them as a serious alternative to traditional supervised training. Self-supervised approaches, however, are significantly slower to train than their supervised counterparts. The new method uses contrastive learning more effectively and efficiently, the researchers explain.
Contrastive learning is a powerful method for learning visual features without supervision. Contrastive methods generally train convolutional networks by discriminating between images instead of predicting a label associated with an image.
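To make "discriminating between images" concrete, here is a minimal sketch of one common contrastive objective (an InfoNCE-style loss). It is an illustration only, not the researchers' code: the function names, the toy 2-D feature vectors, and the temperature value are assumptions chosen for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Negative log-probability of picking the positive among all candidates."""
    anchor = normalize(anchor)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    # Cosine similarity between the anchor and every candidate, scaled by temperature.
    logits = [dot(anchor, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

# Toy features (hypothetical): the anchor and positive stand for two views of one image.
anchor = [1.0, 0.0]
good = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

The loss is small when the anchor is closest to its own positive and large otherwise, so no label is needed. Note that every anchor has to be scored against many other images, which is the pairwise cost the next paragraphs are concerned with.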
The researchers’ approach takes advantage of contrastive methods without requiring an explicit comparison between every image pair. Their “SwAV” (Swapping Assignments between Views) algorithm simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations of the same image.
The proposed method first computes features from two cropped views of the same image and assigns each view to a cluster, then trains the network so that the two cluster assignments agree. Using a “swapped” prediction mechanism, which predicts the cluster assignment of one view from the representation of the other, the system eventually discovers that all images of a certain subject represent the same information.
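The swapped prediction idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the released SwAV implementation: the inputs are taken to be similarity scores between each view and a set of cluster prototypes, the soft cluster assignments ("codes") are computed with a few Sinkhorn-style normalization steps so that clusters are used evenly across the batch, and the names `sinkhorn`, `swapped_loss`, `epsilon`, and `temperature` are hypothetical.

```python
import math

def softmax(scores, temperature=0.1):
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sinkhorn(score_rows, epsilon=0.05, iterations=3):
    """Turn a batch of prototype scores into soft codes that use clusters evenly."""
    m = max(max(row) for row in score_rows)
    q = [[math.exp((s - m) / epsilon) for s in row] for row in score_rows]
    n, k = len(q), len(q[0])
    for _ in range(iterations):
        # Column step: every cluster receives the same total mass (1/k).
        for j in range(k):
            col = sum(q[i][j] for i in range(n))
            for i in range(n):
                q[i][j] /= col * k
        # Row step: every sample carries the same total mass (1/n).
        for i in range(n):
            row = sum(q[i])
            q[i] = [x / (row * n) for x in q[i]]
    # Rescale so each sample's code sums to 1.
    return [[x * n for x in row] for row in q]

def cross_entropy(code, probs):
    return -sum(c * math.log(p) for c, p in zip(code, probs))

def swapped_loss(scores_a, scores_b):
    """Predict view B's code from view A's scores, and vice versa."""
    codes_a, codes_b = sinkhorn(scores_a), sinkhorn(scores_b)
    total = 0.0
    for sa, sb, qa, qb in zip(scores_a, scores_b, codes_a, codes_b):
        total += cross_entropy(qb, softmax(sa)) + cross_entropy(qa, softmax(sb))
    return total / (2 * len(scores_a))

# Toy batch (hypothetical): two images, two prototypes, two views per image.
view_a = [[0.9, 0.1], [0.1, 0.9]]
agree = swapped_loss(view_a, [[0.8, 0.2], [0.2, 0.8]])
disagree = swapped_loss(view_a, [[0.2, 0.8], [0.8, 0.2]])
```

When both views of an image land in the same cluster the loss is close to zero, and when they land in different clusters it is large. The comparison is always between a view and a small set of prototypes, never between every pair of images in the batch.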
The researchers also introduce a new data augmentation strategy for self-supervised learning, multi-crop, which lets them greatly increase the number of image comparisons made during training with little impact on memory or compute requirements.
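A rough back-of-the-envelope sketch shows why adding small crops is cheap. The crop counts and resolutions below (two 224-pixel crops plus six 96-pixel crops) are illustrative assumptions, as is the helper name `multicrop_budget`. The sketch also assumes that only the full-resolution crops supply prediction targets, and it uses total pixel count as a crude proxy for compute.

```python
def multicrop_budget(global_crops=2, global_size=224, local_crops=6, local_size=96):
    """Return (number of view-to-view comparisons, total input pixels per image)."""
    views = global_crops + local_crops
    # Each global view supplies a target that every other view tries to predict.
    comparisons = global_crops * (views - 1)
    pixels = global_crops * global_size ** 2 + local_crops * local_size ** 2
    return comparisons, pixels

base_cmp, base_px = multicrop_budget(local_crops=0)
multi_cmp, multi_px = multicrop_budget()

print(f"comparisons: {base_cmp} -> {multi_cmp}")
print(f"pixels per image: {base_px} -> {multi_px} ({multi_px / base_px:.2f}x)")
```

Under these assumptions the number of comparisons per image grows from 2 to 14, while the pixels processed grow by only about half, because each low-resolution crop is far cheaper than a full-resolution one.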
Evaluated on several standard self-supervised benchmarks, the new method reached 75.3 percent top-1 accuracy on ImageNet with ResNet-50 and 78.5 percent with a wider model. The multi-crop strategy also improved the performance of self-supervised methods such as SimCLR and two clustering-based models — DeepCluster and SeLa.
The researchers note that compared with previous self-supervised methods, their method can train models to achieve high performance much more quickly. For instance, it requires only six hours and 15 minutes to achieve 72.1 percent top-1 accuracy with a standard ResNet-50 on ImageNet — outperforming the self-supervised method SimCLR trained for 40 hours.
The paper “Unsupervised Learning of Visual Features by Contrasting Cluster Assignments” is on arXiv. The SwAV code and pretrained models are available on the project GitHub.
Journalist: Yuan Yuan | Editor: Michael Sarazen