Self-supervised methods in representation learning with residual networks (ResNets) have made strong progress in recent years, but still trail state-of-the-art supervised learning performance on ImageNet classification benchmarks. As the availability of unlabelled data increases and labelled data becomes more expensive and impractical to obtain, machine learning researchers believe it is crucial to advance unsupervised (or self-supervised) model training techniques.
In the paper Pushing the Limits of Self-supervised ResNets: Can We Outperform Supervised Learning Without Labels on ImageNet?, a DeepMind research team proposes ReLICv2, which advances the Representation Learning via Invariant Causal Mechanisms (ReLIC) framework with better strategies for selecting similar and dissimilar points. The team demonstrates for the first time that representations learned without labels can consistently outperform a strong supervised baseline on ImageNet and even achieve results comparable to state-of-the-art self-supervised vision transformers (ViTs).
The team summarizes their main contributions as:
- We briefly review ReLIC and introduce ReLICv2, which incorporates our proposed improvements.
- We provide further insights and analysis into how ReLICv2 learns representations as well as its scaling capabilities and place our contributions in the wider context of recent developments in representation learning.
- We show that ReLICv2 performs comparably to the latest vision transformer architectures and argue that the concepts and results developed in this work could have important implications for wider adoption of self-supervised pre-training in a variety of domains as well as the design of objectives for foundational machine learning systems.
The ReLIC framework (Mitrovic et al., 2021) learns representations based on the principle of invariant prediction to explicitly enforce invariance over the relationship between similar and dissimilar points in a dataset via a term in the loss function. This design ensures that the learned representations can transfer well to downstream tasks.
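The idea can be illustrated with a minimal sketch: alongside a standard contrastive term, ReLIC-style objectives penalize disagreement between the similarity distributions an image induces under two different augmentations. The code below is an illustrative simplification, not the authors' implementation; the function name, the choice of a symmetrized KL penalty, and the convention that the positive point sits at index 0 are assumptions for the example.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax for a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def relic_style_loss(z1, z2, candidates, tau=0.1, alpha=1.0):
    """Illustrative ReLIC-style objective (a sketch, not the paper's code).

    z1, z2     : embeddings of the same image under two augmentations
    candidates : embeddings of comparison points; index 0 is the positive
    tau        : softmax temperature
    alpha      : weight of the invariance penalty
    """
    # Similarity of each augmented view to all candidate points.
    p1 = softmax(candidates @ z1 / tau)
    p2 = softmax(candidates @ z2 / tau)

    # Contrastive term: each view should assign high probability
    # to the positive point (index 0).
    contrastive = -np.log(p1[0]) - np.log(p2[0])

    # Invariance term: the two views' similarity distributions should
    # agree regardless of augmentation (symmetrized KL divergence).
    invariance = np.sum(p1 * np.log(p1 / p2)) + np.sum(p2 * np.log(p2 / p1))

    return contrastive + alpha * invariance
```

When the two views produce identical embeddings, the invariance term vanishes and only the contrastive term remains, which matches the intuition that the penalty fires only when augmentations change the relationship between points.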
The proposed ReLICv2 builds on ReLIC with better strategies for selecting similar and dissimilar points and incorporates these into both the contrastive and invariance objectives. This design improvement enables ReLICv2 to advance state-of-the-art self-supervised learning performance across a wide range of ResNet architectures.
For their empirical study, the team pretrained representations without using labels on the ImageNet ILSVRC-2012 training set and examined ReLICv2 performance on the corresponding validation set. They also evaluated ReLICv2’s robustness and out-of-distribution (OOD) generalization ability.
The results show ReLICv2 is the first self-supervised representation learning method to outperform the supervised ResNet50 baseline on linear ImageNet evaluation across 1x, 2x and 4x variants. ReLICv2 achieved 77.1 percent top-1 classification accuracy on ImageNet with a ResNet50 architecture and 80.6 percent accuracy with larger ResNet models, surpassing state-of-the-art self-supervised approaches by a wide margin. It also exhibited competitive performance in transfer learning, semi-supervised learning, and robustness and out-of-distribution generalization. The researchers further note that although ReLICv2 uses traditional ResNet architectures, it can achieve performance comparable with the latest ViT-based methods.
The team says theirs is the first study to show that representations learned without labels can consistently outperform a strong, supervised baseline on ImageNet, and believes the proposed ReLICv2 can lead to further improvements in representation learning and more powerful foundation models.
The paper Pushing the Limits of Self-supervised ResNets: Can We Outperform Supervised Learning Without Labels on ImageNet? is on arXiv.
Author: Hecate He | Editor: Michael Sarazen