In recent years, state-of-the-art algorithms for contrastive learning (SimCLR) and latent bootstrapping (BYOL) have achieved huge leaps in the learning of visual representations without human supervision.
In the new paper Compressive Visual Representations, a Google Research team proposes that the addition of explicit information compression to these methods can significantly improve performance. The researchers modify SimCLR and BYOL by explicitly adding information compression using the Conditional Entropy Bottleneck (CEB), producing C-SimCLR and C-BYOL variants that demonstrate improved accuracy and robustness in their visual representations.
The team summarises their study’s contributions as:
- Reformulations of SimCLR and BYOL such that they are compatible with information-theoretic compression using the Conditional Entropy Bottleneck.
- An exploration of the relationship between Lipschitz continuity, SimCLR, and CEB compression, as well as a simple, tractable lower bound on the Lipschitz constant. This provides an alternative explanation, in addition to the information-theoretic view, for why CEB compression improves SimCLR model robustness.
- Extensive experiments supporting the hypothesis that adding compression to state-of-the-art self-supervised representation methods like SimCLR and BYOL can significantly improve their performance and robustness to domain shifts across multiple datasets. In particular, linear evaluation accuracies of C-BYOL are even competitive with the supervised baselines considered by SimCLR and BYOL. C-BYOL reaches 76.0% and 78.8% with ResNet-50 and ResNet-50 2x respectively, whereas the corresponding supervised baselines are 76.5% and 77.8% respectively.
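To make the idea of a lower bound on a Lipschitz constant concrete, here is a minimal sketch. It is not the specific bound derived in the paper. It only illustrates the generic fact that, for any two inputs x and y, the ratio ||f(x) - f(y)|| / ||x - y|| can never exceed the Lipschitz constant of f, so the largest ratio observed over a sample is a valid lower bound. The names `lipschitz_lower_bound` and `toy_encoder` are hypothetical and chosen for illustration.

```python
import math
from itertools import combinations


def lipschitz_lower_bound(f, points):
    """Return the largest ratio ||f(x) - f(y)|| / ||x - y|| over all point pairs.

    Every such ratio is at most the true Lipschitz constant of f, so the
    maximum over a finite sample is a lower bound on that constant.
    """
    best = 0.0
    for x, y in combinations(points, 2):
        input_distance = math.dist(x, y)
        if input_distance == 0.0:
            continue  # identical inputs carry no information about the ratio
        best = max(best, math.dist(f(x), f(y)) / input_distance)
    return best


def toy_encoder(x):
    # Hypothetical stand-in for a learned encoder: a fixed linear map
    # that stretches the first coordinate by 3 and shrinks the second by 0.5.
    return (3.0 * x[0], 0.5 * x[1])


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
bound = lipschitz_lower_bound(toy_encoder, points)
print(bound)
```

For this toy linear map the true Lipschitz constant is 3, and the sampled pairs happen to attain it. For a real encoder the sampled value would generally sit below the true constant.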
The team uses CEB to measure and control the amount of compression in their visual representations, enabling them to evaluate whether and to what extent compression improves visual representation quality.
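For readers unfamiliar with CEB, the objective is commonly written in the form below. The symbols are assumptions not used elsewhere in this article: X denotes the input view, Y the view to be predicted, Z the learned representation, I the mutual information, and β ≥ 0 a hyperparameter that sets how strongly compression is enforced.

```latex
\mathrm{CEB} \;\equiv\; \min_{Z} \; \beta \, I(X; Z \mid Y) \; - \; I(Y; Z)
```

The first term penalizes information that Z keeps about X but that is not needed to predict Y, while the second term rewards information that Z shares with Y. Under this reading, a larger β corresponds to stronger compression.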
The compressed SimCLR (C-SimCLR) model switches to the CEB objective so that it learns a compressed representation of one view that preserves only the information relevant to predicting a different view. The compressed BYOL (C-BYOL) model learns an online encoder that takes an augmented view of a given image as input and predicts the target encoder's representation of a different augmented view of the same image.
The team conducted extensive experiments to evaluate their proposed approach. They first tested the representations learned by their models by training a linear classifier on the ImageNet training set. In the experiments, the team's reproduction of the SimCLR baseline (70.7 percent top-1 accuracy) outperformed the result reported in the original paper (69.3 percent), while their BYOL implementation achieved performance (74.2 percent mean top-1 accuracy) comparable to that of the original paper.
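The top-1 accuracy figures quoted here count a prediction as correct only when the classifier's single highest-scoring class matches the true label. A minimal sketch of that metric follows. The function name `top1_accuracy` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def top1_accuracy(logits, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(logits, labels):
        # Index of the largest score in this row (the top-1 prediction).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy classifier scores for four examples over three classes (made-up values).
logits = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [0.4, 0.3, 0.2],
]
labels = [1, 0, 2, 1]
accuracy = top1_accuracy(logits, labels)
print(accuracy)
```

In linear evaluation, the scores would come from a linear classifier trained on top of the frozen learned representations, so the metric reflects the quality of the representations rather than of a fine-tuned network.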
The team also compared C-SimCLR and C-BYOL to other recent self-supervised methods (SimCLR, Barlow Twins, etc.), with C-BYOL achieving the highest accuracy among the methods compared. The team was able to further improve C-BYOL with ResNet-50, reaching 76.0 percent top-1 accuracy.
Furthermore, in a model robustness experiment, SimCLR and BYOL models trained with CEB compression consistently outperformed their uncompressed counterparts across all robustness benchmarks.
Overall, the study shows that the proposed C-SimCLR and C-BYOL yield consistent improvements in both accuracy and robustness to domain shifts, confirming information compression of self-supervised representations as an effective and promising method for improving visual representations.
The paper Compressive Visual Representations is on arXiv.
Author: Hecate He | Editor: Michael Sarazen