Self-supervised learning (SSL) algorithms for solving visual tasks have been closing the performance gap with traditional supervised learning (SL) approaches in recent years. A question naturally arises: Do the two methods’ learned representations differ, and if so, how?
An Apple research team tackles this question in the new paper Do Self-Supervised and Supervised Methods Learn Similar Visual Representations? The study compares a contrastive SSL algorithm (SimCLR) to SL for simple image data on a common model architecture, shedding light on the similarities and dissimilarities of their respective learned visual representation patterns.
SimCLR is a simple framework for contrastive learning of visual representations that learns its representations by maximizing the agreement between differently augmented views of the same data via a contrastive loss.
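To make the contrastive objective concrete, here is a minimal NumPy sketch of the NT-Xent (normalized temperature-scaled cross-entropy) loss that SimCLR optimizes. The batch layout (two augmented views stacked along the batch axis) and the temperature value are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss sketch: z1, z2 are (n, d) embeddings of two
    augmented views of the same n images; tau is the temperature."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = z @ z.T / tau                                # cosine similarity / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = len(z1)
    # the positive for row i is the other view of the same image
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Minimizing this loss pulls the two views of each image together while pushing all other examples in the batch apart, which is the "maximizing agreement" objective described above.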
To compare neural representation spaces, the team leveraged Centered Kernel Alignment (CKA) as a similarity index to address the neural representations’ distributed nature, potential misalignment, and high dimensionality issues.
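As a rough illustration of how CKA compares two representation spaces, the following is a minimal sketch of linear CKA computed from centered Gram matrices. The paper's exact kernel choice and preprocessing may differ; this is the standard linear variant.

```python
import numpy as np

def center_gram(K):
    # Double-center a Gram matrix: H K H with H = I - (1/n) * ones
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between representations X (n, d1) and Y (n, d2)
    of the same n examples; returns a similarity in [0, 1]."""
    K = center_gram(X @ X.T)
    L = center_gram(Y @ Y.T)
    hsic = (K * L).sum()                               # unnormalized HSIC estimate
    return hsic / (np.linalg.norm(K) * np.linalg.norm(L))
```

Because CKA operates on Gram matrices over examples, it can compare layers of different widths and is invariant to orthogonal transformations and isotropic scaling of the features, which is what makes it suitable for misaligned, high-dimensional representations.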
The team employed a ResNet-50 (R50) backbone for each model in their evaluation experiments. They first used CKA to study the internal representational similarities of network layers on R50s trained via SimCLR; then compared the representational structures induced by SimCLR and SL, plotting the odd and even layer CKA matrices across the learning methods.
The researchers also investigated what happens in the layers of both networks with respect to SimCLR’s augmentation invariance objective. By plotting the CKA value between the representations, they were able to observe the degree of invariance at each layer. They then plotted the CKA similarity of class representations and learned representations across different layers of the SimCLR and SL networks.
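A hypothetical sketch of such a layer-wise invariance measurement might look like the following, where `layer_fns` is an assumed list of per-layer feature extractors (not the authors' actual code) and linear CKA is computed in its equivalent feature-space form.

```python
import numpy as np

def linear_cka(X, Y):
    # Feature-space form of linear CKA on column-centered activations
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    return (np.linalg.norm(Y.T @ X) ** 2
            / (np.linalg.norm(X.T @ X) * np.linalg.norm(Y.T @ Y)))

def layerwise_invariance(layer_fns, view_a, view_b):
    """CKA between the representations of two augmented views at each layer.

    layer_fns: hypothetical list of callables, each mapping a batch of
    inputs to that layer's flattened activations of shape (n, d).
    view_a, view_b: two differently augmented batches of the same images.
    """
    return [linear_cka(f(view_a), f(view_b)) for f in layer_fns]
```

A CKA value near 1 at a layer indicates that the layer's representation is largely invariant to the augmentation, which is how the degree of invariance can be read off per layer.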
The team summarises their main findings as:
- Post-residual representations are similar across methods, whereas residual (block-interior) representations are dissimilar; similar structure is recovered through solving different problems.
- Initial residual layer representations are similar, indicating a shared set of primitives.
- The methods strongly fit their distinct objectives in the final few layers, where SimCLR learns augmentation invariance and SL fits the class structure.
- SL does not implicitly learn augmentation invariance, but augmentation invariance does implicitly fit the class structure and induces linear separability.
- The representational structures rapidly diverge in the final layers, suggesting that SimCLR’s performance stems from class-informative intermediate representations, rather than implicit structural agreement between learned solutions to the SL and SimCLR objectives.
Overall, the study demonstrates CKA’s ability to compare representations across learning methods and reveals that SimCLR’s impressive performance stems largely from the similarity of its class-informative intermediate representations, rather than from any similarity in the final representational structures. The team believes the work highlights the relative importance of learned intermediate representations and can provide useful insights for future research on and approaches to auxiliary task design.
The paper Do Self-Supervised and Supervised Methods Learn Similar Visual Representations? is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.