Geoffrey Hinton is once again in the AI spotlight, this time with new research that achieves a tremendous performance leap in image recognition using unsupervised learning. The AI pioneer and Turing Award honouree also made a rare appearance on Twitter promoting the research, “Unsupervised learning of representations is beginning to work quite well without requiring reconstruction.”
Hinton’s comment regarding data types and model training echoes his speech at last week’s AAAI 2020 Conference in New York. Introducing his most recent work on Stacked Capsule Auto-encoders, Hinton quipped “I always knew unsupervised learning was the right thing to do.”
Appearing on the same AAAI stage, fellow Turing Award winner Yann LeCun agreed that unsupervised learning may be a game-changer for AI moving forward: “We read a lot about the limitations of deep learning today, but most of those are actually limitations of supervised learning… This is an argument that Geoff [Hinton] has been making for decades. I was skeptical for a long time but changed my mind.” Unsupervised learning, which LeCun prefers to call “self-supervised learning” and which overlaps with the term “semi-supervised learning,” generally refers to model training that does not require manual data labelling.
In the paper A Simple Framework for Contrastive Learning of Visual Representations, a team of Google Brain researchers including Hinton proposes SimCLR, a simple but powerful framework for contrastive learning of visual representations. The team concludes: “A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over the previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100x fewer labels.”
The impressive experimental results have made the paper a hot topic across the machine learning community.
How to learn visual representations effectively without human supervision has been a longstanding problem for AI researchers, with generative and discriminative approaches the two main families of existing methods. Because discriminative approaches based on contrastive learning in the latent space have recently shown promising results, this is where the team focused its efforts.
Contrastive visual representation learning was introduced to learn representations by contrasting positive pairs against negative pairs. While previous research used a memory bank to store the instance class representation vectors, the team developed a simplified contrastive self-supervised learning algorithm that does not require specialized architectures or a memory bank. The SimCLR framework learns representations by maximizing agreement between differently augmented views of the same data example through a contrastive loss in the latent space.
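The idea of maximizing agreement between two views of the same example can be made concrete with a small sketch. The code below is not the team's implementation. It assumes cosine similarity as the agreement measure and a temperature-scaled softmax over the batch, and the function names and the temperature value of 0.5 are illustrative choices not taken from the article.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.5):
    """Average contrastive loss over a batch of paired views.

    views_a[k] and views_b[k] are embeddings of two augmented views of
    example k (a positive pair). Every other embedding in the batch
    acts as a negative. The temperature default is an assumption.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i, anchor in enumerate(embeddings):
        positive = (i + n) % (2 * n)  # index of the anchor's other view
        # Temperature-scaled similarities to every other embedding.
        logits = {
            j: cosine_similarity(anchor, other) / temperature
            for j, other in enumerate(embeddings)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        # Cross-entropy of picking the positive among all candidates.
        total += log_denominator - logits[positive]
    return total / (2 * n)
```

The loss falls when the two views of each example sit close together in the latent space and far from the views of other examples, so no memory bank is needed: the negatives come from the rest of the batch.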
The team observed that data augmentation plays an important part in yielding effective representations, and believes that combining multiple data augmentation operations (random cropping, colour distortion, Gaussian blur, etc.) is crucial in defining the contrastive prediction tasks that yield effective representations. Compared to supervised learning, unsupervised contrastive learning also benefits more from stronger data augmentation.
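A rough sketch of how augmentations produce the two views follows. It is an illustration under simplifying assumptions: the image is a small grayscale grid of values in [0, 1], colour distortion is reduced to a single brightness factor, and Gaussian blur is omitted. The function names, crop size and jitter strength are hypothetical.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of a 2-D image."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def brightness_jitter(image, strength, rng):
    """Scale all pixels by one random factor, clamped to [0, 1]."""
    factor = 1.0 + rng.uniform(-strength, strength)
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def augment(image, rng, crop_size=4, strength=0.4):
    """Apply the crop and then the jitter to get one random view."""
    cropped = random_crop(image, crop_size, crop_size, rng)
    return brightness_jitter(cropped, strength, rng)


def two_views(image, rng):
    """Two independently augmented views of one image: a positive pair."""
    return augment(image, rng), augment(image, rng)


# Usage: an 8x8 gradient image yields two 4x4 views.
rng = random.Random(0)
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
view_a, view_b = two_views(image, rng)
```

Because each call draws its own crop position and brightness factor, the two views generally differ, and the model has to learn what they share in order to match them.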
Currently, supervised learning and unsupervised learning are the two main machine learning methods. Traditional supervised learning, however, requires labelled data for algorithm training, and correctly labelled datasets are not always accessible. Unsupervised learning thus represents something of an ideal solution, as it allows researchers to feed unlabelled data directly to a deep learning model, which then attempts to extract features and patterns and essentially make sense of the data. Semi-supervised learning, meanwhile, uses training datasets comprising both labelled and unlabelled data (usually much more of the latter than the former). This method performs particularly well when labelling the data is prohibitively time-consuming and extracting relevant features from the data is difficult, for example with medical images such as CT scans and MRIs.
The main machine learning methods are examined closely in the new Google Brain study, with researchers proposing that some previous methods for unsupervised or self-supervised learning may have been unnecessarily complicated. The researchers say the strength and performance of their new simple framework suggest that “despite a recent surge in interest, self-supervised learning remains undervalued.”
The paper A Simple Framework for Contrastive Learning of Visual Representations is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen