In the new paper Toward a Realistic Model of Speech Processing in the Brain with Self-supervised Learning, researchers show that self-supervised architectures such as Wav2Vec 2.0 can learn brain-like representations from as little as 600 hours of unlabelled speech, and that they also learn sound-generic, speech-specific, and language-specific representations similar to those of the prefrontal and temporal cortices.
In the new paper Masked Autoencoders As Spatiotemporal Learners, a Meta AI research team extends masked autoencoders (MAE) to spatiotemporal representation learning for video. The novel approach introduces negligible inductive biases on space-time while achieving strong empirical results compared to vision transformers (ViTs) and outperforming supervised pretraining by large margins.
A DeepMind research team proposes ReLICv2, which demonstrates for the first time that representations learned without labels can consistently outperform a strong supervised baseline on ImageNet and even achieve results comparable to state-of-the-art self-supervised vision transformers (ViTs).
In the new paper Masked Feature Prediction for Self-Supervised Visual Pre-Training, a Facebook AI Research and Johns Hopkins University team presents a novel Masked Feature Prediction (MaskFeat) approach for the self-supervised pretraining of video models that achieves SOTA results on video benchmarks.
In the new paper Understanding the World Through Action, Sergey Levine, an assistant professor in UC Berkeley's Department of Electrical Engineering and Computer Sciences, argues that a general, principled, and powerful framework for utilizing unlabelled data can be derived from reinforcement learning, enabling machine learning systems that leverage large datasets to understand the real world.
An Apple research team performs a comparative analysis of a contrastive self-supervised learning (SSL) algorithm (SimCLR) and a supervised learning (SL) approach on simple image data in a common architecture, shedding light on the similarities and dissimilarities in their learned visual representation patterns.
A research team from IBM introduces two systems for predicting information type: the TypeSuggest module, an unsupervised system designed to generate types for a set of seed query terms input by the user, and an Answer Type prediction module for predicting the correct answer type for user-provided questions.
Yann LeCun and a team of researchers propose Barlow Twins, a method that learns self-supervised representations through a joint embedding of distorted images, with an objective function that can make the embedding vectors almost identical while reducing redundancy between their components.
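The objective summarized above can be illustrated with a minimal NumPy sketch. It assumes the commonly cited form of the Barlow Twins loss: the cross-correlation matrix between the embeddings of two distorted views is pushed toward the identity, so diagonal entries near 1 make the two embeddings agree and off-diagonal entries near 0 reduce redundancy between components. The function name `barlow_twins_loss`, the weight `lam`, and the stabilizer `eps` are illustrative choices, not the authors' code.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-12):
    """Sketch of a Barlow Twins style loss for two batches of embeddings.

    z_a, z_b: arrays of shape (batch, dim), embeddings of two distorted
    views of the same images. lam (assumed value) weights the
    redundancy-reduction term; eps (assumed) avoids division by zero.
    """
    n = z_a.shape[0]
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries should be close to 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lam * off_diag
```

As a quick check, passing the same random batch for both views should give a loss near zero, while two unrelated random batches should give a much larger value, because their diagonal correlations sit near 0 rather than 1.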
“AI is the best bot to keep people safe on our platforms,” Facebook Director of Artificial Intelligence Manohar Paluri told the F8 audience, adding that an effective way to achieve that goal is enabling Facebook’s AI system to “understand content and work effectively with less labeled training data.”