EPFL’s Multi-modal Multi-task Masked Autoencoder: A Simple, Flexible and Effective ViT Pretraining Strategy Applicable to Any RGB Dataset

The Swiss Federal Institute of Technology Lausanne (EPFL) presents Multi-modal Multi-task Masked Autoencoders (MultiMAE), a simple and effective pretraining strategy that extends masked autoencoding to multiple modalities and tasks and can be applied to any RGB dataset.

Facebook AI & JHU’s MaskFeat Method Surpasses Kaiming He’s MAE, Sets New SOTA in Video Action Recognition

In the new paper Masked Feature Prediction for Self-Supervised Visual Pre-Training, a Facebook AI Research and Johns Hopkins University team presents Masked Feature Prediction (MaskFeat), a novel approach to the self-supervised pretraining of video models that achieves state-of-the-art (SOTA) results on video benchmarks.