BERT-Style Pretraining on Convnets? Peking U, ByteDance & Oxford U's Sparse Masked Modeling With Hierarchy Leads the Way
In the new paper Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling, a research team from Peking University, ByteDance, and the University of Oxford presents Sparse Masked Modeling with Hierarchy (SparK), the first BERT-style pretraining approach that can be applied to convolutional networks without any backbone modifications.
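To make the underlying idea concrete, below is a minimal, self-contained PyTorch sketch of BERT-style masked image modeling on a toy convnet: random patches of the input are masked, the network reconstructs the image, and the loss is computed only on the masked pixels. This is not the authors' SparK implementation; the class name MaskedConvMIM, the toy encoder/decoder, and the patch_size and mask_ratio parameters are illustrative assumptions, and SparK's defining ingredients (sparse convolutions over unmasked patches and a hierarchical multi-scale decoder) are deliberately omitted for brevity.

```python
# A minimal, illustrative sketch of BERT-style masked image modeling on a
# convnet -- NOT the authors' SparK implementation. All names here
# (MaskedConvMIM, patch_size, mask_ratio) are hypothetical. SparK additionally
# uses sparse convolutions and a hierarchical decoder, both omitted here.
import torch
import torch.nn as nn

class MaskedConvMIM(nn.Module):
    def __init__(self, patch_size: int = 8, mask_ratio: float = 0.6):
        super().__init__()
        self.patch_size = patch_size
        self.mask_ratio = mask_ratio
        # Toy convolutional encoder standing in for a ResNet-style backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Lightweight decoder that upsamples features back to pixel space.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
        )

    def random_patch_mask(self, x: torch.Tensor) -> torch.Tensor:
        # Build a per-patch binary mask (1 = masked), then expand it to pixels.
        b, _, h, w = x.shape
        ph, pw = h // self.patch_size, w // self.patch_size
        mask = (torch.rand(b, 1, ph, pw, device=x.device) < self.mask_ratio).float()
        mask = mask.repeat_interleave(self.patch_size, dim=2)
        return mask.repeat_interleave(self.patch_size, dim=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = self.random_patch_mask(x)
        # Zero out masked patches. A dense convnet still "sees" the zeros --
        # the distribution-shift problem that SparK's sparse convs avoid.
        recon = self.decoder(self.encoder(x * (1.0 - mask)))
        # Reconstruction loss on the masked pixels only, BERT-style.
        return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)

model = MaskedConvMIM()
loss = model(torch.randn(2, 3, 32, 32))
loss.backward()
print(f"masked-reconstruction loss: {loss.item():.4f}")
```

The key design point mirrored here is that, unlike BERT or ViT-based masked autoencoders, a standard convolution cannot simply drop masked tokens, so naive zero-masking changes the input statistics the convnet was built for; SparK's sparse-convolution treatment of unmasked patches is its answer to exactly that problem.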