A research team from the University of California San Diego and Microsoft proposes Micro-Factorized Convolution (MF-Conv), a novel approach designed for extremely low computational budgets (4M–21M FLOPs) that achieves significant performance gains over state-of-the-art models in the low-FLOP regime.
A research team from Taichi Graphics, MIT CSAIL, Zhejiang University, Tsinghua University and Kuaishou Technology introduces a programming language and compiler for quantized simulation that achieves both high performance and significantly reduced memory costs by enabling flexible and aggressive quantization.
A research team from Facebook AI and UC Berkeley addresses the optimization instability of vision transformers (ViT) by using a standard, lightweight convolutional stem in ViT models. The approach dramatically increases optimizer stability and improves peak performance without sacrificing computational efficiency.
A research team from Microsoft Research Asia, University of Science and Technology of China, Huazhong University of Science and Technology, and Tsinghua University takes advantage of the inherent spatiotemporal locality of videos to present a pure-transformer backbone architecture for video recognition that leads to a better speed-accuracy trade-off.
A research team from Google Cloud AI, Google Research and Rutgers University simplifies vision transformers’ complex design, proposing nested transformers (NesT) that stack basic transformer layers to process non-overlapping image blocks individually. The approach achieves superior ImageNet classification accuracy and improves model training efficiency.
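To make the idea of "non-overlapping image blocks" concrete, here is a minimal sketch of the partitioning step only, in plain Python. The function name `blockify`, the list-of-lists image representation, and the block size are illustrative assumptions and are not taken from the NesT paper or its code. The transformer layers that would process each block, and the step that merges blocks afterwards, are not shown.

```python
def blockify(image, block_size):
    """Split a 2D grid into non-overlapping square blocks, in row-major order.

    `image` is a list of equal-length rows (hypothetical representation);
    `block_size` must divide both the height and the width.
    """
    height, width = len(image), len(image[0])
    if height % block_size or width % block_size:
        raise ValueError("block_size must divide both image dimensions")

    blocks = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            # Each block is an independent block_size x block_size sub-grid.
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            blocks.append(block)
    return blocks


# A 4x4 toy "image" whose entries are 0..15 in row-major order.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_blocks = blockify(toy_image, block_size=2)
```

In this toy case the 4x4 grid yields four 2x2 blocks. In a model following the description above, each block would be passed through the same basic transformer layers independently of the others.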
Yann LeCun and a team of researchers propose Barlow Twins, a method that learns self-supervised representations through a joint embedding of distorted images, using an objective function that makes the embedding vectors of the distorted versions nearly identical while reducing redundancy between their components.
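The objective described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation. It assumes the commonly cited form of the loss: the cross-correlation matrix between the two batches of embeddings is pushed toward the identity, so diagonal entries near 1 make the two views agree and off-diagonal entries near 0 reduce redundancy. The function names and the weight `lambd=5e-3` are assumptions chosen for illustration.

```python
from math import sqrt


def standardize(z):
    """Give each embedding dimension zero mean and unit variance over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in z]
        mean = sum(column) / n
        std = sqrt(sum((v - mean) ** 2 for v in column) / n)
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / (std + 1e-12)  # epsilon avoids 0/0
    return out


def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Loss on two batches of embeddings (lists of rows) from two distorted views.

    `lambd` weights the redundancy-reduction term; its value here is an
    assumption for illustration.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)

    # Cross-correlation matrix: c[i][j] correlates dimension i of view A
    # with dimension j of view B across the batch.
    c = [[sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
         for i in range(d)]

    # Invariance term: diagonal entries should be close to 1.
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    # Redundancy-reduction term: off-diagonal entries should be close to 0.
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambd * off_diag


# Identical views with uncorrelated components: loss is (numerically) zero.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
loss_ideal = barlow_twins_loss(decorrelated, decorrelated)

# Identical views whose two components carry the same signal: the
# redundancy term is penalized even though the views agree perfectly.
redundant = [[1.0, 1.0], [-1.0, -1.0]]
loss_redundant = barlow_twins_loss(redundant, redundant)
```

The two toy batches show the two halves of the objective separately: the first satisfies both terms, while the second satisfies only the invariance term and is penalized for its duplicated component.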