Tag: vision transformer

AI Machine Learning & Data Science Research

Kaiming He’s Meta AI Team Proposes ViTDet: A Plain Vision Transformer Backbone Competitive With Hierarchical Backbones on Object Detection

A Meta AI research team explores the plain, non-hierarchical vision transformer (ViT) as a backbone network for object detection, proposing ViTDet, a plain-backbone detector that achieves performance competitive with detectors built on traditional hierarchical backbones.
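The core idea is that a plain ViT’s single-scale, stride-16 feature map is enough for detection: a feature pyramid can be built from that last map alone, without hierarchical backbone stages. Below is a minimal PyTorch sketch of such a “simple feature pyramid”; the module name, channel widths and the omitted 1x1/3x3 output projections are our own illustrative assumptions, not the authors’ code.

```python
import torch
import torch.nn as nn

class SimpleFeaturePyramid(nn.Module):
    """Builds multi-scale detection features from a plain ViT's single
    stride-16 feature map, in the spirit of ViTDet's simple feature
    pyramid. No intermediate ViT blocks are tapped."""
    def __init__(self, dim=768):
        super().__init__()
        # stride 4: two 2x deconvolutions; stride 8: one deconvolution
        self.up4 = nn.Sequential(
            nn.ConvTranspose2d(dim, dim // 2, 2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(dim // 2, dim // 4, 2, stride=2),
        )
        self.up8 = nn.ConvTranspose2d(dim, dim // 2, 2, stride=2)
        # stride 32: simple 2x downsampling of the same map
        self.down32 = nn.MaxPool2d(2)

    def forward(self, x):  # x: (B, dim, H/16, W/16), the last ViT block's output
        return [self.up4(x), self.up8(x), x, self.down32(x)]

feats = SimpleFeaturePyramid()(torch.randn(1, 768, 32, 32))  # e.g. a 512x512 image
print([tuple(f.shape) for f in feats])  # strides 4, 8, 16, 32
```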

AI Machine Learning & Data Science Research

Microsoft’s FocalNets Replace ViTs’ Self-Attention With Focal Modulation to Improve Visual Modelling

A Microsoft Research team proposes FocalNet (Focal Modulation Network), a simple, attention-free architecture designed to replace transformers’ self-attention module. FocalNets significantly outperform their self-attention counterparts for effective and efficient visual modelling in real-world applications.
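Focal modulation inverts attention’s order of operations: context is first aggregated around each position with stacked depthwise convolutions plus a gated global level, and the aggregate then modulates the query token. A rough PyTorch sketch of that idea follows, with layer names and hyperparameters chosen for illustration rather than taken from the released FocalNet code.

```python
import torch
import torch.nn as nn

class FocalModulation(nn.Module):
    """Sketch of attention-free focal modulation: gather context with
    progressively larger depthwise convolutions, gate and sum the
    levels, then multiply the result into each query token."""
    def __init__(self, dim, focal_levels=3, kernel=3):
        super().__init__()
        self.levels = focal_levels
        self.f = nn.Linear(dim, 2 * dim + focal_levels + 1)  # query, context, gates
        self.h = nn.Conv2d(dim, dim, 1)                      # modulator projection
        self.proj = nn.Linear(dim, dim)
        self.ctx_layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size=kernel + 2 * l,
                          padding=(kernel + 2 * l) // 2, groups=dim),
                nn.GELU())
            for l in range(focal_levels)
        ])

    def forward(self, x):                       # x: (B, H, W, C)
        c = x.shape[-1]
        q, ctx, gates = torch.split(self.f(x), [c, c, self.levels + 1], dim=-1)
        ctx = ctx.permute(0, 3, 1, 2)           # to (B, C, H, W)
        gates = gates.permute(0, 3, 1, 2)
        agg = 0
        for l, layer in enumerate(self.ctx_layers):   # hierarchical contextualization
            ctx = layer(ctx)
            agg = agg + ctx * gates[:, l:l + 1]
        # gated global (average-pooled) context level
        agg = agg + ctx.mean((2, 3), keepdim=True) * gates[:, self.levels:]
        mod = self.h(agg).permute(0, 2, 3, 1)   # back to (B, H, W, C)
        return self.proj(q * mod)               # query modulated by its context

out = FocalModulation(64)(torch.randn(2, 14, 14, 64))
print(tuple(out.shape))  # (2, 14, 14, 64)
```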

AI Machine Learning & Data Science Research

Is BERT the Future of Image Pretraining? ByteDance Team’s BERT-like Pretrained Vision Transformer iBOT Achieves New SOTAs

A research team from ByteDance, Johns Hopkins University, Shanghai Jiao Tong University and UC Santa Cruz applies the proven masked language modelling technique to the training of better vision transformers, presenting iBOT (image BERT pretraining with Online Tokenizer), a self-supervised framework that performs masked prediction with an online tokenizer.
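The “online” part is that the tokenizer is not a fixed, separately pretrained module but an EMA copy of the student that is distilled jointly: it tokenizes the full image into soft patch-level targets, while the student predicts those targets from a masked view. Here is a toy, self-contained sketch of that masked-prediction loss; every module name and shape is an illustrative stand-in, not iBOT’s actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in patch encoder: maps (B, N, D) tokens to (B, N, D) and,
    when given a boolean mask, replaces masked patches with a learned
    mask token before encoding."""
    def __init__(self, dim=64):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.block = nn.Linear(dim, dim)

    def forward(self, tokens, mask=None):        # mask: (B, N) bool
        if mask is not None:
            tokens = torch.where(mask[..., None], self.mask_token, tokens)
        return self.block(tokens)

def mim_loss(student, teacher, head_s, head_t, tokens, mask,
             tau_s=0.1, tau_t=0.04):
    # The teacher (an EMA copy of the student) acts as the ONLINE
    # tokenizer: it sees the full image and emits soft patch targets.
    with torch.no_grad():
        targets = F.softmax(head_t(teacher(tokens)) / tau_t, dim=-1)
    # The student sees the masked view and predicts the teacher's tokens.
    preds = F.log_softmax(head_s(student(tokens, mask=mask)) / tau_s, dim=-1)
    loss = -(targets * preds).sum(-1)            # per-patch cross-entropy
    return (loss * mask).sum() / mask.sum()      # score masked patches only

dim, vocab = 64, 128
student, teacher = ToyEncoder(dim), ToyEncoder(dim)
teacher.load_state_dict(student.state_dict())    # EMA-updated during training
head_s, head_t = nn.Linear(dim, vocab), nn.Linear(dim, vocab)
tokens = torch.randn(2, 49, dim)                 # B=2 images, N=49 patches
mask = torch.rand(2, 49) < 0.4                   # ~40% of patches masked
print(mim_loss(student, teacher, head_s, head_t, tokens, mask))
```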

AI Machine Learning & Data Science Research

Can ViT Layers Express Convolutions? Peking U, UCLA & Microsoft Researchers Say ‘Yes’

In the new paper Can Vision Transformers Perform Convolution?, a research team from Peking University, UCLA and Microsoft Research proves constructively that a single ViT layer with image patches as input can perform any convolution operation, and shows that ViT performance in low-data regimes can be significantly improved using the team’s proposed training pipeline.
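The intuition behind the construction: give each attention head a relative positional bias strong enough that it attends deterministically to exactly one neighbouring patch, so the head reduces to “shift the feature map, then apply a linear projection”, and nine such heads sum to a 3x3 convolution. The standalone check below verifies that algebraic identity numerically; it is our paraphrase of the idea, not the authors’ proof or code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
B, C_in, C_out, H, W = 2, 4, 5, 8, 8
x = torch.randn(B, C_in, H, W)
conv = nn.Conv2d(C_in, C_out, 3, padding=1, bias=False)

out = torch.zeros(B, C_out, H, W)
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        # "Head" for offset (dy, dx): gather the shifted neighbour patch...
        shifted = F.pad(x, (1, 1, 1, 1))[:, :, 1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        # ...then apply that offset's slice of the kernel, a plain
        # per-token linear map (what a head's value projection does).
        w = conv.weight[:, :, 1 + dy, 1 + dx]    # (C_out, C_in)
        out += torch.einsum('oc,bchw->bohw', w, shifted)

print(torch.allclose(out, conv(x), atol=1e-5))   # True: 9 "heads" == 3x3 conv
```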