Kaiming He’s MetaAI Team Proposes ViTDet: A Plain Vision Transformer Backbone Competitive With Hierarchical Backbones on Object Detection
A Meta AI research team explores the plain, non-hierarchical vision transformer (ViT) as a backbone network for object detection, proposing a ViT Detector that achieves performance competitive with traditional hierarchical backbones.