Tag: vision transformer

AI | Computer Vision & Graphics | Machine Learning & Data Science | Research

Microsoft & Bath U’s SpectFormer Significantly Improves Vision Transformers via Frequency and Attention

In the new paper SpectFormer: Frequency and Attention Is What You Need in a Vision Transformer, a research team from Microsoft and the University of Bath proposes SpectFormer, a novel transformer architecture that combines spectral layers with multi-headed attention layers to capture more appropriate feature representations and improve performance.
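At a high level, the idea is a stack in which early blocks mix tokens with a learnable spectral filter (applied via an FFT over the token axis) and later blocks use standard multi-head self-attention. The minimal PyTorch sketch below illustrates that layering; the module names, dimensions, and block split are assumptions for illustration, not the authors' released code.

```python
# Minimal sketch of a SpectFormer-style stack: spectral mixing early, attention later.
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class SpectralMixer(nn.Module):
    """Mix tokens in the frequency domain with a learnable complex filter."""
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        # One complex weight per (frequency, channel), stored as a real/imag pair.
        self.filter = nn.Parameter(torch.randn(num_tokens // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):                      # x: (batch, tokens, dim)
        freq = torch.fft.rfft(x, dim=1)        # real FFT over the token axis
        freq = freq * torch.view_as_complex(self.filter)
        return torch.fft.irfft(freq, n=x.size(1), dim=1)

class Block(nn.Module):
    """Pre-norm transformer block whose token mixer is either spectral or attention."""
    def __init__(self, dim: int, num_tokens: int, use_spectral: bool, heads: int = 4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.use_spectral = use_spectral
        self.mixer = (SpectralMixer(num_tokens, dim) if use_spectral
                      else nn.MultiheadAttention(dim, heads, batch_first=True))
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        h = self.mixer(h) if self.use_spectral else self.mixer(h, h, h, need_weights=False)[0]
        x = x + h
        return x + self.mlp(self.norm2(x))

# Early blocks spectral, later blocks attention -- the mix SpectFormer advocates.
blocks = nn.Sequential(*[Block(dim=192, num_tokens=196, use_spectral=(i < 4)) for i in range(8)])
out = blocks(torch.randn(2, 196, 192))         # (2, 196, 192)
```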

AI | Computer Vision & Graphics | Machine Learning & Data Science | Research

NVIDIA’s Global Context ViT Achieves SOTA Performance on CV Tasks Without Expensive Computation

In the new paper Global Context Vision Transformers, an NVIDIA research team proposes the Global Context Vision Transformer (GC ViT), a novel yet simple hierarchical ViT architecture built from global self-attention and token generation modules. The design efficiently models both short- and long-range dependencies without costly compute operations and achieves SOTA results across various computer vision tasks.
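Conceptually, each stage alternates local window attention with a global attention step whose query tokens are produced once from the whole feature map by a token generator. The simplified sketch below (1-D windows, an average-pooling token generator, assumed sizes) illustrates that alternation rather than reproducing the paper's exact modules.

```python
# Rough sketch of the GC ViT alternation: local window attention, then attention
# from shared global query tokens to each window. Shapes and modules are assumptions.
import torch
import torch.nn as nn

def window_partition(x, win):                  # (B, N, D) -> (B * N // win, win, D)
    B, N, D = x.shape
    return x.view(B * (N // win), win, D)

def window_merge(x, B):                        # inverse of window_partition
    return x.reshape(B, -1, x.size(-1))

class GCStage(nn.Module):
    """One GC ViT-style pair: local window attention, then global-query attention."""
    def __init__(self, dim, win=49, heads=4):
        super().__init__()
        self.win = win
        self.norm = nn.LayerNorm(dim)
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pool = nn.AdaptiveAvgPool1d(win)  # token generator: pool the map to `win` tokens
        self.token_proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, N, D), N divisible by win
        B = x.size(0)
        # Local self-attention inside each window captures short-range structure.
        w = window_partition(self.norm(x), self.win)
        x = x + window_merge(self.local_attn(w, w, w, need_weights=False)[0], B)
        # Global query tokens summarise the whole map once, then attend to every
        # window, giving long-range context without full quadratic attention.
        g = self.token_proj(self.pool(x.transpose(1, 2)).transpose(1, 2))  # (B, win, D)
        g = g.repeat_interleave(x.size(1) // self.win, dim=0)              # one copy per window
        w = window_partition(self.norm(x), self.win)
        x = x + window_merge(self.global_attn(g, w, w, need_weights=False)[0], B)
        return x

out = GCStage(dim=128)(torch.randn(2, 196, 128))   # 196 tokens = 4 windows of 49
print(out.shape)                                   # torch.Size([2, 196, 128])
```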

AI | Machine Learning & Data Science | Research

Is BERT the Future of Image Pretraining? ByteDance Team’s BERT-like Pretrained Vision Transformer iBOT Achieves New SOTAs

A research team from ByteDance, Johns Hopkins University, Shanghai Jiao Tong University and UC Santa Cruz seeks to apply the proven technique of masked language modelling to the training of better vision transformers, presenting iBOT (image BERT pretraining with Online Tokenizer), a self-supervised framework that performs masked prediction with an online tokenizer.
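In iBOT-style training, the "online tokenizer" is a momentum (EMA) copy of the student whose softmax outputs on the clean image serve as soft targets for the patches the student sees masked. The sketch below captures that single training step with assumed dimensions, temperatures, and a stand-in encoder; the paper's projection heads, multi-crop views, and centering are omitted.

```python
# Minimal sketch of masked prediction with an online (EMA) tokenizer.
# Names, dims, temperatures and the tiny encoder are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyViT(nn.Module):
    """Stand-in patch encoder plus head producing per-patch logits."""
    def __init__(self, dim=64, vocab=256, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, patches):                 # (B, N, dim)
        return self.head(self.encoder(patches)) # (B, N, vocab) logits

student = TinyViT()
teacher = copy.deepcopy(student)                # online tokenizer = momentum teacher
for p in teacher.parameters():
    p.requires_grad_(False)

mask_token = nn.Parameter(torch.zeros(1, 1, 64))

def ibot_step(patches, mask_ratio=0.4, t_student=0.1, t_teacher=0.04):
    B, N, D = patches.shape
    mask = torch.rand(B, N) < mask_ratio        # True where a patch is masked
    # Teacher sees the clean patches; its softmax output is the soft "token" target.
    with torch.no_grad():
        targets = F.softmax(teacher(patches) / t_teacher, dim=-1)
    # Student sees masked input and must recover the teacher's tokens there.
    corrupted = torch.where(mask.unsqueeze(-1), mask_token.expand(B, N, D), patches)
    logits = student(corrupted) / t_student
    loss = -(targets * F.log_softmax(logits, dim=-1)).sum(-1)   # per-patch cross-entropy
    return loss[mask].mean()

@torch.no_grad()
def ema_update(m=0.996):                        # keeps the tokenizer "online"
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(m).add_(ps.detach(), alpha=1 - m)

loss = ibot_step(torch.randn(2, 196, 64))
loss.backward()
ema_update()
```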

AI | Machine Learning & Data Science | Research

Can ViT Layers Express Convolutions? Peking U, UCLA & Microsoft Researchers Say ‘Yes’

In the new paper Can Vision Transformers Perform Convolution?, a research team from Peking University, UCLA and Microsoft Research constructively proves that a single ViT layer with image patches as input can perform any convolution operation, and shows that ViT performance in low-data regimes can be significantly improved using their proposed training pipeline.
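The construction assigns each attention head a fixed relative offset so that, together with suitable value and output projections, the heads jointly reproduce a convolution kernel. The toy check below verifies that idea numerically using hard one-hot attention patterns (which the paper shows relative-position multi-head attention can realize) and circular padding; grid size, channels, and the comparison setup are illustrative assumptions, not the paper's full construction.

```python
# Toy numerical check: nine "heads", each attending to one fixed relative offset,
# reproduce a 3x3 convolution over a grid of patch tokens.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
H = W = 6                      # 6x6 grid of patch tokens
C_in, C_out = 4, 5
x = torch.randn(1, C_in, H, W)
kernel = torch.randn(C_out, C_in, 3, 3)

# Reference: circular 3x3 convolution on the patch grid.
ref = F.conv2d(F.pad(x, (1, 1, 1, 1), mode="circular"), kernel)

# "Attention" version: the head for offset (dy, dx) uses a hard attention matrix
# that shifts every token by that offset; its value/output projection is the
# corresponding kernel slice.
tokens = x.flatten(2).transpose(1, 2).squeeze(0)          # (N, C_in), N = H * W
idx = torch.arange(H * W).view(H, W)
out = torch.zeros(H * W, C_out)
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        # One-hot attention: token (i, j) attends to token (i + dy, j + dx).
        shifted = idx.roll(shifts=(-dy, -dx), dims=(0, 1)).flatten()
        attn = torch.zeros(H * W, H * W)
        attn[torch.arange(H * W), shifted] = 1.0           # hard, head-specific pattern
        w_head = kernel[:, :, dy + 1, dx + 1]              # (C_out, C_in) projection
        out = out + attn @ tokens @ w_head.t()

attn_result = out.t().reshape(1, C_out, H, W)
print(torch.allclose(attn_result, ref, atol=1e-4))         # True
```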