Tag: Visual Transformer

AI, Computer Vision & Graphics, Machine Learning & Data Science, Research

Are Patches All You Need? New Study Proposes Patches Are Behind Vision Transformers’ Strong Performance

A research team proposes ConvMixer, an extremely simple model designed to support the argument that the impressive performance of vision transformers (ViTs) is mainly attributable to their use of patches as the input representation. The study shows that ConvMixer can outperform ViTs and MLP-Mixers at similar parameter counts and dataset sizes, as well as classical vision models such as ResNet.
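To make the idea of patches as the input representation concrete, the sketch below splits a small image into non-overlapping square patches and flattens each one into a vector. It is only an illustration in plain Python, not the ConvMixer implementation: the function name `split_into_patches`, the single-channel toy image and the patch size of 2 are all assumptions chosen for readability.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (a list of equal-length rows) into non-overlapping
    patch_size x patch_size patches, each flattened in row-major order.

    Hypothetical helper for illustration; not taken from the ConvMixer code.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk over the top-left corner of every patch, row by row.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 single-channel toy image with pixel values 0..15 (assumed example data).
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(toy_image, patch_size=2)
```

Each resulting vector plays the role of one input token. In a real model, every patch would then be mapped to an embedding by a learned projection before any mixing layers are applied.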

AI, Computer Vision & Graphics, Machine Learning & Data Science, Research

Google & Rutgers’ Aggregating Nested Transformers Yield Better Accuracy, Data Efficiency and Convergence

A research team from Google Cloud AI, Google Research and Rutgers University simplifies the complex design of vision transformers by proposing nested transformers (NesT), which stack basic transformer layers to process non-overlapping image blocks individually and then aggregate the results hierarchically. The approach achieves superior ImageNet classification accuracy and improves data efficiency and training convergence.
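The hierarchical aggregation named in the headline can be pictured with a toy example. The sketch below starts from a grid of per-block summaries and repeatedly merges each 2x2 neighbourhood into one value until a single value remains. It rests on several assumptions: the function names are hypothetical, the 2x2 merge step is assumed, and plain averaging stands in for the learned aggregation used in NesT, which this summary does not describe.

```python
def merge_blocks(grid):
    """Merge every 2x2 neighbourhood of a square grid into one value.

    Averaging is a placeholder for the model's learned aggregation step.
    """
    size = len(grid)
    if size % 2:
        raise ValueError("grid side length must be even to merge 2x2 blocks")
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, size, 2)
        ]
        for r in range(0, size, 2)
    ]


def aggregate_hierarchy(grid):
    """Return every level of the hierarchy, from the input grid down to 1x1.

    Assumes the grid side length is a power of two.
    """
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(merge_blocks(levels[-1]))
    return levels


# A 4x4 grid of per-block summaries (assumed toy values 0.0..15.0).
block_grid = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
levels = aggregate_hierarchy(block_grid)
```

Here the grid shrinks from 4x4 to 2x2 to 1x1, so information that was processed separately inside each block is gradually combined across the whole image.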