Category: Computer Vision & Graphics

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google Brain Uncovers Representation Structure Differences Between CNNs and Vision Transformers

A Google Brain research team explores the internal representation structures of ViTs and CNNs on image classification tasks, providing insights on key differences between the two approaches.

AI Computer Vision & Graphics Machine Learning & Data Science Popular Research

Facebook & UC Berkeley Substitute a Convolutional Stem to Dramatically Boost Vision Transformers’ Optimization Stability

A research team from Facebook AI and UC Berkeley finds a solution for vision transformers’ optimization instability problem by simply using a standard, lightweight convolutional stem for ViT models. The approach dramatically increases optimizer stability and improves peak performance without sacrificing computation efficiency.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Video Swin Transformer Improves Speed-Accuracy Trade-offs, Achieves SOTA Results on Video Recognition Benchmarks

A research team from Microsoft Research Asia, University of Science and Technology of China, Huazhong University of Science and Technology, and Tsinghua University takes advantage of the inherent spatiotemporal locality of videos to present a pure-transformer backbone architecture for video recognition that leads to a better speed-accuracy trade-off.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google & Rutgers’ Aggregating Nested Transformers Yield Better Accuracy, Data Efficiency and Convergence

A research team from Google Cloud AI, Google Research and Rutgers University simplifies vision transformers’ complex design, proposing nested transformers (NesT) that simply stack basic transformer layers to process non-overlapping image blocks individually. The approach achieves superior ImageNet classification accuracy and improves model training efficiency.