A team from Facebook AI Research and UC Berkeley proposes ConvNeXt, a family of pure ConvNet models that achieves performance comparable with state-of-the-art hierarchical vision transformers on computer vision benchmarks while retaining the simplicity and efficiency of standard ConvNets.
In the new paper Can Vision Transformers Perform Convolution?, a research team from Peking University, UCLA and Microsoft Research proves constructively that a single ViT layer with image patches as input can perform any convolution operation, and shows that ViT performance in low-data regimes can be significantly improved using their proposed ViT training pipeline.
Might there be a more efficient approach to scaling up CNNs to improve accuracy? Researchers from Google AI say “yes” and have proposed a new model scaling method in their ICML 2019 paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.
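The method the paper proposes is compound scaling: rather than enlarging only depth, only width, or only input resolution, all three are scaled together by a single coefficient φ, using per-dimension constants (α, β, γ, found by grid search in the paper, with α·β²·γ² ≈ 2 so that compute roughly doubles per unit of φ). A minimal sketch, assuming the paper's reported constants; the function name and baseline values here are illustrative, not from the paper:

```python
def compound_scale(depth, width, resolution, phi,
                   alpha=1.2, beta=1.1, gamma=1.15):
    """Scale network dimensions jointly by one coefficient phi.

    depth scales by alpha**phi, width by beta**phi, and input
    resolution by gamma**phi; alpha, beta, gamma are the constants
    reported in the EfficientNet paper.
    """
    return (round(depth * alpha ** phi),
            round(width * beta ** phi),
            round(resolution * gamma ** phi))

# Hypothetical baseline: 18 layers, 64 channels, 224x224 input.
print(compound_scale(18, 64, 224, phi=3))
```

The appeal of the approach is that the expensive search is done once, on a small baseline network, and larger models then follow from a single knob (φ) instead of a fresh architecture search at every scale.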
Deep learning has become an essential toolbox used across a wide variety of applications, research labs, and industries. In this tutorial, given at NIPS 2017, the speakers provide a set of guidelines to help newcomers to the field understand the most recent and advanced models and their application to diverse data modalities.
ShuffleNet uses pointwise group convolution and channel shuffle to reduce computation cost while maintaining accuracy. It obtains a lower top-1 error than MobileNet on ImageNet classification, and achieves a ~13x actual speedup over AlexNet while maintaining comparable accuracy.
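Channel shuffle itself is a cheap tensor rearrangement: since group convolutions only mix channels within their own group, the shuffle interleaves channels across groups so information can flow between them in the next grouped layer. A minimal NumPy sketch of the operation (the function name is ours; ShuffleNet implements the same reshape-transpose-reshape trick):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups for a (N, C, H, W) tensor."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must divide evenly into groups"
    # View channels as (groups, channels_per_group), swap those two
    # axes, then flatten back: channel order becomes interleaved.
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# Six channels in three groups: [0,1 | 2,3 | 4,5] -> [0,2,4,1,3,5]
x = np.arange(6).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 3).ravel())
```

Because it is just a reshape and transpose, the shuffle adds essentially no FLOPs, which is how ShuffleNet keeps the efficiency benefit of group convolutions without their channel-isolation drawback.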