A research team from Google Cloud AI, Google Research, and Rutgers University simplifies vision transformers' complex design, proposing Nested Transformers (NesT), which simply stack basic transformer layers to process non-overlapping image blocks individually. The approach achieves superior ImageNet classification accuracy and improves model training efficiency.
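The core idea, processing each non-overlapping block with its own local self-attention, can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the attention here is a toy single-head variant with identity projections, and NesT's hierarchical block aggregation is omitted.

```python
import numpy as np

def partition_blocks(x, block):
    """Split an (H, W, C) feature map into non-overlapping block sequences."""
    H, W, C = x.shape
    x = x.reshape(H // block, block, W // block, block, C)
    # (num_blocks, block*block, C): each row is one block's token sequence
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, block * block, C)

def toy_self_attention(seq):
    """Toy scaled dot-product attention (identity Q/K/V projections)."""
    scores = seq @ seq.transpose(0, 2, 1) / np.sqrt(seq.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ seq

img = np.random.rand(8, 8, 16)               # toy 8x8 feature map, 16 channels
blocks = partition_blocks(img, block=4)      # four non-overlapping 4x4 blocks
out = toy_self_attention(blocks)             # each block attends only within itself
print(out.shape)                             # (4, 16, 16)
```

Because attention runs per block, its cost grows with block size rather than with the full image, which is where the training-efficiency gain comes from.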
Artificial general intelligence (AGI), machine intelligence at a human level, is the long-term goal of contemporary AI researchers worldwide. It's believed AGI has the potential to meet basic human needs globally, end poverty, cure diseases, extend life, and even mitigate climate change. In short, AGI is the tech that could not only save the world, but build a utopia.
Since 2010, the annual ImageNet Large-Scale Visual Recognition Challenge has been the most widely recognized benchmark for testing image recognition algorithms. The Tencent Machine Learning team picks up the challenge with its new paper Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes.
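The mixed-precision recipe in the title generally means computing in FP16 while keeping an FP32 master copy of the weights, with a loss-scaling factor so small FP16 gradients don't underflow. The sketch below shows that pattern on a hypothetical toy linear model; the constants (loss scale 128, learning rate 0.01) are illustrative assumptions, not values from the paper's training system.

```python
import numpy as np

LOSS_SCALE = 128.0                           # assumed scale; chosen to avoid FP16 overflow here

w_master = np.zeros(3, dtype=np.float32)     # FP32 master copy of the weights
x = np.array([1.0, 2.0, 3.0], dtype=np.float16)
target = np.float16(14.0)

for _ in range(100):
    w16 = w_master.astype(np.float16)        # cast weights to FP16 for compute
    pred = np.dot(w16, x)                    # FP16 forward pass
    # scale the gradient so small FP16 values do not underflow to zero
    grad16 = 2 * (pred - target) * x * np.float16(LOSS_SCALE)
    grad32 = grad16.astype(np.float32) / LOSS_SCALE   # unscale in FP32
    w_master -= 0.01 * grad32                # FP32 update on the master weights

print(np.dot(w_master, x.astype(np.float32)))  # converges to ~14.0
```

Keeping the update in FP32 preserves small weight changes that FP16 would round away, while the FP16 forward/backward passes deliver the memory and throughput savings that make four-minute ImageNet training feasible at scale.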
ShuffleNet uses pointwise group convolution and channel shuffle to reduce computation cost while maintaining accuracy. It obtains lower top-1 error than MobileNet on ImageNet classification, and achieves a ~13x actual speedup over AlexNet with comparable accuracy.
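The channel shuffle operation itself is just a reshape and transpose: channels are split into groups, then interleaved so that the next grouped convolution sees channels from every group. A minimal NumPy sketch of the operation (the shapes and the six-channel example are illustrative):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups so information flows between
    grouped convolutions. x has layout (N, C, H, W)."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(0, 2, 1, 3, 4)                # swap group and channel axes
    return x.reshape(n, c, h, w)                  # flatten back, now interleaved

x = np.arange(6).reshape(1, 6, 1, 1)              # channels labeled 0..5
shuffled = channel_shuffle(x, groups=2)
print(shuffled.ravel())                           # [0 3 1 4 2 5]
```

Without this shuffle, stacked group convolutions would keep each group's channels isolated from the others; the interleaving is what lets the cheap grouped pointwise convolutions match the accuracy of full ones.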