A group of Google researchers led by Quoc Le, the AI expert behind Google Neural Machine Translation and AutoML, has published a paper proposing attention augmentation. In experiments, the new two-dimensional relative self-attention mechanism delivered "consistent improvements in image classification."
In 2014, MILA PhD student Dzmitry Bahdanau and co-authors including Yoshua Bengio proposed adding attention as a computational module to a recurrent neural network (RNN), letting the model learn alignments between source and target text in machine translation. Later work on attention led to the self-attentional Transformer architecture, which achieved state-of-the-art results in machine translation. Attention has since become a standard component of sequence models because of its ability to capture long-range interactions.
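For a concrete picture of why attention captures long-range interactions, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the form used in the Transformer (Bahdanau et al.'s original module scored alignments with an additive function instead). The weights are random and untrained, and the function and variable names are illustrative assumptions. The point to notice is that the attention weight matrix is n by n: every position scores every other position, however far apart they are.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x: (n, d_model) sequence; w_q, w_k: (d_model, d_k); w_v: (d_model, d_v).
    Returns the (n, d_v) outputs and the (n, n) attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Every query is compared with every key, regardless of distance.
    logits = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(logits, axis=-1)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
n, d_model, d_k = 6, 8, 4
x = rng.normal(size=(n, d_model))
w_q = rng.normal(size=(d_model, d_k))
w_k = rng.normal(size=(d_model, d_k))
w_v = rng.normal(size=(d_model, d_k))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (6, 4) (6, 6)
```

Because the softmax gives every pair of positions a nonzero weight, the first token's output already depends on the last token after a single layer.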
While convolutional neural networks (CNNs) are widely deployed in computer vision applications, each convolution operates only on a local neighborhood, so the operation by itself lacks global information about the image.
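The locality limit can be made concrete with a little arithmetic. This sketch assumes the simplest case: stacked stride-1 k×k convolutions with no pooling, striding, or dilation (real CNNs use striding and pooling to grow the window faster, but it stays bounded). Under that assumption, each layer widens the region a single output pixel can see by k - 1 pixels. The function name is hypothetical.

```python
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Side length, in pixels, of the input window seen by one output pixel
    after `num_layers` stacked stride-1 convolutions (no pooling or dilation)."""
    rf = 1  # a pixel before any convolution sees only itself
    for _ in range(num_layers):
        rf += kernel_size - 1  # each k x k layer widens the window by k - 1
    return rf

for layers in (1, 2, 5, 10):
    # 3, 5, 11 and 21 pixels for 1, 2, 5 and 10 layers of 3x3 convolutions
    print(layers, receptive_field(layers))
```

Under these assumptions, a stack of 3×3 convolutions would need 112 layers before one output pixel could see across a 224-pixel-wide input (a common ImageNet crop size), whereas a single self-attention layer relates all positions at once.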
To improve image classification performance, the researchers combined convolutions with self-attention. They proposed augmenting convolutional operators with a self-attention mechanism "by concatenating convolutional feature maps with a set of feature maps produced via self-attention." To adapt this two-dimensional relative self-attention mechanism to images, they infused it with relative position information while maintaining translation equivariance. The researchers obtained their best results when combining self-attention with convolutions.
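The mechanism described above can be sketched in NumPy for a single image. This is a simplified illustration under stated assumptions, not the authors' implementation: a channels-last (H, W, C) layout, a stride-1 "same" convolution, random untrained weights, and hypothetical parameter names (`conv`, `qkv`, `out`, `rel_w`, `rel_h`). Relative position information enters as learned embeddings indexed by the horizontal offset and vertical offset between two pixels and added to each key, so attention scores depend on where pixel j sits relative to pixel i rather than on absolute coordinates. Sharing one pair of offset tables across all heads is an assumption of this sketch, and it materializes every pairwise offset, which is simple but memory-hungry.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conv2d_same(x, w):
    """Stride-1 convolution with zero 'same' padding (odd kernel size assumed).
    x: (H, W, C_in); w: (k, k, C_in, C_out)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += xp[i:i + H, j:j + W, :] @ w[i, j]
    return out

def relative_mhsa_2d(x, w_qkv, w_out, rel_w, rel_h, d_k, d_v, n_heads):
    """Multi-head self-attention over all H*W pixels with 2D relative logits.
    w_qkv: (C_in, 2*d_k + d_v); w_out: (d_v, d_v);
    rel_w: (2W-1, d_k // n_heads); rel_h: (2H-1, d_k // n_heads)."""
    H, W, _ = x.shape
    n = H * W
    q, k, v = np.split(x.reshape(n, -1) @ w_qkv, [d_k, 2 * d_k], axis=1)
    dkh, dvh = d_k // n_heads, d_v // n_heads
    ys, xs = np.divmod(np.arange(n), W)  # row and column of each flattened pixel
    dx = xs[None, :] - xs[:, None] + W - 1  # horizontal offset j - i, shifted to >= 0
    dy = ys[None, :] - ys[:, None] + H - 1  # vertical offset j - i, shifted to >= 0
    heads = []
    for h in range(n_heads):
        qh = q[:, h * dkh:(h + 1) * dkh]
        kh = k[:, h * dkh:(h + 1) * dkh]
        vh = v[:, h * dvh:(h + 1) * dvh]
        logits = qh @ kh.T                                   # content term
        logits += np.einsum("id,ijd->ij", qh, rel_w[dx])     # width-offset term
        logits += np.einsum("id,ijd->ij", qh, rel_h[dy])     # height-offset term
        heads.append(softmax(logits / np.sqrt(dkh)) @ vh)
    return (np.concatenate(heads, axis=1) @ w_out).reshape(H, W, d_v)

def attention_augmented_conv(x, params, d_k, d_v, n_heads):
    # Concatenate convolutional feature maps with self-attention feature maps.
    conv_out = conv2d_same(x, params["conv"])
    attn_out = relative_mhsa_2d(x, params["qkv"], params["out"],
                                params["rel_w"], params["rel_h"], d_k, d_v, n_heads)
    return np.concatenate([conv_out, attn_out], axis=-1)

rng = np.random.default_rng(0)
H, W, c_in, f_out, ksize = 5, 6, 8, 16, 3
d_k, d_v, n_heads = 8, 4, 2
params = {
    "conv": rng.normal(size=(ksize, ksize, c_in, f_out - d_v)),
    "qkv": rng.normal(size=(c_in, 2 * d_k + d_v)) * 0.1,
    "out": rng.normal(size=(d_v, d_v)),
    "rel_w": rng.normal(size=(2 * W - 1, d_k // n_heads)) * 0.1,
    "rel_h": rng.normal(size=(2 * H - 1, d_k // n_heads)) * 0.1,
}
x = rng.normal(size=(H, W, c_in))
y = attention_augmented_conv(x, params, d_k, d_v, n_heads)
print(y.shape)  # (5, 6, 16): 12 convolutional channels + 4 attention channels

x2 = x.copy()
x2[-1, -1] += 5.0  # perturb the pixel farthest from (0, 0)
y2 = attention_augmented_conv(x2, params, d_k, d_v, n_heads)
conv_ch = f_out - d_v
print(np.allclose(y[0, 0, :conv_ch], y2[0, 0, :conv_ch]))  # True: outside the 3x3 window
print(np.allclose(y[0, 0, conv_ch:], y2[0, 0, conv_ch:]))  # False: attention sees every pixel
```

The last two lines show the division of labor: the convolutional channels at the top-left pixel ignore a change in the far corner, while the attention channels respond to it.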
In extensive experiments at a similar parameter count, attention augmentation achieved consistent improvements in image classification on ImageNet, including a 1.3 percentage-point gain in top-1 accuracy over a ResNet-50 baseline, and outperformed other attention mechanisms for images such as Squeeze-and-Excitation. The method also improved COCO object detection on a RetinaNet baseline by 1.4 mean average precision (mAP).
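Why the parameter count can stay similar becomes clearer with a rough weight count. This is a hypothetical accounting under simplifying assumptions, not figures from the paper: the attention branch is modeled as a 1×1 projection to queries, keys and values plus a d_v × d_v output projection, the convolution keeps the remaining f_out - d_v output channels, and biases and relative position embeddings are ignored. The layer sizes in the example are illustrative.

```python
def conv_params(f_in: int, f_out: int, k: int) -> int:
    """Weights of a plain k x k convolution (biases ignored)."""
    return f_in * f_out * k * k

def aa_conv_params(f_in: int, f_out: int, k: int, d_k: int, d_v: int) -> int:
    """Hypothetical weight count for an attention-augmented k x k convolution."""
    qkv = f_in * (2 * d_k + d_v)         # 1x1 projection to queries, keys, values
    out = d_v * d_v                      # output projection of the attention heads
    conv = f_in * (f_out - d_v) * k * k  # convolution for the remaining channels
    return qkv + out + conv

print(conv_params(256, 256, 3))             # 589824
print(aa_conv_params(256, 256, 3, 64, 64))  # 495616
```

Under these assumptions, handing 64 of 256 output channels of a 3×3 convolution to attention lowers the weight count by about 16 percent, because the attention projections are 1×1; actual totals for a full network depend on which layers are augmented and how the widths are chosen.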
The researchers suggest that attention augmentation may inspire future work on automated architecture search procedures to find better models for image classification, object detection, image segmentation, and other tasks.
The paper "Attention Augmented Convolutional Networks" is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen