In less than two years since their introduction, vision transformers (ViTs) have revolutionized the computer vision field, leveraging the transformer architecture's powerful self-attention mechanism to eliminate the need for convolutions and advance the state-of-the-art on image classification tasks. More recently, approaches such as MLP-Mixer and carefully redesigned convolutional neural networks (CNNs) have achieved ViT-comparable performance, and machine learning researchers continue to seek optimal architectural designs for computer vision tasks.
In the new paper Sequencer: Deep LSTM for Image Classification, a research team from Rikkyo University and AnyTech Co., Ltd. examines which inductive biases are best suited to computer vision and proposes Sequencer, an architectural alternative to ViTs that replaces self-attention layers with traditional long short-term memory (LSTM) layers. By mixing spatial information with memory-economical and parameter-saving LSTMs, Sequencer reduces memory cost while achieving ViT-competitive performance on long sequence modelling.

The Sequencer architecture employs bidirectional LSTMs (BiLSTMs) as building blocks and, inspired by Hou et al.'s 2021 Vision Permutator (ViP), processes the vertical and horizontal axes in parallel. The researchers introduce two BiLSTMs that handle the top/bottom and left/right directions in parallel, which reduces sequence length, improves Sequencer's accuracy and efficiency, and yields a spatially meaningful receptive field.
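To make the parallel vertical/horizontal mixing concrete, the following is a minimal PyTorch sketch of such a bidirectional-LSTM spatial-mixing layer. The module name `BiLSTM2D`, the hidden size, and the channel-merging projection here are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class BiLSTM2D(nn.Module):
    """Sketch of parallel vertical/horizontal BiLSTM spatial mixing.
    Names and the fusion projection are assumptions for illustration."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        # One BiLSTM scans columns (top/bottom), the other scans rows (left/right).
        self.lstm_v = nn.LSTM(dim, hidden_dim, batch_first=True, bidirectional=True)
        self.lstm_h = nn.LSTM(dim, hidden_dim, batch_first=True, bidirectional=True)
        # Fuse the two directional outputs back to the token dimension.
        self.proj = nn.Linear(4 * hidden_dim, dim)

    def forward(self, x):
        # x: (B, H, W, C) feature map of patch tokens.
        B, H, W, C = x.shape
        # Vertical pass: treat each column as a sequence of length H.
        v = x.permute(0, 2, 1, 3).reshape(B * W, H, C)
        v, _ = self.lstm_v(v)                                # (B*W, H, 2*hidden)
        v = v.reshape(B, W, H, -1).permute(0, 2, 1, 3)
        # Horizontal pass: treat each row as a sequence of length W.
        h = x.reshape(B * H, W, C)
        h, _ = self.lstm_h(h)                                # (B*H, W, 2*hidden)
        h = h.reshape(B, H, W, -1)
        # Concatenate both directions and project back to C channels.
        return self.proj(torch.cat([v, h], dim=-1))
```

Because each BiLSTM only sees a sequence of length H or W rather than H×W, the per-layer sequence length stays short even for large feature maps, which is the efficiency argument made above.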

Sequencer takes non-overlapping patches as input and projects them onto a feature map. The Sequencer block has two sub-components: 1) a BiLSTM layer that mixes spatial information globally and memory-economically, and 2) a multi-layer perceptron (MLP) for channel mixing. As in existing architectures, the output of the last block is passed through a global average pooling layer and sent to a linear classifier.
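Building on the sketch above, this hedged example shows how the two sub-components of a Sequencer block could be composed; the pre-norm residual layout and MLP expansion ratio are assumptions rather than the paper's exact configuration.

```python
class SequencerBlock(nn.Module):
    """Sketch of one Sequencer block: BiLSTM2D spatial mixing + MLP channel mixing.
    The pre-norm residual layout and MLP expansion ratio are assumptions."""

    def __init__(self, dim, hidden_dim, mlp_ratio=3):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.spatial_mix = BiLSTM2D(dim, hidden_dim)   # spatial (token) mixing
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(              # channel mixing
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):
        # x: (B, H, W, C) feature map built from non-overlapping patch embeddings.
        x = x + self.spatial_mix(self.norm1(x))
        x = x + self.channel_mlp(self.norm2(x))
        return x


# After the final block, the feature map is averaged over its spatial
# dimensions and fed to a linear classifier, as described above:
#   pooled = x.mean(dim=(1, 2))                        # global average pooling -> (B, C)
#   logits = nn.Linear(dim, num_classes)(pooled)       # num_classes is task-dependent
```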


In their empirical study, the team compared the proposed Sequencer against CNN, ViT, MLP-based and FFT-based architectures with comparable parameter counts on the ImageNet-1K benchmark dataset, and also tested its transfer learning capabilities. Sequencer achieved an impressive 84.6 percent top-1 accuracy in the evaluations, outperforming ConvNeXt-S and Swin-S by 0.3 and 0.2 percent, respectively, while also demonstrating good transferability and robust resolution adaptability.
The team hopes their work will provide new insights into and improve understanding of the role of various inductive biases in computer vision, and inspire further research on optimal architecture designs in this growing field.
The paper Sequencer: Deep LSTM for Image Classification is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
