In the new paper SpectFormer: Frequency and Attention Is What You Need in a Vision Transformer, a research team from Microsoft and the University of Bath proposes SpectFormer, a novel transformer architecture that combines spectral and multi-headed attention layers to better capture appropriate feature representations and improve performance.
In the new paper Transformers with Multiresolution Attention Heads (currently under double-blind review for ICLR 2023), researchers propose MrsFormer, a novel transformer architecture that uses Multiresolution-head Attention to approximate output sequences and significantly reduces head redundancy without sacrificing accuracy.
In the new paper Self-attention Does Not Need O(n²) Memory, a Google Research team presents simple algorithms for attention and self-attention that require only constant and logarithmic memory, respectively, with respect to sequence length. At a sequence length of 16384, they reduce the self-attention memory overhead by 59x for inference and by 32x for differentiation.
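To give a feel for how attention can run in constant memory, the sketch below computes the attention output for a single query while reading key/value pairs one at a time. It keeps only a running maximum score, a running softmax normalizer and a running weighted sum of values, so its working memory does not grow with sequence length. This is a minimal illustration under stated assumptions rather than the paper's implementation: the function names `streaming_attention` and `reference_attention` are hypothetical, inputs are plain Python lists, and the usual 1/sqrt(d) scaling is omitted (it could be folded into the query).

```python
import math

def streaming_attention(query, keys, values):
    """Attention output for one query, visiting each key/value pair once.

    Illustrative sketch only: the working state is three running
    quantities whose size is independent of the sequence length.
    """
    running_max = -math.inf             # largest score seen so far
    normalizer = 0.0                    # sum of exp(score - running_max)
    weighted = [0.0] * len(values[0])   # sum of exp(score - running_max) * value
    for key, value in zip(keys, values):
        score = sum(q * k for q, k in zip(query, key))
        new_max = max(running_max, score)
        # Rescale the old sums to the new maximum; exp(-inf) is 0.0 on the first step.
        rescale = math.exp(running_max - new_max)
        weight = math.exp(score - new_max)
        normalizer = normalizer * rescale + weight
        weighted = [w * rescale + weight * v for w, v in zip(weighted, value)]
        running_max = new_max
    return [w / normalizer for w in weighted]

def reference_attention(query, keys, values):
    """Standard softmax attention for one query; stores all scores at once."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]
```

Both functions should agree up to floating-point rounding. The difference is that the reference version materializes one score per key, whereas the streaming version carries a fixed-size state from step to step.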
In the new paper Understanding How Encoder-Decoder Architectures Attend, researchers from the University of Washington, Google Blueshift Team and Google Brain Team propose a method for decomposing hidden states over a sequence into temporal- and input-driven components, revealing how attention matrices are formed in encoder-decoder networks.
A Google Research team draws inspiration from two numerical analysis methods — Hierarchical Matrix (H-Matrix) and Multigrid — to address the quadratic complexity problem of attention mechanisms in transformer architectures, proposing a hierarchical attention scheme that has linear complexity in run time and memory.