Attention Mechanisms

by Synced 2024-04-03

Huawei & Peking U’s DiJiang: A Transformer Achieving LLaMA2-7B Performance at 1/50th the Training Cost

A research team from Huawei and Peking University introduces DiJiang, a groundbreaking Frequency Domain Kernelization approach, which facilitates the transition to a linear complexity model with minimal training overhead, achieving performance akin to LLaMA2-7B across various benchmarks, but at just 1/50th of the training cost.

by Synced 2023-10-11

AI Machine Learning & Data Science Research

Yale U & Google’s HyperAttention: Long-Context Attention with the Best Possible Near-Linear Time Guarantee

In the new paper HyperAttention: Long-context Attention in Near-Linear Time, a research team from Yale University and Google Research presents HyperAttention, an approximate attention mechanism that not only offers practical efficiency but also delivers the best near-linear time guarantee for long-context processing.

by Synced 2023-04-18

AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft & Bath U’s SpectFormer Significantly Improves Vision Transformers via Frequency and Attention

In the new paper SpectFormer: Frequency and Attention Is What You Need in a Vision Transformer, a research team from Microsoft and the University of Bath proposes SpectFormer, a novel transformer architecture that combines spectral and multi-headed attention layers to better capture appropriate feature representations and improve performance.

by Synced 2022-11-14

AI Machine Learning & Data Science Research

‘MrsFormer’ Employs a Novel Multiresolution-Head Attention Mechanism to Cut Transformers’ Compute and Memory Costs

In the new paper Transformers with Multiresolution Attention Heads (currently under double-blind review for ICLR 2023), researchers propose MrsFormer, a novel transformer architecture that uses Multiresolution-head Attention to approximate output sequences and significantly reduces head redundancy without sacrificing accuracy.

by Synced 2022-03-01

AI Machine Learning & Data Science Research

Cornell U & Google Brain’s FLASH Yields High Transformer Quality in Linear Time

A research team from Cornell University and Google Brain introduces FLASH, a model family that achieves quality on par with fully augmented transformers while maintaining linear scalability over the context size on modern accelerators.

by Synced 2021-12-14

AI Machine Learning & Data Science Research

Google Proposes a ‘Simple Trick’ for Dramatically Reducing Transformers’ (Self-)Attention Memory Requirements

In the new paper Self-attention Does Not Need O(n²) Memory, a Google Research team presents novel and simple algorithms for attention and self-attention that require only constant and logarithmic memory, respectively, with respect to sequence length, and that reduce the self-attention memory overhead by 59x for inference and by 32x for differentiation at a sequence length of 16384.
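
To illustrate how attention can be computed without holding all scores in memory, here is a minimal sketch for a single query. It processes one key/value pair at a time while keeping a running maximum, a running softmax denominator, and a running weighted sum of values. This is a simplified, pure-Python illustration of that general running-softmax idea, not the paper's implementation: the function names are hypothetical, the usual score scaling is omitted, and the chunking and self-attention extension are not shown.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: materializes every score before the softmax."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def streaming_attention(q, keys, values):
    """Single-query attention using O(1) extra memory in sequence length."""
    running_max = -math.inf        # largest score seen so far
    denom = 0.0                    # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])   # weighted sum of values, relative to running_max
    for k, v in zip(keys, values):
        score = sum(qi * ki for qi, ki in zip(q, k))
        new_max = max(running_max, score)
        # Rescale what was accumulated so far to the new maximum.
        # On the first step this is exp(-inf) == 0.0.
        scale = math.exp(running_max - new_max)
        weight = math.exp(score - new_max)
        denom = denom * scale + weight
        acc = [a * scale + weight * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]
```

Both functions should return the same vector up to floating-point error; the streaming version never stores more than one score at a time.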

by Synced 2021-11-04

AI Machine Learning & Data Science Research

Washington U & Google Study Reveals How Attention Matrices Are Formed in Encoder-Decoder Architectures

In the new paper Understanding How Encoder-Decoder Architectures Attend, researchers from the University of Washington, Google Blueshift Team and Google Brain Team propose a method for decomposing hidden states over a sequence into temporal- and input-driven components, revealing how attention matrices are formed in encoder-decoder networks.

by Synced 2021-08-30

AI Machine Learning & Data Science Popular Research

Tsinghua U & Microsoft Propose Fastformer: An Additive Attention Based Transformer With Linear Complexity

A team from Tsinghua University and Microsoft Research Asia proposes Fastformer, an efficient Transformer variant based on additive attention that achieves effective context modelling with linear complexity.

by Synced 2021-08-05

AI Machine Learning & Data Science Nature Language Tech Research

Google’s H-Transformer-1D: Fast One-Dimensional Hierarchical Attention With Linear Complexity for Long Sequence Processing

A Google Research team draws inspiration from two numerical analysis methods — Hierarchical Matrix (H-Matrix) and Multigrid — to address the quadratic complexity problem of attention mechanisms in transformer architectures, proposing a hierarchical attention scheme that has linear complexity in run time and memory.

by Synced 2020-08-22

Machine Learning & Data Science Nature Language Tech Popular

Sepp Hochreiter on Parallels Between Attention Mechanisms and Modern Hopfield Networks

Researchers argue that “attention mechanism is the update rule of a modern Hopfield network with continuous states.”
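
The quoted claim can be made concrete with a small sketch. The update rule of a continuous-state modern Hopfield network is commonly written as new_state = Xᵀ softmax(β X state), where the rows of X are the stored patterns. That is attention with the state as the query and the stored patterns serving as both keys and values. The code below is an illustrative pure-Python version under that formulation; the function names and the choice of β are assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_update(state, patterns, beta=1.0):
    """One update step: attention with query=state, keys=values=patterns."""
    # Similarity of the current state to every stored pattern, scaled by beta.
    scores = [beta * sum(s * p for s, p in zip(state, pattern))
              for pattern in patterns]
    weights = softmax(scores)
    # New state is the softmax-weighted average of the stored patterns.
    dim = len(state)
    return [sum(w * pattern[d] for w, pattern in zip(weights, patterns))
            for d in range(dim)]
```

With a large β, a noisy state such as [0.9, 0.1, 0.0] moves almost entirely onto the closest stored pattern in a single step, which is the retrieval behaviour usually associated with Hopfield networks.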