Tag: Transformers

AI Machine Learning & Data Science Research

Google Leverages Transformers to Vastly Simplify Neural Video Compression With SOTA Results

In the new paper VCT: A Video Compression Transformer, a Google Research team presents an elegantly simple yet powerful video compression transformer (VCT) that requires no architectural biases or priors and learns entirely from data, with no hand-crafting. VCT is easy to implement and outperforms conventional video compression approaches.

AI Machine Learning & Data Science Research

Microsoft’s XTC Extreme Lightweight Compression Method for Pretrained Transformers Achieves SOTA Results and 50x Smaller Model Sizes

In the new paper Extreme Compression for Pre-trained Transformers Made Simple and Efficient, a Microsoft research team introduces XTC, a simple yet effective extreme compression pipeline for pretrained transformers that can achieve state-of-the-art results while reducing model size by 50x.

AI Machine Learning & Data Science Research

Tsinghua U & BAAI’s CogView2 Achieves SOTA-Competitive Text-to-Image Generation With 10x Speedups

In the new paper CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers, researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence pretrain a Cross-Modal general Language Model (CogLM) for text and image token prediction and finetune it for fast super-resolution. The resulting CogView2 hierarchical text-to-image system achieves significant speedups while generating better-quality images at comparable resolutions.

AI Machine Learning & Data Science Natural Language Tech Research

Google, NYU & Maryland U’s Token-Dropping Approach Reduces BERT Pretraining Time by 25%

In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
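
The idea lends itself to a short sketch: run the full sequence through the early and late layers, but route only the tokens scored as important through the middle layers. The snippet below is a minimal illustration, assuming a generic `layer` function and precomputed per-token importance scores (the paper derives importance from the running masked-language-model loss); it is not the authors' implementation.

```python
import numpy as np

def layer(h):
    """Stand-in for one transformer layer (a toy transform here)."""
    return h + 0.01 * np.tanh(h)

def pretrain_forward(h, importance, keep_ratio=0.5, n_layers=12):
    """Token-dropping sketch: full sequence through the first and last
    layers, only the top-scoring tokens through the middle layers."""
    seq_len = h.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    keep = np.argsort(-importance)[:n_keep]       # indices of important tokens

    for _ in range(n_layers // 4):                # early layers: all tokens
        h = layer(h)

    kept = h[keep]
    for _ in range(n_layers // 2):                # middle layers: kept tokens only
        kept = layer(kept)
    h[keep] = kept                                # dropped tokens pass through unchanged

    for _ in range(n_layers - n_layers // 4 - n_layers // 2):  # final layers: all tokens
        h = layer(h)
    return h

h = np.random.randn(16, 8)                        # 16 tokens, hidden dim 8
scores = np.random.rand(16)                       # hypothetical importance scores
print(pretrain_forward(h, scores).shape)          # (16, 8)
```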

AI Machine Learning & Data Science Research

Google Extends Transformers for Immediate Knowledge Acquisition via a Simple New Data Read & Memorize Technique

A Google research team addresses conventional transformers’ resource-heavy training and fine-tuning requirements for learning new knowledge, proposing Memorizing Transformers as a step toward language models that can simply read and memorize new data at inference time for immediate knowledge acquisition.
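
At a high level, the approach caches (key, value) pairs from previously read text in an external memory and lets each query retrieve its nearest neighbours at inference time. The sketch below illustrates only the kNN-attention retrieval step, with brute-force search standing in for the paper's approximate index and without the learned gate that combines memory with local attention:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knn_attention(q, mem_k, mem_v, k=4):
    """Each query attends only over its k nearest memory keys."""
    scores = q @ mem_k.T                          # (n_q, n_mem) similarities
    topk = np.argsort(-scores, axis=1)[:, :k]     # indices of k nearest keys
    out = np.empty_like(q)
    for i, idx in enumerate(topk):
        w = softmax(scores[i, idx])               # softmax over retrieved keys only
        out[i] = w @ mem_v[idx]
    return out

mem_k = np.random.randn(10_000, 64)               # external memory: cached keys
mem_v = np.random.randn(10_000, 64)               # and values from earlier text
q = np.random.randn(8, 64)                        # queries for the current segment
print(knn_attention(q, mem_k, mem_v).shape)       # (8, 64)
```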

AI Machine Learning & Data Science Nature Language Tech Research

Google & IDSIA’s Block-Recurrent Transformer Dramatically Outperforms Transformers Over Very Long Sequences

A team from Google Research and the Swiss AI Lab IDSIA proposes the Block-Recurrent Transformer, a novel long-sequence processing approach that has the same computation time and parameter count costs as a conventional transformer layer but achieves significant perplexity improvements in language modelling tasks over very long sequences.
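
Conceptually, the layer processes the sequence block by block while carrying a small set of recurrent state vectors, with tokens and state exchanging information through attention. A toy sketch of that loop, omitting the self-attention, learned gating, and projections of the real layer:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def block_recurrent_forward(tokens, state, block=64):
    """Sketch: each block of tokens reads from the recurrent state, and
    the state is updated by reading from the block."""
    outputs = []
    for i in range(0, len(tokens), block):
        x = tokens[i:i + block]
        outputs.append(x + attend(x, state, state))  # tokens read from state
        state = state + attend(state, x, x)          # state reads from tokens
    return np.concatenate(outputs), state

tokens = np.random.randn(4096, 32)                   # a very long sequence
state = np.random.randn(16, 32)                      # 16 recurrent state vectors
out, state = block_recurrent_forward(tokens, state)
print(out.shape, state.shape)                        # (4096, 32) (16, 32)
```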

AI Machine Learning & Data Science Research

Transformers Meet Online RL: New Study Unifies Offline Pretraining and Online Finetuning, Achieves SOTA Results

A team from Facebook AI Research, UC Berkeley and UCLA proposes Online Decision Transformers (ODT), an RL algorithm based on sequence modelling that unifies offline pretraining and online finetuning in a single framework and achieves performance competitive with state-of-the-art models on the D4RL benchmark.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google’s MaskGIT Outperforms SOTA Transformer Models on Conditional Image Generation and Accelerates Autoregressive Decoding by up to 64x

A Google Research team proposes Masked Generative Image Transformer (MaskGIT), a novel image synthesis paradigm that uses a bidirectional transformer decoder. MaskGIT significantly outperforms state-of-the-art transformer models on the ImageNet dataset and accelerates autoregressive decoding by up to 64x.
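
The decoding loop is easy to sketch: start from a fully masked token grid, predict every position in parallel, keep the most confident predictions, and re-mask the rest according to a schedule. Below is a minimal illustration with a random stand-in for the transformer; the confidence-based keep/re-mask logic is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(tokens, mask, vocab=1024):
    """Stand-in for the bidirectional transformer: returns a predicted
    token and a confidence score for every position (random here)."""
    return rng.integers(0, vocab, size=tokens.shape), rng.random(tokens.shape)

def maskgit_decode(n_tokens=256, steps=8):
    """Iterative parallel decoding: predict all positions at once, keep
    the most confident, re-mask the rest on a cosine schedule."""
    tokens = np.zeros(n_tokens, dtype=int)
    mask = np.ones(n_tokens, dtype=bool)             # True = still masked
    for t in range(1, steps + 1):
        preds, conf = predict(tokens, mask)
        conf = np.where(mask, conf, -np.inf)         # only masked slots compete
        keep_masked = int(n_tokens * np.cos(np.pi / 2 * t / steps))
        n_reveal = max(int(mask.sum()) - keep_masked, 0)
        reveal = np.argsort(-conf)[:n_reveal]        # most confident slots
        tokens[reveal] = preds[reveal]
        mask[reveal] = False
    return tokens

print(maskgit_decode()[:10])
```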

AI Machine Learning & Data Science Research

Google Proposes a ‘Simple Trick’ for Dramatically Reducing Transformers’ (Self-)Attention Memory Requirements

In the new paper Self-attention Does Not Need O(n²) Memory, a Google Research team presents novel and simple algorithms that require only constant memory for attention and logarithmic memory for self-attention, reducing the self-attention memory overhead by 59x for inference and by 32x for differentiation at a sequence length of 16384.
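
The underlying trick is that softmax attention can be accumulated over key/value chunks with a running maximum, so the full n-by-n score matrix is never materialized. Below is a numpy sketch of that accumulation, checked against the standard quadratic computation; it illustrates the principle rather than the paper's exact algorithm (which also covers differentiation):

```python
import numpy as np

def chunked_attention(q, k, v, chunk=128):
    """Accumulate the softmax numerator and denominator over key/value
    chunks with a running max, never forming the full score matrix."""
    num = np.zeros((q.shape[0], v.shape[1]))      # running weighted-value sum
    den = np.zeros((q.shape[0], 1))               # running softmax denominator
    m = np.full((q.shape[0], 1), -np.inf)         # running max for stability
    for i in range(0, k.shape[0], chunk):
        s = q @ k[i:i + chunk].T / np.sqrt(q.shape[1])
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        scale = np.exp(m - m_new)                 # rescale previous accumulators
        p = np.exp(s - m_new)
        num = num * scale + p @ v[i:i + chunk]
        den = den * scale + p.sum(axis=1, keepdims=True)
        m = m_new
    return num / den

q, k, v = (np.random.randn(512, 64) for _ in range(3))
s = q @ k.T / np.sqrt(64)                         # standard quadratic reference
p = np.exp(s - s.max(axis=1, keepdims=True))
ref = (p / p.sum(axis=1, keepdims=True)) @ v
print(np.allclose(chunked_attention(q, k, v), ref))  # True
```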

AI Machine Learning & Data Science Research

DeepMind’s RETRO Retrieval-Enhanced Transformer Retrieves from Trillions of Tokens, Achieving Performance Comparable to GPT-3 With 25x Fewer Parameters

A DeepMind research team proposes RETRO (Retrieval-Enhanced Transformer), an enhanced auto-regressive language model that conditions on document chunks retrieved from a large corpus and achieves performance comparable to GPT-3 and Jurassic-1 on the Pile dataset while using 25x fewer parameters.
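
The retrieval step can be sketched as follows: split the input into fixed-size chunks, embed each chunk, and fetch its nearest neighbours from a pre-embedded corpus; the model then conditions on the retrieved chunks through chunked cross-attention. The toy below uses a mean-pooling embedder and brute-force search where RETRO uses a frozen BERT embedder and an approximate-nearest-neighbour index:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(chunk):
    """Toy stand-in for RETRO's frozen BERT chunk embedder."""
    return chunk.mean(axis=0)

# A small pre-embedded "corpus" (RETRO indexes trillions of tokens).
corpus_chunks = rng.standard_normal((10_000, 64, 8))  # 10k chunks of 64 tokens
corpus_index = corpus_chunks.mean(axis=1)             # one embedding per chunk

def retrieve(input_tokens, chunk_len=64, k=2):
    """Fetch the k nearest corpus chunks for each input chunk; the
    model attends to them via chunked cross-attention (not shown)."""
    neighbours = []
    for i in range(0, len(input_tokens), chunk_len):
        e = embed(input_tokens[i:i + chunk_len])
        dists = np.linalg.norm(corpus_index - e, axis=1)
        neighbours.append(corpus_chunks[np.argsort(dists)[:k]])
    return neighbours

x = rng.standard_normal((256, 8))                     # a 256-token input
nbrs = retrieve(x)
print(len(nbrs), nbrs[0].shape)                       # 4 (2, 64, 8)
```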

AI Machine Learning & Data Science Research

Warsaw U, Google & OpenAI’s Terraformer Achieves a 37x Speedup Over Dense Baselines on 17B Transformer Decoding

In the new paper Sparse is Enough in Scaling Transformers, a research team from the University of Warsaw, Google Research and OpenAI proposes Scaling Transformers, a family of novel transformers that leverage sparse layers to scale efficiently and perform unbatched decoding much faster than original transformers, enabling fast inference on long sequences even with limited memory.
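
To give a flavour of the sparsity, the sketch below shows a per-token block-sparse feed-forward layer in which a tiny controller selects one block of the hidden layer, so only a fraction of the weights is touched during unbatched decoding. The single-matrix controller here is a hypothetical simplification; the paper's controllers are low-rank and its sparsity also extends to the attention layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_blocks = 64, 1024, 8
block = d_ff // n_blocks

W_in = rng.standard_normal((d_model, d_ff)) * 0.05
W_out = rng.standard_normal((d_ff, d_model)) * 0.05
W_ctrl = rng.standard_normal((d_model, n_blocks)) * 0.05  # tiny controller

def sparse_ffn(x):
    """Block-sparse FFN sketch: the controller picks one block of the
    hidden layer per token, and only its weights are used."""
    b = int(np.argmax(x @ W_ctrl))                # chosen block for this token
    lo, hi = b * block, (b + 1) * block
    h = np.maximum(x @ W_in[:, lo:hi], 0.0)       # ReLU on the active block only
    return h @ W_out[lo:hi, :]

x = rng.standard_normal(d_model)                  # one token (unbatched decoding)
print(sparse_ffn(x).shape)                        # (64,)
```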

AI Computer Vision & Graphics Machine Learning & Data Science Research

Softmax-free Vision Transformer With Linear Complexity: Achieving a Superior Accuracy/Complexity Trade-off

Researchers from Fudan University, the University of Surrey and Huawei Noah’s Ark Lab trace the quadratic complexity of vision transformers (ViTs) to the retention of softmax self-attention during approximations. The team proposes the first softmax-free transformer (SOFT), which reduces self-attention computation to linear complexity, achieving a superior trade-off between accuracy and complexity.
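
Why removing softmax helps can be seen from the linearized-attention identity: once a feature map replaces softmax, the matrix product can be regrouped so that cost grows linearly in sequence length. The sketch below uses a generic positive feature map purely for illustration; SOFT's own construction is a Gaussian kernel approximated with a low-rank Nyström-style decomposition:

```python
import numpy as np

def linear_attention(q, k, v):
    """Softmax-free attention sketch: with feature map phi in place of
    softmax, (phi(Q) phi(K)^T) V regroups as phi(Q) (phi(K)^T V),
    turning O(n^2) cost into O(n) in sequence length."""
    phi = lambda x: np.maximum(x, 0.0) + 1e-6      # simple positive feature map
    q, k = phi(q), phi(k)
    kv = k.T @ v                                   # (d, d): independent of n
    z = q @ k.sum(axis=0)                          # per-query normalizer
    return (q @ kv) / z[:, None]

n, d = 4096, 64
q, k, v = (np.random.randn(n, d) for _ in range(3))
print(linear_attention(q, k, v).shape)             # (4096, 64)
```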

AI Machine Learning & Data Science Natural Language Tech Popular Research

Mention Memory: Incorporating Factual Knowledge From Various Sources Into Transformers Without Supervision

A research team from the University of Southern California and Google proposes TOME, a “mention memory” approach to factual knowledge extraction for NLU tasks. A transformer model with attention over a semi-parametric representation of the entire Wikipedia text corpus, TOME can extract information without supervision and achieves strong performance on multiple open-domain question answering benchmarks.

AI Machine Learning & Data Science Natural Language Tech Research

NYU & UNC Reveal How Transformers’ Learned Representations Change After Fine-Tuning

In the paper Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers, a research team from New York University and the University of North Carolina at Chapel Hill uses centered kernel alignment (CKA) to measure the similarity of representations across layers and explore how fine-tuning changes transformers’ learned representations.
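
Linear CKA itself takes only a few lines of numpy: center the activation matrices of two layers and compare their cross-covariance against the within-layer norms. A minimal implementation of the standard formula (the mock data below is illustrative, not the paper's):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two representation
    matrices of shape (n_examples, n_features)."""
    X = X - X.mean(axis=0)                         # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2     # cross-covariance alignment
    return hsic / (np.linalg.norm(X.T @ X, 'fro') *
                   np.linalg.norm(Y.T @ Y, 'fro'))

# Compare a layer's representations before vs. after a mock perturbation.
pre = np.random.randn(200, 768)                    # 200 examples, hidden size 768
post = pre + 0.1 * np.random.randn(200, 768)       # slightly shifted representations
print(round(linear_cka(pre, post), 3))             # near 1.0: very similar
print(round(linear_cka(pre, np.random.randn(200, 768)), 3))  # near 0: dissimilar
```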

AI Machine Learning & Data Science Natural Language Tech Popular Research

Google Researchers Enable Transformers to Solve Compositional NLP Tasks

A Google Research team explores the design space of Transformer models in an effort to enable deep learning architectures to solve compositional tasks. The proposed approach provides models with inductive biases via design decisions that significantly impact compositional generalization, and achieves state-of-the-art results on the COGS and PCFG composition benchmarks.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Video Swin Transformer Improves Speed-Accuracy Trade-offs, Achieves SOTA Results on Video Recognition Benchmarks

A research team from Microsoft Research Asia, University of Science and Technology of China, Huazhong University of Science and Technology, and Tsinghua University takes advantage of the inherent spatiotemporal locality of videos to present a pure-transformer backbone architecture for video recognition that leads to a better speed-accuracy trade-off.

AI Machine Learning & Data Science Research

Pieter Abbeel Team’s Decision Transformer Abstracts RL as Sequence Modelling

A research team from UC Berkeley, Facebook AI Research and Google Brain abstracts Reinforcement Learning (RL) as a sequence modelling problem. Their proposed Decision Transformer simply outputs optimal actions by leveraging a causally masked transformer, yet matches or exceeds state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
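
The core construction is a trajectory rendered as a token sequence: each timestep contributes a return-to-go, a state, and an action token, and a causally masked transformer is trained to predict the action tokens. A minimal sketch of that sequence layout, with the transformer itself omitted:

```python
import numpy as np

def returns_to_go(rewards):
    """Suffix sums of rewards: the return still to be collected."""
    return np.cumsum(rewards[::-1])[::-1]

def build_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) per timestep: the input
    layout a Decision Transformer consumes under a causal mask."""
    rtg = returns_to_go(rewards)
    seq = []
    for t in range(len(states)):
        seq += [('R', rtg[t]), ('s', states[t]), ('a', actions[t])]
    return seq

states, actions, rewards = [0, 1, 2], [1, 0, 1], [0.0, 0.0, 1.0]
for tok in build_sequence(states, actions, rewards):
    print(tok)
# At inference: set the first return-to-go token to the desired return
# and autoregressively decode the action tokens.
```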

AI Machine Learning & Data Science Natural Language Tech Research

Study Shows Transformers Possess the Compositionality Power for Mathematical Reasoning

A research team from UC Davis, Microsoft Research and Johns Hopkins University extends prior work on uncovering the grammatical structures captured by models trained on massive amounts of linguistic data to the domain of mathematical reasoning, showing that both the standard transformer and the TP-Transformer can compose the meanings of mathematical symbols based on their structured relationships.