Tag: Transformers

AI Machine Learning & Data Science Research

Forget About Catastrophic Forgetting: Google’s Continual HyperTransformer Enables Efficient Continual Few-Shot Learning

In the new paper Continual Few-Shot Learning Using HyperTransformers, a Google Research team proposes Continual HyperTransformer, which modifies the recently published HyperTransformer few-shot learning method to sequentially update a convolutional neural network’s weights using information from a new task without forgetting the knowledge learned from previous tasks.

AI Machine Learning & Data Science Research

Meet Tracr: DeepMind & ETH Zurich’s Novel Interpretability Tool Compiles Human-Readable Code to Transformers’ Weights

In the new paper Tracr: Compiled Transformers as a Laboratory for Interpretability, a research team from ETH Zurich and DeepMind presents Tracr, a compiler that addresses the absence of ground-truth explanations in deep neural network models by “compiling” human-readable code into the weights of a transformer model.
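
To make “compiling code to weights” concrete, here is a minimal sketch modelled on the example in the project’s README; the API names (rasp.Select, rasp.Aggregate, compiling.compile_rasp_to_model) are recalled from that README and should be treated as assumptions:

```python
# Sketch of compiling a human-readable RASP program into transformer weights.
from tracr.rasp import rasp
from tracr.compiler import compiling

def make_frac_prevs(bools: rasp.SOp) -> rasp.SOp:
    """At each position, the fraction of previous tokens where `bools` holds."""
    bools = rasp.numerical(bools)
    prevs = rasp.Select(rasp.indices, rasp.indices, rasp.Comparison.LEQ)
    return rasp.numerical(rasp.Aggregate(prevs, bools, default=0))

# "Compile" the program into the weights of an actual transformer model.
model = compiling.compile_rasp_to_model(
    make_frac_prevs(rasp.tokens == "x"),
    vocab={"w", "x", "y", "z"},
    max_seq_len=10,
    compiler_bos="BOS",
)

print(model.apply(["BOS", "x", "y", "x"]).decoded)  # per-position fractions
```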

AI Machine Learning & Data Science Research

Google’s Masked Generative Transformers Achieve SOTA Text-To-Image Performance With Improved Efficiency

In the new paper Muse: Text-To-Image Generation via Masked Generative Transformers, a Google Research team introduces Muse, a transformer-based text-to-image synthesis model that leverages masked image modelling to achieve state-of-the-art performance while being significantly faster than diffusion or autoregressive models.

AI Machine Learning & Data Science Research

Google & Lund U’s Optimus Learned Optimization Architecture Efficiently Captures Complex Dependencies

In the new paper Transformer-Based Learned Optimization, a Google Research and Lund University team presents Optimus, an expressive neural network architecture for learned optimization that captures complex dependencies in the parameter space and achieves competitive results on real-world tasks and benchmark optimization problems.

AI Machine Learning & Data Science Research

Stanford U & Google’s Convex Analytic Training Framework Improves the Understanding and Optimization of Transformers

In the new paper Convexifying Transformers: Improving Optimization and Understanding of Transformer Networks, a Stanford University and Google Research team presents a rigorous theoretical analysis of transformers’ fundamental mechanisms and introduces a novel convex analytic training framework for improving their optimization.

AI Machine Learning & Data Science Research

‘MrsFormer’ Employs a Novel Multiresolution-Head Attention Mechanism to Cut Transformers’ Compute and Memory Costs

In the new paper Transformers with Multiresolution Attention Heads (currently under double-blind review for ICLR 2023), researchers propose MrsFormer, a novel transformer architecture whose Multiresolution-Head Attention approximates attention-head outputs at multiple resolutions, significantly reducing head redundancy without sacrificing accuracy.

AI Machine Learning & Data Science Research

Wider, Not Deeper: Cambridge, Oxford & ICL Challenge Conventional Transformer Design Approaches

In the new paper Wide Attention Is The Way Forward For Transformers, a research team from the University of Cambridge, Imperial College London, and the University of Oxford challenges the commonly held belief that deeper is better for transformer architectures, demonstrating that wider layers result in superior performance on natural language processing tasks.

AI Machine Learning & Data Science Research

Transformers on Edge Devices? Monash U’s Energy-Saving Attention With Linear Complexity Reduces Compute Cost by 73%

In the new paper EcoFormer: Energy-Saving Attention with Linear Complexity, a Monash University research team presents EcoFormer, an attention mechanism with linear complexity that replaces expensive multiply-accumulate operations with simple accumulations and achieves a 73 percent energy footprint reduction on ImageNet.
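
The energy saving comes from replacing floating-point multiply-accumulate operations with additions over binary codes. The NumPy sketch below is a loose illustration of that core idea only: EcoFormer learns kernelized hash functions, whereas plain random sign projections stand in for them here, combined with the standard linear-attention reassociation.

```python
import numpy as np

# Loose sketch: binary features + linear-attention ordering.
rng = np.random.default_rng(0)
n, d, b = 128, 32, 16                # tokens, head dim, hash bits

Q, K, V = rng.normal(size=(3, n, d))
P = rng.normal(size=(d, b))          # stand-in for learned hash projections

phi_Q = (np.sign(Q @ P) + 1) / 2     # binary features in {0, 1}
phi_K = (np.sign(K @ P) + 1) / 2

# With binary features, these products need no floating-point
# multiplications, only selective additions; associativity keeps
# the cost linear in n.
kv = phi_K.T @ V                     # (b, d) summary, built once
z = phi_K.sum(axis=0)                # (b,) normalizer terms
out = (phi_Q @ kv) / (phi_Q @ z + 1e-6)[:, None]
```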

AI Machine Learning & Data Science Natural Language Tech Research

Peking U & Microsoft’s Knowledge Attribution Method Enables Editing Factual Knowledge in Pretrained Transformers Without Fine-Tuning

In the new paper Knowledge Neurons in Pretrained Transformers, a research team from Peking University and Microsoft Research introduces a knowledge attribution method that identifies the neurons that store factual knowledge in pretrained transformers and leverages these neurons to edit factual knowledge in transformers without any fine-tuning.

AI Machine Learning & Data Science Research

OpenAI Presents a Simple and Efficient Training Strategy to Boost Language Models’ Text-Infilling Capabilities

In the new paper Efficient Training of Language Models to Fill in the Middle, an OpenAI research team shows that causal decoder-based autoregressive (AR) language models can learn to infill text via a simple transformation of the training data, without any architectural modifications.
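
The transformation is easy to state: split each training document into a (prefix, middle, suffix) triple and move the middle to the end behind sentinel tokens, so an ordinary left-to-right model learns to infill. A minimal sketch, with the sentinel names chosen here purely for illustration:

```python
import random

PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"  # illustrative sentinel names

def to_fim(doc: str, rng: random.Random) -> str:
    """Move a random middle span to the end, marked with sentinels."""
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Trained left-to-right on this string, the model can later infill:
    # given prefix and suffix, it generates the middle.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

rng = random.Random(0)
print(to_fim("def add(a, b):\n    return a + b\n", rng))
```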

AI Computer Vision & Graphics Machine Learning & Data Science Research

IITM & UT Austin’s Generalizable NeRF Transformer Demonstrates Transformers’ Capabilities for Graphical Rendering

In the new paper Is Attention All NeRF Needs?, a research team from the Indian Institute of Technology Madras and the University of Texas at Austin proposes Generalizable NeRF Transformer (GNT), a pure and universal transformer-based architecture for efficient on-the-fly reconstruction of NeRFs. The work demonstrates that a pure attention mechanism suffices for learning a physically grounded rendering process.

AI Machine Learning & Data Science Research

Google Leverages Transformers to Vastly Simplify Neural Video Compression With SOTA Results

In the new paper VCT: A Video Compression Transformer, a Google Research team presents an elegantly simple yet powerful video compression transformer (VCT) that requires no hand-crafted architectural biases or priors, learning entirely from data. VCT is easy to implement and outperforms conventional video compression approaches.

AI Machine Learning & Data Science Research

Microsoft’s XTC Extreme Lightweight Compression Method for Pretrained Transformers Achieves SOTA Results and 50x Smaller Model Sizes

In the new paper Extreme Compression for Pre-trained Transformers Made Simple and Efficient, a Microsoft research team introduces XTC, a simple yet effective extreme compression pipeline for pretrained transformers that can achieve state-of-the-art results while reducing model size by 50x.

AI Machine Learning & Data Science Research

Tsinghua U & BAAI’s CogView2 Achieves SOTA-Competitive Text-to-Image Generation With 10x Speedups

In the new paper CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers, Tsinghua University and the Beijing Academy of Artificial Intelligence researchers pretrain a Cross-Modal General Language Model (CogLM) for text and image token prediction and finetune it for fast super-resolution. The resulting CogView2 hierarchical text-to-image system achieves significant speedups while generating images with better quality at comparable resolutions.

AI Machine Learning & Data Science Natural Language Tech Research

Google, NYU & Maryland U’s Token-Dropping Approach Reduces BERT Pretraining Time by 25%

In the new paper Token Dropping for Efficient BERT Pretraining, a research team from Google, New York University, and the University of Maryland proposes a simple but effective “token dropping” technique that significantly reduces the pretraining cost of transformer models such as BERT without hurting performance on downstream fine-tuning tasks.
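
The gist is to track a cheap importance score per token and let the middle encoder layers process only the important tokens, restoring the full sequence for the final layers. A toy NumPy sketch in which the scoring and the layer are stand-ins, not the paper’s exact choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, keep = 12, 16, 8                # sequence length, hidden size, tokens kept

h = rng.normal(size=(n, d))           # hidden states after the early layers
importance = rng.random(n)            # e.g., a running average of per-token MLM loss

kept = np.sort(np.argsort(importance)[-keep:])  # top-k tokens, original order

def middle_layer(x):                  # stand-in for a transformer layer
    return x + 0.1 * np.tanh(x)

h_kept = middle_layer(h[kept])        # middle layers see only `keep` tokens: cheaper

h_full = h.copy()                     # dropped tokens bypass the middle layers
h_full[kept] = h_kept                 # merge back for the final layers
```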

AI Machine Learning & Data Science Research

Google Extends Transformers for Immediate Knowledge Acquisition via a Simple New Data Read & Memorize Technique

A Google research team addresses conventional transformers’ resource-heavy training and fine-tuning requirements for learning new knowledge, proposing Memorizing Transformers as a step toward language models that can simply read and memorize new data at inference time for immediate knowledge acquisition.

AI Machine Learning & Data Science Natural Language Tech Research

Google & IDSIA’s Block-Recurrent Transformer Dramatically Outperforms Transformers Over Very Long Sequences

A team from Google Research and the Swiss AI Lab IDSIA proposes the Block-Recurrent Transformer, a novel long-sequence processing approach that has the same computation time and parameter count costs as a conventional transformer layer but achieves significant perplexity improvements in language modelling tasks over very long sequences.

AI Machine Learning & Data Science Research

Transformers Meet Online RL: New Study Unifies Offline Pretraining and Online Finetuning, Achieves SOTA Results

A team from Facebook AI Research, UC Berkeley and UCLA proposes Online Decision Transformers (ODT), an RL algorithm based on sequence modelling that incorporates offline pretraining and online finetuning in a unified framework and achieves performance competitive with state-of-the-art models on the D4RL benchmark.
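
Decision-transformer-style methods cast RL as sequence modelling over (return-to-go, state, action) triples. The sketch below shows only that input construction; ODT’s max-entropy online finetuning objective is omitted:

```python
from typing import List, Tuple

def returns_to_go(rewards: List[float]) -> List[float]:
    """rtg[t] = sum of rewards from step t to the end of the trajectory."""
    rtg, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

def to_tokens(states, actions, rewards) -> List[Tuple[float, object, object]]:
    """One (return-to-go, state, action) triple per timestep."""
    return list(zip(returns_to_go(rewards), states, actions))

print(to_tokens(states=["s0", "s1", "s2"],
                actions=["a0", "a1", "a2"],
                rewards=[1.0, 0.0, 2.0]))
# [(3.0, 's0', 'a0'), (2.0, 's1', 'a1'), (2.0, 's2', 'a2')]
```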

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google’s MaskGIT Outperforms SOTA Transformer Models on Conditional Image Generation and Accelerates Autoregressive Decoding by up to 64x

A Google Research team proposes Masked Generative Image Transformer (MaskGIT), a novel image synthesis paradigm that uses a bidirectional transformer decoder. MaskGIT significantly outperforms state-of-the-art transformer models on the ImageNet dataset and accelerates autoregressive decoding by up to 64x.
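
The speedup comes from decoding all image tokens in parallel and iteratively: start fully masked, predict every token at once, keep the most confident predictions on a cosine schedule, and re-mask the rest. A toy sketch with a random stand-in for the bidirectional transformer:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, vocab, steps = 16, 10, 4
MASK = -1

tokens = np.full(n_tokens, MASK)
for step in range(1, steps + 1):
    probs = rng.dirichlet(np.ones(vocab), size=n_tokens)  # stand-in for model output
    pred, conf = probs.argmax(-1), probs.max(-1)
    conf[tokens != MASK] = np.inf      # already-decoded tokens stay fixed

    # Cosine schedule: the kept fraction grows until everything is decoded.
    frac = 1 - np.cos(np.pi / 2 * step / steps)
    n_keep = n_tokens if step == steps else max(int(n_tokens * frac), 1)

    keep = np.argsort(-conf)[:n_keep]  # highest-confidence positions
    new_tokens = np.full(n_tokens, MASK)
    new_tokens[keep] = np.where(tokens[keep] != MASK, tokens[keep], pred[keep])
    tokens = new_tokens

print(tokens)  # fully decoded after `steps` parallel iterations
```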

AI Machine Learning & Data Science Research

Google Proposes a ‘Simple Trick’ for Dramatically Reducing Transformers’ (Self-)Attention Memory Requirements

In the new paper Self-attention Does Not Need O(n²) Memory, a Google Research team presents novel and simple algorithms for attention and self-attention that require only constant and logarithmic memory, respectively, and reduce the self-attention memory overhead by 59x for inference and by 32x for differentiation at a sequence length of 16384.
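
The core trick is to process keys and values in chunks while carrying a running maximum and softmax normalizer, so the full n×n score matrix is never materialized. The paper works in JAX; this single-query NumPy sketch is just for exposition:

```python
import numpy as np

def attention_chunked(q, K, V, chunk=256):
    """Attention for one query against K, V in O(chunk) extra memory."""
    m = -np.inf               # running max of scores (numerical stability)
    num, den = 0.0, 0.0       # running weighted-value sum and normalizer
    for i in range(0, len(K), chunk):
        s = K[i:i + chunk] @ q                 # scores for this chunk
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)              # rescale old accumulators
        p = np.exp(s - m_new)
        num = num * scale + p @ V[i:i + chunk]
        den = den * scale + p.sum()
        m = m_new
    return num / den

rng = np.random.default_rng(0)
K, V, q = rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64)), rng.normal(size=64)
s = K @ q
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V  # standard softmax attention
assert np.allclose(attention_chunked(q, K, V), ref)
```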

AI Machine Learning & Data Science Research

DeepMind’s RETRO Retrieval-Enhanced Transformer Retrieves from Trillions of Tokens, Achieving Performance Comparable to GPT-3 With 25× Fewer Parameters

A DeepMind research team proposes RETRO (Retrieval-Enhanced Transformer), an enhanced auto-regressive language model that conditions on document chunks retrieved from a large corpus and achieves performance comparable to GPT-3 and Jurassic-1 on the Pile dataset while using 25× fewer parameters.
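
Conceptually, the retrieval step splits the corpus into fixed-size chunks, embeds them, and fetches each input chunk’s nearest neighbours (with their continuations) for the model to attend over. RETRO uses frozen BERT embeddings and approximate search over a roughly two-trillion-token store; in this toy sketch a bag-of-characters embedding and brute-force search stand in for both:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-characters embedding (stand-in for frozen BERT)."""
    v = np.zeros(dim)
    for ch in text:
        v[ord(ch) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

corpus = "the cat sat on the mat . the dog lay on the rug . the sun rises in the east"
chunks = [corpus[i:i + 16] for i in range(0, len(corpus), 16)]
E = np.stack([embed(c) for c in chunks])   # precomputed chunk embeddings

def retrieve(query_chunk: str, k: int = 2):
    """Brute-force nearest neighbours (stand-in for approximate search)."""
    sims = E @ embed(query_chunk)
    return [chunks[i] for i in np.argsort(-sims)[:k]]

print(retrieve("the cat lay on t"))        # nearest corpus chunks to condition on
```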

AI Machine Learning & Data Science Research

Warsaw U, Google & OpenAI’s Terraformer Achieves a 37x Speedup Over Dense Baselines on 17B Transformer Decoding

In the new paper Sparse is Enough in Scaling Transformers, a research team from the University of Warsaw, Google Research and OpenAI proposes Scaling Transformers, a family of novel transformers that leverage sparse layers to scale efficiently and perform unbatched decoding much faster than original transformers, enabling fast inference on long sequences even with limited memory.

AI Computer Vision & Graphics Machine Learning & Data Science Research

Softmax-free Vision Transformer With Linear Complexity: Achieving a Superior Accuracy/Complexity Trade-off

Researchers from Fudan University, the University of Surrey and Huawei Noah’s Ark Lab trace vision transformers’ (ViTs) quadratic complexity to the retention of softmax self-attention during approximation. The team proposes the first softmax-free transformer (SOFT), which reduces self-attention computation to linear complexity, achieving a superior trade-off between accuracy and complexity.
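
Why softmax is the obstacle: with a row-wise softmax, attention must form the full n×n score matrix, but without it the product can be reassociated as Q(KᵀV), which is linear in sequence length. The sketch below demonstrates that reassociation with a generic ReLU feature map (the usual normalizer is omitted for brevity); it is not SOFT’s specific Gaussian-kernel construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = rng.normal(size=(3, n, d))

phi_Q, phi_K = np.maximum(Q, 0), np.maximum(K, 0)  # softmax-free feature maps

# Quadratic order: forms an (n, n) matrix -> O(n^2 * d) time, O(n^2) memory.
quad = (phi_Q @ phi_K.T) @ V

# Linear order: forms a (d, d) matrix first -> O(n * d^2) time, O(d^2) memory.
lin = phi_Q @ (phi_K.T @ V)

assert np.allclose(quad, lin)  # identical results, very different costs
```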

AI Machine Learning & Data Science Natural Language Tech Popular Research

Mention Memory: Incorporating Factual Knowledge From Various Sources Into Transformers Without Supervision

A research team from the University of Southern California and Google proposes TOME, a “mention memory” approach to factual knowledge extraction for NLU tasks. A transformer model with attention over a semi-parametric representation of the entire Wikipedia text corpus, TOME can extract information without supervision and achieves strong performance on multiple open-domain question answering benchmarks.

AI Machine Learning & Data Science Natural Language Tech Research

NYU & UNC Reveal How Transformers’ Learned Representations Change After Fine-Tuning

In the paper Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers, a research team from New York University and the University of North Carolina at Chapel Hill uses centered kernel alignment (CKA) to measure the similarity of representations across layers and explore how fine-tuning changes transformers’ learned representations.
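
Linear CKA itself is only a few lines: center the two representation matrices and compare their cross-covariance to their self-covariances. A minimal sketch (the paper may use a minibatch or unbiased variant):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Similarity of representations X (n, p1) and Y (n, p2) over n examples."""
    X = X - X.mean(axis=0)   # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
layer_a = rng.normal(size=(100, 64))            # e.g., activations from one layer
layer_b = layer_a @ rng.normal(size=(64, 32))   # linear transform of the same signal
print(linear_cka(layer_a, layer_a))             # 1.0: identical representations
print(linear_cka(layer_a, layer_b))             # high: linearly related
print(linear_cka(layer_a, rng.normal(size=(100, 32))))  # low: unrelated
```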

AI Machine Learning & Data Science Natural Language Tech Popular Research

Google Researchers Enable Transformers to Solve Compositional NLP Tasks

A Google Research team explores the design space of Transformer models in an effort to enable deep learning architectures to solve compositional tasks. The proposed approach provides models with inductive biases via design decisions that significantly impact compositional generalization, and achieves state-of-the-art results on the COGS and PCFG composition benchmarks.