Transformer | Synced

by Synced 2024-12-28

Llama 3 Meets MoE: Pioneering Low-Cost High-Performance AI

Researchers from the University of Texas at Austin and NVIDIA propose an upcycling approach, a training recipe that turns Llama 3-8B into an 8-expert, top-2 MoE model using less than 1% of the compute typically required for pre-training.
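The two ingredients named above, upcycling a dense FFN into experts and top-2 routing, can be sketched in plain Python. This is a minimal sketch, assuming that every expert starts as an exact copy of the dense FFN weights and that the router renormalises the two highest logits with a softmax; the names `upcycle`, `top2_route` and `moe_forward` and the toy "FFN" are hypothetical, not taken from the paper.

```python
import copy
import math

def upcycle(dense_ffn, num_experts=8):
    """Initialise every expert as an independent copy of the dense FFN (assumed scheme)."""
    return [copy.deepcopy(dense_ffn) for _ in range(num_experts)]

def top2_route(router_logits):
    """Pick the two highest-scoring experts and softmax-normalise their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:2]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, router_logits, ffn_apply):
    """Sum the outputs of the two selected experts, weighted by their gate values."""
    out = [0.0] * len(x)
    for expert_id, gate in top2_route(router_logits):
        y = ffn_apply(experts[expert_id], x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out

# Hypothetical toy "FFN": multiplies its input by a constant.
dense = {"scale": 2.0}
experts = upcycle(dense)
logits = [0.3, 1.2, -0.5, 0.9, 0.0, 0.1, -1.0, 0.4]
y = moe_forward([1.0, -1.0], experts, logits, lambda p, x: [p["scale"] * v for v in x])
```

Because all experts start identical and the two gates sum to 1, the upcycled model in this sketch reproduces the dense model's output at initialisation; the experts only diverge once training updates them separately.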

by Synced 2024-12-26

AI Machine Learning & Data Science Research

DeepMind’s JetFormer: Unified Multimodal Models Without Modelling Constraints

A DeepMind research team introduces JetFormer, a Transformer designed to model raw data directly. The model maximizes the likelihood of raw data without relying on any pre-trained components, and it can both understand and generate text and images.

by Synced 2024-12-23

AI Machine Learning & Data Science Research

NVIDIA’s nGPT: Revolutionizing Transformers with Hypersphere Representation

An NVIDIA research team proposes the normalized Transformer (nGPT), which consolidates key findings in Transformer research in a unified framework, learns faster, and reduces the number of training steps needed by a factor of 4 to 20, depending on sequence length.
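One way to picture a hypersphere representation is a residual update in which both the hidden state and the block output are unit-normalised, and the hidden state moves toward the block output by a step size α before being projected back onto the sphere. The sketch below assumes that update rule, h ← Norm(h + α(h_A − h)), with a single fixed scalar α for simplicity; `normalize` and `ngpt_update` are hypothetical names.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere (L2 norm = 1)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def ngpt_update(h, block_out, alpha=0.1):
    """Assumed update: move h toward the normalised block output, then re-normalise."""
    h = normalize(h)
    h_a = normalize(block_out)
    return normalize([hi + alpha * (ai - hi) for hi, ai in zip(h, h_a)])

h = normalize([3.0, 4.0])                          # [0.6, 0.8]
h_next = ngpt_update(h, [1.0, 0.0], alpha=0.5)     # rotates h toward [1, 0], stays on the sphere
```

Every representation in this sketch stays on the unit sphere, so the update behaves like an interpolation between directions rather than an unbounded sum of residuals.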

by Synced 2024-11-17

AI Machine Learning & Data Science Research

NVIDIA’s OMCAT: A Breakthrough in Cross-Modal Temporal Understanding for Multimodal AI

In a new paper, an NVIDIA research team presents OCTAV, a dataset designed to capture event transitions across audio and video, and OMCAT (Omni Context Aware Transformer), a model that uses RoTE (Rotary Time Embeddings).

by Synced 2024-09-09

AI Machine Learning & Data Science Research

Microsoft’s Fully Pipelined Distributed Transformer Processes 16x Sequence Length with Extreme Hardware Efficiency

A Microsoft research team introduces the Fully Pipelined Distributed Transformer, which leverages the multiple memory hierarchies available in modern GPU clusters, enhancing hardware efficiency and cost-effectiveness while achieving exceptionally high Model FLOPs Utilization (MFU).
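Model FLOPs Utilization compares the FLOPs a training run actually performs with the hardware's peak throughput. A minimal sketch, assuming the common approximation of about 6 FLOPs per parameter per token for a forward plus backward pass and ignoring attention-score FLOPs (which grow in importance at the long sequence lengths this work targets); the function name and all numbers are hypothetical.

```python
def model_flops_utilization(num_params, tokens_per_second, peak_flops_per_second):
    """MFU ~= (6 * N * tokens/s) / peak FLOP/s; attention-score FLOPs are ignored here."""
    achieved = 6 * num_params * tokens_per_second
    return achieved / peak_flops_per_second

# Hypothetical 8B-parameter model processing 2,000 tokens/s on hardware with a 312 TFLOP/s peak.
mfu = model_flops_utilization(8e9, 2_000, 312e12)   # roughly 0.31
```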

by Synced 2024-03-31

AI Machine Learning & Data Science Research

KCL Leverages Topos Theory to Decode Transformer Architectures

A King’s College London research team explores the transformer architecture theoretically through the lens of topos theory, conjecturing that factorization through “choose” and “eval” morphisms can yield effective neural network architecture designs.

by Synced 2024-03-29

AI Machine Learning & Data Science Research

Robotic Marvels: Conquering San Francisco’s Streets Through Next Token Prediction

A research team from the University of California, Berkeley presents a causal transformer trained by autoregressively predicting sensorimotor trajectories, which enables a full-sized humanoid robot to navigate the streets of San Francisco zero-shot.

by Synced 2024-03-11

AI Machine Learning & Data Science Research

Fast Tracks to Diverse Behaviors: VQ-BeT Achieves 5x Speed Surge Compared to Diffusion Policies

In the new paper Behavior Generation with Latent Actions, a research team introduces the Vector-Quantized Behavior Transformer (VQ-BeT), a behavior-generation model that handles multimodal action prediction, conditional generation, and partial observations.
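The vector-quantization step in VQ-BeT turns a continuous action into discrete codes that a Transformer can predict. A minimal sketch, assuming a two-stage residual quantizer with fixed, hand-written codebooks (in practice the codebooks are learned); `CODEBOOKS`, `nearest` and `residual_quantize` are hypothetical names.

```python
def nearest(codebook, v):
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def residual_quantize(v, codebooks):
    """Quantize v stage by stage; each stage encodes what earlier stages left over."""
    residual = list(v)
    codes, recon = [], [0.0] * len(v)
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        recon = [r + c for r, c in zip(recon, cb[idx])]
        residual = [x - c for x, c in zip(residual, cb[idx])]
    return codes, recon

# Hypothetical 2-D action codebooks: coarse positions, then fine corrections.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, -0.1]],
]
codes, recon = residual_quantize([1.08, 0.02], CODEBOOKS)   # codes [1, 1], recon ~[1.1, 0.0]
```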

by Synced 2023-10-31

AI Machine Learning & Data Science Research

Supercharging Large Language Models: DEJAVU’s Inference Time Surpasses FasterTransformer by 2×

In the new paper Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time, a research team presents DEJAVU, a system that uses a low-cost algorithm to predict contextual sparsity on the fly for each layer, together with an asynchronous, hardware-aware implementation that accelerates LLM inference.

by Synced 2023-10-17

AI Machine Learning & Data Science Research

MatFormer: The Universal Elastic Transformer Capable of Generating Submodels With Zero Extra Training Costs

In the new paper MatFormer: Nested Transformer for Elastic Inference, a research team proposes MatFormer, a Transformer architecture designed for elasticity that trains a single universal model from which numerous smaller submodels can be extracted without additional training.
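MatFormer's submodels are nested: a smaller model reuses the first m hidden units of each FFN layer of the full model, so extracting one is just slicing. A minimal sketch of that extraction for a single ReLU FFN, assuming weights stored as plain nested lists (it does not show the joint training of the nested granularities); `W_in`, `W_out` and `nested_ffn` are hypothetical names.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def nested_ffn(x, W_in, W_out, m):
    """Run an FFN using only its first m hidden units (one nested submodel)."""
    hidden = [relu(sum(w * xi for w, xi in zip(W_in[j], x))) for j in range(m)]
    return [sum(W_out[k][j] * hidden[j] for j in range(m)) for k in range(len(W_out))]

# Hypothetical weights: 2 inputs, 4 hidden units, 2 outputs.
W_in = [[1, 0], [0, 1], [1, 1], [-1, 1]]     # one row per hidden unit
W_out = [[1, 1, 1, 1], [1, -1, 0, 2]]        # one row per output
x = [1.0, 2.0]
full = nested_ffn(x, W_in, W_out, 4)         # [7.0, 1.0]
small = nested_ffn(x, W_in, W_out, 2)        # [3.0, -1.0], same weights, half the hidden width
```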

by Synced 2023-09-20

AI Machine Learning & Data Science Nature Language Tech Research

Unveiling the Enigma: Meta AI & UPC Decode the Inner Workings of Large Scale Language Models

In the new paper Neurons in Large Language Models: Dead, N-gram, Positional, a research team from Meta AI and Universitat Politècnica de Catalunya conducts a comprehensive analysis of the Open Pre-trained Transformer (OPT) family of language models, up to 66B parameters, to provide insights into how feed-forward network (FFN) layers behave.

by Synced 2023-06-13

AI Machine Learning & Data Science Research

Salesforce AI’s CodeTF Library Facilitates Easy LLM Integration for Code Intelligence Tasks

In the new paper CodeTF: One-stop Transformer Library for State-of-the-art Code LLM, a Salesforce AI research team develops CodeTF, an open-source, one-stop Python library that provides a unified interface for training and inference on code intelligence tasks, with the aim of making it easy to integrate state-of-the-art language models into real-world applications.

by Synced 2022-10-04

AI Machine Learning & Data Science Research

UNC Chapel Hill’s Textless Vision-Language Transformer: Comparable Performance to Text-Based Approaches but 28x Faster

In the new paper TVLT: Textless Vision-Language Transformer, researchers from UNC Chapel Hill present the Textless Vision-Language Transformer (TVLT) for vision-and-language representation learning. TVLT takes only raw visual and audio inputs and performs comparably to its text-based counterparts while using only one-third of the parameters and running inference 28x faster.

by Synced 2022-06-29

AI Computer Vision & Graphics Machine Learning & Data Science Research

NVIDIA’s Global Context ViT Achieves SOTA Performance on CV Tasks Without Expensive Computation

In the new paper Global Context Vision Transformers, an NVIDIA research team proposes the Global Context Vision Transformer, a simple hierarchical ViT architecture that combines global self-attention and token generation modules to model both short- and long-range dependencies efficiently, without costly compute operations, while achieving SOTA results on various computer vision tasks.

by Synced 2022-02-24

AI Computer Vision & Graphics Machine Learning & Data Science Research

DeepMind’s Upgraded Hierarchical Perceiver Is Faster, Scales to Larger Data Without Preprocessing, and Delivers Higher Resolution and Accuracy

DeepMind researchers propose Hierarchical Perceiver (HiP), a model that retains the original Perceiver’s ability to process arbitrary modalities but is faster, can scale up to even more inputs/outputs, reduces the need for input engineering, and improves both efficiency and accuracy on classical computer vision benchmarks.

by Synced 2022-02-22

AI Machine Learning & Data Science Research

DeepMind Trains Agents to Control Computers as Humans Do to Solve Everyday Tasks

DeepMind trains agents to use keyboard and mouse commands with pixel and Document Object Model (DOM) observations to control computers, achieving state-of-the-art and human-level mean performance across all tasks on the MiniWob++ benchmark.

by Synced 2022-02-03

AI Machine Learning & Data Science Nature Language Tech Research

Microsoft & NVIDIA Leverage DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest Monolithic Language Model

A research team from Microsoft and NVIDIA combines NVIDIA’s Megatron-LM and Microsoft’s DeepSpeed into an efficient, scalable 3D parallel system that combines data, pipeline, and tensor-slicing parallelism. The resulting model achieves superior zero-, one-, and few-shot learning accuracies and new state-of-the-art results on NLP benchmarks.
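In a 3D parallel layout, every GPU belongs to exactly one data-parallel replica, one pipeline stage and one tensor-slicing group, so the total GPU count is the product of the three degrees. A minimal sketch of one possible rank layout, assuming tensor-parallel ranks are adjacent (so they can share a node's fast interconnect), followed by pipeline stages and then data-parallel replicas; this ordering and the function name `rank_to_coords` are assumptions, not the exact layout used by Megatron-LM or DeepSpeed.

```python
def rank_to_coords(rank, tp, pp, dp):
    """Map a global rank to (data, pipeline, tensor) indices; tensor index varies fastest."""
    assert 0 <= rank < tp * pp * dp, "rank outside the tp * pp * dp grid"
    tensor = rank % tp
    pipeline = (rank // tp) % pp
    data = rank // (tp * pp)
    return data, pipeline, tensor

# Hypothetical 16-GPU job: 2-way tensor slicing, 4 pipeline stages, 2 data-parallel replicas.
layout = [rank_to_coords(r, tp=2, pp=4, dp=2) for r in range(16)]
```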

by Synced 2022-01-24

AI Computer Vision & Graphics Machine Learning & Data Science Research

Meta AI’s OMNIVORE: A Modality-Agnostic Single Vision Model With Cross-Modal Generalization

A Meta AI research team presents OMNIVORE, a single vision model for various visual modalities that can perform cross-modal generalization and achieves performance on par with or better than traditional modality-specific models of the same size.

by Synced 2022-01-06

AI Machine Learning & Data Science Nature Language Tech Research

University of Amsterdam & Meta AI Propose a Roadmap Toward Interactive Language Modelling Based on Caregiver-Child Interactions

In the new paper Towards Interactive Language Modeling, a research team from the University of Amsterdam and Meta AI Labs presents a road map detailing the steps to be taken towards interactive language modelling.

by Synced 2021-12-29

AI Computer Vision & Graphics Machine Learning & Data Science Research

ETH Zurich Proposes Exemplar Transformers: Robust Visual Tracking That’s 8x Faster and CPU-Compatible

In the new paper Efficient Visual Tracking with Exemplar Transformers, ETH Zurich researchers propose Exemplar Transformers for real-time visual object tracking, which are up to 8x faster than other transformer-based models.

by Synced 2021-11-30

AI Machine Learning & Data Science Popular Research

Google, Cambridge U & Alan Turing Institute Propose PolyViT: A Universal Transformer for Image, Video, and Audio Classification

A research team from Google Research, the University of Cambridge and the Alan Turing Institute proposes PolyViT, a single transformer model capable of processing multiple modalities and datasets. PolyViT is parameter-efficient and learns representations that generalize across multiple domains.

by Synced 2021-10-11

AI Global News Hot Industry Nature Language Tech Research US & Canada

530 Billion Parameters! Microsoft and NVIDIA Trained the Largest Generative Language Model

On October 11, Microsoft introduced the largest and “the most powerful monolithic transformer language model” trained to date, a 530 billion parameter GPT-3-style generative language model.

by Synced 2021-09-07

AI Machine Learning & Data Science Research

Swiss AI Lab Uses Simple Tricks to Dramatically Improve Transformers’ Systematic Generalization

A research team from The Swiss AI Lab IDSIA significantly improves the systematic generalization of transformer architectures, achieving accuracy up to 85 percent on the PCFG productivity split, and up to 81 percent on COGS.

by Synced 2021-08-30

AI Machine Learning & Data Science Popular Research

Tsinghua U & Microsoft Propose Fastformer: An Additive Attention Based Transformer With Linear Complexity

A team from Tsinghua University and Microsoft Research Asia proposes Fastformer, an efficient Transformer variant based on additive attention that achieves effective context modelling with linear complexity.
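Additive attention avoids the N x N attention matrix by pooling the whole sequence into a single global vector with softmax weights and then mixing that vector back into each token element-wise, so every step costs O(N·d). A minimal sketch of that flow, assuming learned scoring vectors `w_q` and `w_k` and omitting Fastformer's final linear projection (the output here is simply u_i + q_i); all function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def additive_pool(vectors, w):
    """Summarise a sequence into one vector using softmax(w . x / sqrt(d)) weights."""
    d = len(w)
    weights = softmax([dot(w, v) / math.sqrt(d) for v in vectors])
    return [sum(a * v[j] for a, v in zip(weights, vectors)) for j in range(d)]

def fastformer_mix(Q, K, V, w_q, w_k):
    """Additive-attention token mixing (output projection omitted)."""
    q_global = additive_pool(Q, w_q)
    P = [[qg * kj for qg, kj in zip(q_global, k)] for k in K]   # element-wise q_global * k_i
    k_global = additive_pool(P, w_k)
    U = [[kg * vj for kg, vj in zip(k_global, v)] for v in V]   # element-wise k_global * v_i
    return [[u + q for u, q in zip(u_row, q_row)] for u_row, q_row in zip(U, Q)]

Q = [[1.0, 0.0], [1.0, 0.0]]
K = [[1.0, 1.0], [1.0, 1.0]]
V = [[2.0, 3.0], [4.0, 5.0]]
out = fastformer_mix(Q, K, V, w_q=[0.0, 0.0], w_k=[0.0, 0.0])   # [[3.0, 0.0], [5.0, 0.0]]
```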

by Synced 2021-08-12

AI Machine Learning & Data Science Nature Language Tech Research

WMT21 | Detailing WeChat AI & Beijing Jiaotong University’s NMT System Architecture

On August 5, WeChat AI and Beijing Jiaotong University system developers released the paper WeChat Neural Machine Translation Systems for WMT21, revealing the architecture of their novel neural machine translation (NMT) system and the strategies they adopted to achieve impressive performance in the WMT21 competition.

by Synced 2021-08-05

AI Machine Learning & Data Science Nature Language Tech Research

Google’s H-Transformer-1D: Fast One-Dimensional Hierarchical Attention With Linear Complexity for Long Sequence Processing

A Google Research team draws inspiration from two numerical analysis methods, Hierarchical Matrix (H-Matrix) and Multigrid, to address the quadratic complexity of attention mechanisms in transformer architectures, proposing a hierarchical attention scheme that has linear complexity in run time and memory.

by Synced 2021-06-03

AI Machine Learning & Data Science Nature Language Tech Research

Towards a Token-Free Future: Google Proposes Pretrained Byte-to-Byte Transformers for NLP

A research team from Google proposes ByT5, a competitive token-free, pretrained byte-to-byte transformer, showing that a standard transformer can be adapted in a straightforward way to process byte sequences without excessive computational cost.
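Token-free input means the model reads UTF-8 bytes directly instead of subword tokens. A minimal sketch, assuming the convention used by the Hugging Face ByT5 tokenizer of reserving IDs 0 to 2 for padding, end-of-sequence and unknown tokens and shifting every byte value by 3; `encode` and `decode` are hypothetical helper names.

```python
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2   # assumed special-token IDs
OFFSET = 3                         # byte value b is assumed to map to ID b + OFFSET

def encode(text):
    """Map a string to byte-level IDs and append an end-of-sequence marker."""
    return [b + OFFSET for b in text.encode("utf-8")] + [EOS_ID]

def decode(ids):
    """Invert encode(), dropping special tokens."""
    data = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return data.decode("utf-8", errors="replace")

print(encode("hi"))   # [107, 108, 1]
```

A non-ASCII character such as "é" becomes two IDs, which is why byte-level sequences are longer than their subword equivalents and why the architecture has to keep that extra length affordable.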

by Synced 2021-05-14

AI Machine Learning & Data Science Popular Research

Google Replaces BERT Self-Attention with Fourier Transform: 92% Accuracy, 7 Times Faster on GPUs

A research team from Google shows that replacing transformers’ self-attention sublayers with a Fourier Transform achieves 92 percent of BERT’s accuracy on the GLUE benchmark while training seven times faster on GPUs and twice as fast on TPUs.
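The replacement sublayer (called FNet in the paper) mixes tokens with a parameter-free 2D discrete Fourier transform over the sequence and hidden dimensions and keeps only the real part. A minimal sketch, assuming a naive O(n²) DFT written with the standard `cmath` module so it stays self-contained (a real implementation would use an FFT); `dft` and `fourier_mix` are hypothetical names.

```python
import cmath

def dft(seq):
    """Naive discrete Fourier transform of a list of (real or complex) numbers."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fourier_mix(X):
    """Token mixing: DFT along hidden, then along sequence, keep the real part."""
    along_hidden = [dft(row) for row in X]              # X has shape (seq_len, hidden)
    cols = list(zip(*along_hidden))                     # transpose to (hidden, seq_len)
    along_seq = [dft(list(col)) for col in cols]
    return [[along_seq[j][i].real for j in range(len(along_seq))]   # back to (seq_len, hidden)
            for i in range(len(X))]

X = [[1.0] * 3 for _ in range(4)]   # constant input: all energy lands in the (0, 0) frequency
Y = fourier_mix(X)                  # Y[0][0] ~ 12.0, every other entry ~ 0.0
```

Because the mixing has no learned weights, it costs nothing to train; the learning happens in the feed-forward sublayers around it.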

by Synced 2021-05-12

AI Machine Learning & Data Science Research

DeepMind & Onshape Leverage Transformer to Automate Effective CAD Sketches

A research team from DeepMind and Onshape combines a general-purpose language modelling technique and an off-the-shelf data serialization protocol to propose a machine learning model that can automatically generate high-quality sketches for Computer-Aided Design.

by Synced 2021-04-27

AI Machine Learning & Data Science Nature Language Tech Research

Microsoft & Peking U Researchers Identify ‘Knowledge Neurons’ in Pretrained Transformers, Enabling Fact Editing

A research team from Microsoft Research and Peking University examines pretrained transformers to investigate how factual knowledge is stored, proposing a method to identify “knowledge neurons,” which can be used to explicitly update and erase facts.

by Synced 2021-04-26

AI Machine Learning & Data Science Research

Google and UC Berkeley Propose Green Strategies for Large Neural Network Training

A research team from Google and the University of California, Berkeley calculates the energy use and carbon footprint of large-scale models T5, Meena, GShard, Switch Transformer and GPT-3, and identifies methods and publication guidelines that could help reduce their CO2e footprint.
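A training footprint estimate of this kind reduces to simple arithmetic: energy equals training hours times number of processors times average power per processor times datacenter PUE, and emissions equal that energy times the carbon intensity of the electricity. A minimal sketch, with the function names and every input value hypothetical rather than figures for any of the models named above.

```python
def training_energy_mwh(hours, num_processors, avg_watts_per_processor, pue):
    """Energy = hours x processors x average power x PUE, converted from Wh to MWh."""
    return hours * num_processors * avg_watts_per_processor * pue / 1e6

def training_tco2e(energy_mwh, tco2e_per_mwh):
    """Emissions = energy x carbon intensity of the local grid."""
    return energy_mwh * tco2e_per_mwh

# Hypothetical run: 10 days on 512 accelerators at 300 W average, PUE 1.1, grid at 0.4 tCO2e/MWh.
energy = training_energy_mwh(hours=240, num_processors=512, avg_watts_per_processor=300, pue=1.1)
emissions = training_tco2e(energy, tco2e_per_mwh=0.4)   # about 40.55 MWh -> about 16.22 tCO2e
```

Each factor is a lever: fewer processor-hours, more efficient hardware, a lower-PUE datacenter, or a cleaner grid all shrink the result multiplicatively.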

by Synced 2021-03-08

AI Computer Vision & Graphics Research

Meet Transformer in Transformer: A Visual Transformer That Captures Structural Information From Images

A team from Huawei, ISCAS and UCAS proposes the Transformer-iN-Transformer (TNT), a novel architecture for modelling both patch-level and pixel-level representations.

by Synced 2021-02-26

AI Machine Learning & Data Science Research

Facebook AI’s Multitask & Multimodal Unified Transformer: A Step Toward General-Purpose Intelligent Agents

A research team from Facebook AI has proposed a Unified Transformer (UniT) encoder-decoder model that jointly trains on multiple tasks across different modalities and achieves strong performance on seven tasks with a unified set of model parameters.

by Synced 2021-02-17

AI Others Research United States

Yann LeCun Hails MSA Transformer’s ‘Huge Progress’ in Protein Contact Prediction

The Multiple Sequence Alignment (MSA) Transformer, developed by researchers from UC Berkeley, Facebook AI Research and New York University, surpasses current state-of-the-art unsupervised structure learning methods by a wide margin.

by Synced 2021-01-14

Machine Learning & Data Science Nature Language Tech Popular Research

Google Brain’s Switch Transformer Language Model Packs 1.6-Trillion Parameters

Google Brain’s Switch Transformer language model packs a whopping 1.6 trillion parameters while effectively controlling computational cost. The model achieved a 4x pretraining speedup over a strongly tuned T5-XXL baseline.
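Switch Transformer keeps per-token compute roughly constant by sending each token to a single expert (top-1 routing) and capping how many tokens each expert may process. A minimal sketch of that assignment step only, assuming expert capacity = ceil(tokens / experts x capacity factor) and first-come, first-served dropping, with overflow tokens skipping the expert via the residual connection; `switch_route` is a hypothetical name.

```python
import math

def switch_route(router_logits, num_experts, capacity_factor=1.0):
    """Top-1 routing with a per-expert capacity; overflow tokens are marked None."""
    num_tokens = len(router_logits)
    capacity = math.ceil(num_tokens / num_experts * capacity_factor)
    load = [0] * num_experts
    assignment = []
    for logits in router_logits:
        expert = max(range(num_experts), key=lambda e: logits[e])
        if load[expert] < capacity:
            load[expert] += 1
            assignment.append(expert)
        else:
            assignment.append(None)   # dropped: the token passes through the residual path
    return assignment, capacity

# Hypothetical router logits for 4 tokens and 2 experts; three tokens prefer expert 0.
assignment, cap = switch_route([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 1.0]], num_experts=2)
```

Raising `capacity_factor` drops fewer tokens at the cost of more padding and memory per expert, which is the trade-off the capacity mechanism exposes.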

by Synced 2020-11-12

Machine Learning & Data Science Popular

Google & DeepMind Debut Benchmark for Long-Range Transformers

Google Research and DeepMind debut the Long-Range Arena (LRA) benchmark for evaluating Transformers on tasks with long sequence lengths.

by Synced 2020-11-02

AI

‘Bridging Visual Representations’ Decoder Integrates CV Object Detection Frameworks

A NeurIPS 2020 paper from the Institute of Automation, CAS and Microsoft Research Asia presents an attention-based decoder that integrates the object representations used by different CV object detection frameworks.

by Synced 2020-10-26

Machine Learning & Data Science Nature Language Tech

Google ‘mT5’ Pretrained Text-to-Text Transformer Achieves SOTA Performance on Multilingual Benchmarks

Google recently introduced mT5, a multilingual variant of its “Text-to-Text Transfer Transformer” (T5), pretrained on a new Common Crawl-based dataset covering 101 languages.

by Synced 2020-10-02

Machine Learning & Data Science Nature Language Tech Popular

Google, Cambridge, DeepMind & Alan Turing Institute’s ‘Performer’ Transformer Slashes Compute Costs

A team from Google, the University of Cambridge, DeepMind, and the Alan Turing Institute has proposed Performer, a new type of Transformer built on the Fast Attention Via positive Orthogonal Random features (FAVOR+) mechanism.
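FAVOR+ approximates the softmax kernel exp(q·k) with positive random features φ(x) = exp(ω·x − |x|²/2)/√m, whose dot products equal exp(q·k) in expectation. Attention can then be computed as φ(Q)(φ(K)ᵀV), which is linear rather than quadratic in sequence length. A minimal non-causal sketch, assuming i.i.d. Gaussian projections (the "O" in FAVOR+ refers to orthogonalising them, which is skipped here) and the usual 1/√d scaling split between queries and keys; the function names are hypothetical.

```python
import math
import random

def positive_features(x, omegas):
    """phi(x) = exp(w . x - |x|^2 / 2) / sqrt(m), so E[phi(q) . phi(k)] = exp(q . k)."""
    m = len(omegas)
    half_sq_norm = sum(v * v for v in x) / 2.0
    return [math.exp(sum(w * v for w, v in zip(omega, x)) - half_sq_norm) / math.sqrt(m)
            for omega in omegas]

def favor_attention(Q, K, V, num_features=64, seed=0):
    """Linear-time approximation of softmax attention (non-causal)."""
    d = len(Q[0])
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(num_features)]  # not orthogonalised
    scale = d ** -0.25                                   # q.k / sqrt(d) split across q and k
    phi_q = [positive_features([v * scale for v in q], omegas) for q in Q]
    phi_k = [positive_features([v * scale for v in k], omegas) for k in K]
    d_v = len(V[0])
    # Summaries built once over all keys: phi(K)^T V and phi(K)^T 1.
    kv = [[sum(phi_k[n][f] * V[n][c] for n in range(len(K))) for c in range(d_v)]
          for f in range(num_features)]
    k_sum = [sum(phi_k[n][f] for n in range(len(K))) for f in range(num_features)]
    out = []
    for pq in phi_q:
        denom = sum(a * b for a, b in zip(pq, k_sum))   # positive, since every feature is positive
        out.append([sum(pq[f] * kv[f][c] for f in range(num_features)) / denom for c in range(d_v)])
    return out

Q = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
V = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
approx = favor_attention(Q, Q, V)
```

Since the features are strictly positive, each output row is a convex combination of the value rows, just as with exact softmax attention; this is the stability benefit of positive features over trigonometric ones.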

by Synced 2020-09-10

Machine Learning & Data Science Nature Language Tech Popular

OpenAI ‘GPT-f’ Delivers SOTA Performance in Automated Mathematical Theorem Proving

OpenAI researchers introduce GPT-f, an automated prover and proof assistant for the Metamath formalization language.