Tag: Model Scaling

AI | Machine Learning & Data Science | Nature Language Tech | Research

Google’s Transformer-Based LongT5 Achieves Performance Gains by Scaling Both Input Length and Model Size

A Google Research team explores the effects of simultaneously scaling input length and model size with LongT5, a novel transformer architecture that achieves state-of-the-art performance on long-sequence tasks.
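For a rough sense of how a long-input model like this is used in practice, the sketch below relies on the Hugging Face Transformers port of LongT5 and the public google/long-t5-tglobal-base checkpoint; the checkpoint name, sequence length, and generation settings are illustrative assumptions, not details drawn from the paper.

```python
# Minimal LongT5 inference sketch (assumes the Hugging Face Transformers
# port of LongT5 and the public google/long-t5-tglobal-base checkpoint).
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

model_name = "google/long-t5-tglobal-base"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LongT5ForConditionalGeneration.from_pretrained(model_name)

long_document = "..."  # a multi-thousand-token input, e.g. a full article

# LongT5's sparse (transient-global) attention keeps memory growth close to
# linear in input length, which is what makes much longer inputs practical.
inputs = tokenizer(long_document, return_tensors="pt",
                   truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```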

AI | Machine Learning & Data Science | Research

Google Presents New Parallelization Paradigm GSPMD for Common ML Computation Graphs: Constant Compilation Time with Increasing Devices

A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to express different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding, and pipeline parallelism.
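To make "tensor sharding annotations" concrete, the sketch below uses JAX's sharding API, which lowers to the XLA GSPMD partitioner; the mesh shape, axis names, and tensor sizes are assumptions chosen for illustration rather than values from the paper.

```python
# Illustrative sharding-annotation sketch in JAX, whose jit/sharding
# machinery lowers to the XLA GSPMD partitioner (mesh shape and tensor
# sizes here are assumptions for illustration).
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# A 2D logical mesh: one axis for data parallelism, one for within-layer
# model parallelism (assumes 8 attached devices).
devices = mesh_utils.create_device_mesh((2, 4))
mesh = Mesh(devices, axis_names=("data", "model"))

# Annotations only: the batch is split along "data", the weight matrix
# along "model"; the partitioner propagates shardings through the graph
# and inserts the collectives needed to keep the computation correct.
x = jax.device_put(jnp.ones((32, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 4096)), NamedSharding(mesh, P(None, "model")))

@jax.jit
def layer(x, w):
    return jnp.dot(x, w)

y = layer(x, w)  # the compiler typically shards the result along both axes
```

The point of this style of annotation is that the same user program stays unchanged as the device count grows; only the mesh and the per-tensor partition specs need to be adjusted.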