Tag: Model Scaling

AI Machine Learning & Data Science Research

Microsoft’s LongNet Scales Transformer to One Billion Tokens

In the new paper LongNet: Scaling Transformers to 1,000,000,000 Tokens, a Microsoft research team presents LongNet, a Transformer variant that successfully scales sequence length to more than one billion tokens while maintaining strong performance and linear computation complexity.
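The linear complexity comes from the paper's dilated attention, which restricts attention to fixed-size segments whose tokens are subsampled at a dilation rate, so per-token cost stays constant as the sequence grows. The JAX snippet below is a minimal toy sketch of that idea under assumed parameter names (segment_len, dilation); it is not Microsoft's implementation, and it omits the mixing of multiple segment/dilation configurations used in the paper.

```python
# Toy sketch of dilated attention: attend only within segments, after
# keeping every `dilation`-th token, giving cost linear in sequence length.
import jax
import jax.numpy as jnp


def dilated_attention(q, k, v, segment_len=4, dilation=2):
    """q, k, v: [seq_len, d]; seq_len must be divisible by segment_len."""
    seq_len, d = q.shape
    n_seg = seq_len // segment_len

    def sparsify(x):
        # Split into segments, then keep every `dilation`-th row.
        return x.reshape(n_seg, segment_len, d)[:, ::dilation, :]

    qs, ks, vs = sparsify(q), sparsify(k), sparsify(v)

    # Standard scaled dot-product attention, but only inside each sparsified segment.
    scores = jnp.einsum("sid,sjd->sij", qs, ks) / jnp.sqrt(d)
    weights = jax.nn.softmax(scores, axis=-1)
    out_sparse = jnp.einsum("sij,sjd->sid", weights, vs)

    # Scatter attended tokens back to their positions; untouched positions stay
    # zero here (the paper covers them by mixing several configurations).
    out = jnp.zeros((n_seg, segment_len, d))
    out = out.at[:, ::dilation, :].set(out_sparse)
    return out.reshape(seq_len, d)


key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (16, 8))
print(dilated_attention(q, q, q).shape)  # (16, 8)
```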

AI Machine Learning & Data Science Research

Google Presents New Parallelization Paradigm GSPMD for Common ML Computation Graphs: Constant Compilation Time with Increasing Devices

A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to achieve different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding and pipeline parallelism.
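GSPMD is the SPMD partitioner inside XLA, and JAX's jit exposes its annotation style: the user writes single-device code and attaches sharding hints, and the partitioner propagates shardings and inserts collectives. The sketch below shows that annotation workflow in JAX under illustrative assumptions (the "data" mesh axis, the batch-sharded layout, and the toy matmul are not taken from the paper).

```python
# Minimal sketch of GSPMD-style sharding annotations via JAX's jit.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))      # 1-D device mesh

# Annotate tensor layouts: shard the batch dimension across "data",
# replicate the weights. The partitioner handles the rest of the graph.
x_sharding = NamedSharding(mesh, P("data", None))
w_sharding = NamedSharding(mesh, P())           # fully replicated

@jax.jit
def layer(x, w):
    y = x @ w
    # Internal hint: keep the activation sharded along the batch axis.
    return jax.lax.with_sharding_constraint(y, x_sharding)

batch = 8 * len(devices)                        # divisible by the mesh size
x = jax.device_put(jnp.ones((batch, 16)), x_sharding)
w = jax.device_put(jnp.ones((16, 32)), w_sharding)
print(layer(x, w).sharding)                     # batch dim split across devices
```

Swapping the PartitionSpecs (e.g. sharding weight columns instead of the batch) expresses model parallelism with the same mechanism, which is the unification the paper describes.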