Tag: Model Scaling

AI | Machine Learning & Data Science | Nature Language Tech | Research

Google’s Transformer-Based LongT5 Achieves Performance Gains by Scaling Both Input Length and Model Size

A Google Research team explores the effects of simultaneously scaling input length and model size with LongT5, a novel transformer architecture that achieves state-of-the-art performance on long-sequence tasks.
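For a rough sense of how a long-input model like this is used in practice, the sketch below relies on the Hugging Face Transformers port of LongT5 and the public google/long-t5-tglobal-base checkpoint; the checkpoint name, sequence length, and generation settings are illustrative assumptions, not details drawn from the paper.

```python
# Minimal LongT5 inference sketch (assumes the Hugging Face Transformers
# port of LongT5 and the public google/long-t5-tglobal-base checkpoint).
from transformers import AutoTokenizer, LongT5ForConditionalGeneration

model_name = "google/long-t5-tglobal-base"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LongT5ForConditionalGeneration.from_pretrained(model_name)

long_document = "..."  # a multi-thousand-token input, e.g. a full article

# LongT5's sparse (transient-global) attention keeps memory growth close to
# linear in input length, which is what makes much longer inputs practical.
inputs = tokenizer(long_document, return_tensors="pt",
                   truncation=True, max_length=4096)
summary_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```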

AI | Machine Learning & Data Science | Research

Google Presents New Parallelization Paradigm GSPMD for Common ML Computation Graphs: Constant Compilation Time with Increasing Devices

A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to express different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding, and pipeline parallelism.
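To make "tensor sharding annotations" concrete, the sketch below uses JAX's sharding API, which lowers to the XLA GSPMD partitioner; the mesh shape, axis names, and tensor sizes are assumptions chosen for illustration rather than values from the paper.

```python
# Illustrative sharding-annotation sketch in JAX, whose jit/sharding
# machinery lowers to the XLA GSPMD partitioner (mesh shape and tensor
# sizes here are assumptions for illustration).
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# A 2D logical mesh: one axis for data parallelism, one for within-layer
# model parallelism (assumes 8 attached devices).
devices = mesh_utils.create_device_mesh((2, 4))
mesh = Mesh(devices, axis_names=("data", "model"))

# Annotations only: the batch is split along "data", the weight matrix
# along "model"; the partitioner propagates shardings through the graph
# and inserts the collectives needed to keep the computation correct.
x = jax.device_put(jnp.ones((32, 1024)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((1024, 4096)), NamedSharding(mesh, P(None, "model")))

@jax.jit
def layer(x, w):
    return jnp.dot(x, w)

y = layer(x, w)  # the compiler typically shards the result along both axes
```

The point of this style of annotation is that the same user program stays unchanged as the device count grows; only the mesh and the per-tensor partition specs need to be adjusted.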