Tag: parallel computing

AI Machine Learning & Data Science Research

Introducing Alpa: A Compiler Architecture for Automated Model-Parallel Distributed Training That Outperforms Hand-Tuned Strategies

A research team from UC Berkeley, Amazon Web Services, Google, Shanghai Jiao Tong University and Duke University proposes Alpa, a compiler system for distributed deep learning on GPU clusters that automatically generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on the models they were designed for.

AI Machine Learning & Data Science Nature Language Tech Research

Microsoft & NVIDIA Leverage DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest Monolithic Language Model

A research team from Microsoft and NVIDIA leverages the NVIDIA Megatron-LM and Microsoft’s DeepSpeed to create an efficient and scalable 3D parallel system that combines data, pipeline, and tensor-slicing based parallelism, achieving superior zero-, one-, and few-shot learning accuracies and new state-of-the-art results on NLP benchmarks.

AI Machine Learning & Data Science Research

Google Presents New Parallelization Paradigm GSPMD for common ML Computation Graphs: Constant Compilation time with Increasing Devices

A research team from Google proposes GSPMD, an automatic parallelism system for ML computation graphs that uses simple tensor sharding annotations to achieve different parallelism paradigms in a unified way, including data parallelism, within-layer model parallelism, spatial partitioning, weight-update sharding, optimizer-state sharding and pipeline parallelism.

AI Machine Learning & Data Science Popular Research

NVIDIA, Stanford & Microsoft Propose Efficient Trillion-Parameter Language Model Training on GPU Clusters

A research team from NVIDIA, Stanford University and Microsoft Research propose a novel pipeline parallelism approach that improves throughput by more than 10 percent with a comparable memory footprint, showing such strategies can achieve high aggregate throughput while training models with up to a trillion parameters.