A research team from Microsoft and NVIDIA leverages NVIDIA's Megatron-LM and Microsoft's DeepSpeed to create an efficient and scalable 3D parallel system that combines data, pipeline, and tensor-slicing parallelism, achieving superior zero-, one-, and few-shot learning accuracies and setting new state-of-the-art results on NLP benchmarks.
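To see how the three parallelism dimensions compose, here is a minimal sketch of how a global GPU rank might decompose into tensor-, pipeline-, and data-parallel coordinates. The function name and the degree ordering (tensor fastest, then pipeline, then data) are illustrative assumptions, not the actual Megatron-DeepSpeed code:

```python
# Hypothetical sketch of 3D-parallel rank decomposition; the rank-ordering
# convention here is an assumption, not Megatron-DeepSpeed's real layout.

def rank_to_coords(rank, tp, pp, dp):
    """Map a global rank to (tensor, pipeline, data) parallel indices."""
    assert rank < tp * pp * dp, "rank outside the 3D grid"
    tp_idx = rank % tp                # tensor slice within a layer
    pp_idx = (rank // tp) % pp        # pipeline stage
    dp_idx = rank // (tp * pp)        # data-parallel replica
    return tp_idx, pp_idx, dp_idx

if __name__ == "__main__":
    # 8 GPUs split as 2-way tensor x 2-way pipeline x 2-way data parallelism
    for r in range(8):
        print(r, rank_to_coords(r, tp=2, pp=2, dp=2))
```

Because the three degrees multiply, the same code scales to thousands of GPUs by raising any one dimension while the others stay fixed.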
In the paper Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers, a research team from New York University and the University of North Carolina at Chapel Hill uses centered kernel alignment (CKA) to measure the similarity of representations across layers and explore how fine-tuning changes transformers’ learned representations.
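Linear CKA has a compact closed form, which can be sketched as follows. This is an illustrative implementation of the standard linear-CKA formula, not code released with the paper:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (n_samples x n_features).

    Illustrative implementation of the standard linear-CKA formula:
    ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features.
    """
    X = X - X.mean(axis=0)            # center each feature
    Y = Y - Y.mean(axis=0)
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, "fro")
                   * np.linalg.norm(Y.T @ Y, "fro"))
    return numerator / denominator

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 64))   # e.g. activations from one layer
    print(linear_cka(X, X))              # identical representations -> 1.0
    print(linear_cka(X, 2.0 * X))        # invariant to isotropic scaling
```

Applied layer-by-layer, a matrix of pairwise CKA scores reveals the clusters of similar representations the paper's title refers to.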
A research team from Baidu proposes ERNIE 3.0, a unified framework for pretraining large-scale, knowledge-enhanced models that can easily be tailored for both natural language understanding and generation tasks via zero-shot learning, few-shot learning, or fine-tuning, and achieves state-of-the-art results on NLP tasks.
Chinese AI company iFLYTEK has bested the SQuAD2.0 challenge once again. The model “BERT + DAE + AoA” submitted by HFL, the joint laboratory of iFLYTEK Research and HIT (Harbin Institute of Technology), outperformed humans on both the EM (exact match) and F1 (fuzzy match) metrics to top the SQuAD2.0 leaderboard.
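The two leaderboard metrics are straightforward to sketch. The snippet below is a simplified version of SQuAD-style EM and F1 scoring (the normalization is abridged relative to the official evaluation script):

```python
# Simplified SQuAD-style metrics; the official evaluation script's text
# normalization is more involved than this sketch.
import re
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """F1: harmonic mean of token-level precision and recall."""
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

EM rewards only exact answers, while F1 gives partial credit for overlapping tokens, which is why leaderboards report both.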
Earlier this week the Association for Computational Linguistics (ACL) 2018 announced its two Best Short Papers, neither of which had yet been published. Today the AI community got its first look at one of the winners when Know What You Don’t Know: Unanswerable Questions for SQuAD was released on arXiv.
Google and Amazon unveiled mini-sized smart speakers, the Google Home Mini Chalk and the Amazon Echo Dot 2, both priced at less than US$50. These virtual assistants plug into your environment with a natural human-machine voice interface that used to exist only in science fiction.