530 Billion Parameters! Microsoft and NVIDIA Trained the Largest Generative Language Model

On October 11, Microsoft introduced a 530-billion-parameter, GPT-3-style generative language model that it describes as the largest and "the most powerful monolithic transformer language model" trained to date.

The model, the result of a research collaboration between Microsoft and NVIDIA, was dubbed the "Megatron-Turing Natural Language Generation" model (MT-NLG). To train it, the team of researchers developed "an efficient and scalable 3D parallel system capable of combining data, pipeline, and tensor-slicing based parallelism" to "further parallelize and optimize the training of very large AI models."
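
To give a rough sense of what combining the three axes means, here is a minimal Python sketch of the arithmetic behind 3D parallelism; the specific degrees used in the example (8-way tensor slicing, a 35-stage pipeline, a 4,480-GPU cluster) are illustrative assumptions, not figures from this article.

```python
# A minimal sketch (not the researchers' code) of the arithmetic behind
# 3D parallelism: every GPU sits in one tensor-slicing group, one
# pipeline group, and one data-parallel group, so the three degrees
# must multiply out to the total GPU count.

def data_parallel_degree(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel degree implied by the other two axes.

    world_size = tensor_parallel * pipeline_parallel * data_parallel
    """
    model_parallel = tensor_parallel * pipeline_parallel  # GPUs per model replica
    assert world_size % model_parallel == 0, "tensor x pipeline must divide the GPU count"
    return world_size // model_parallel

# Illustrative (assumed) degrees: 8-way tensor slicing inside a node,
# a 35-stage pipeline across nodes, on a 4,480-GPU cluster.
replicas = data_parallel_degree(world_size=4480, tensor_parallel=8, pipeline_parallel=35)
print(replicas)  # 16 data-parallel replicas, each spanning 280 GPUs
```

Tensor slicing splits individual layers across GPUs, the pipeline splits the stack of layers into stages, and data parallelism replicates the resulting model over different batches; the product of the three degrees must equal the total number of GPUs.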

Microsoft had previously released DeepSpeed, a deep learning optimization library that enables the training of 100-billion-parameter models, while Megatron-LM is a large transformer language model developed by the Applied Deep Learning Research team at NVIDIA.
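
For context, here is a hypothetical, minimal sketch of handing a PyTorch model to DeepSpeed; the toy model and the config values below are placeholders for illustration, not the actual MT-NLG training configuration.

```python
# A hypothetical, minimal sketch of wrapping a PyTorch model with
# DeepSpeed; the toy model and config values are placeholders, not
# the MT-NLG setup.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},
}

# deepspeed.initialize wraps the model in an engine that handles mixed
# precision, optimizer state partitioning, and distributed setup.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```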
