Nvidia’s paper Large Scale Language Modeling: Converging on 40GB of Text in Four Hours introduces a model that uses mixed precision arithmetic and a 32k batch size distributed across 128 Nvidia Tesla V100 GPUs to improve the scalability and transfer of recurrent neural networks (RNNs) on natural language tasks.
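Mixed precision training typically keeps an FP32 "master" copy of the weights while computing in FP16, and scales the loss so that small FP16 gradients are not flushed to zero. The toy update step below sketches that idea in numpy; the function name, the SGD update, and the `loss_scale` value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mixed_precision_sgd_step(master_w, grad_fn, lr=0.1, loss_scale=1024.0):
    """One toy mixed-precision SGD step (a sketch, not Nvidia's code).

    master_w: FP32 master copy of the weights (updated in place).
    grad_fn:  computes the gradient from FP16 weights.
    """
    # Cast weights to FP16 for the forward/backward computation.
    w16 = master_w.astype(np.float16)
    # Scale the gradient (equivalent to scaling the loss before backward)
    # so tiny FP16 values do not underflow to zero.
    grad16 = grad_fn(w16) * np.float16(loss_scale)
    # Unscale in FP32 and apply the update to the FP32 master weights.
    master_w -= lr * (grad16.astype(np.float32) / loss_scale)
    return master_w

# Usage: minimize (w - 1)^2, whose gradient is 2 * (w - 1).
w = np.array([0.0], dtype=np.float32)
for _ in range(50):
    w = mixed_precision_sgd_step(w, lambda w16: 2.0 * (w16 - 1.0))
```

Keeping the master weights in FP32 matters because repeated small updates would be lost to rounding if accumulated directly in FP16.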
The team trained a multiplicative Long Short-Term Memory (mLSTM) network for unsupervised reconstruction over three epochs of the 40GB Amazon review dataset in just four hours. Previously, training a single epoch on the dataset would have taken about a month. The approach cut training time by enabling each GPU to process significantly more training data.
The team also trained an 8,192-neuron mLSTM that surpasses state-of-the-art performance in Amazon review language modeling with a bits-per-character (BPC) rate of 1.038, and achieves 93.8 percent accuracy on SST classification.
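An mLSTM differs from a standard LSTM in that it first forms a multiplicative intermediate state, the element-wise product of two projections of the input and the previous hidden state, and conditions the usual gates on that state instead of on the previous hidden state directly. The numpy cell below is a minimal sketch of that recurrence; the weight names and dimensions are illustrative assumptions (biases are omitted for brevity), not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlstm_step(x, h_prev, c_prev, p):
    """One mLSTM step (a sketch; weight names `p[...]` are hypothetical)."""
    # Multiplicative intermediate state: element-wise product of two
    # projections, giving an input-dependent recurrent transition.
    m = (p["Wmx"] @ x) * (p["Wmh"] @ h_prev)
    # Standard LSTM gates, but conditioned on m instead of h_prev.
    i = sigmoid(p["Wix"] @ x + p["Wim"] @ m)   # input gate
    f = sigmoid(p["Wfx"] @ x + p["Wfm"] @ m)   # forget gate
    o = sigmoid(p["Wox"] @ x + p["Wom"] @ m)   # output gate
    u = np.tanh(p["Wux"] @ x + p["Wum"] @ m)   # candidate cell update
    c = f * c_prev + i * u
    h = o * np.tanh(c)
    return h, c

# Usage with tiny dimensions: 4-dim input, 3-dim hidden state.
rng = np.random.default_rng(0)
p = {name: rng.standard_normal((3, 4)) * 0.1
     for name in ("Wmx", "Wix", "Wfx", "Wox", "Wux")}
p.update({name: rng.standard_normal((3, 3)) * 0.1
          for name in ("Wmh", "Wim", "Wfm", "Wom", "Wum")})
h, c = np.zeros(3), np.zeros(3)
for _ in range(5):
    h, c = mlstm_step(rng.standard_normal(4), h, c, p)
```

The input-dependent transition is what lets the mLSTM pick a different recurrent dynamic per character, which suits character-level language modeling.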
The paper analyzes how distributed data parallelism scales with the larger model, common problems in training RNNs, and the relationship between dataset size, batch size, and learning rate.
The work can serve as a basis for large-scale unsupervised pre-training in NLP, for both deep learning researchers and commercial applications.
The paper was published August 3 and is available on arXiv: https://arxiv.org/pdf/1808.01371v1.pdf
Author: Robert Tian | Editor: Michael Sarazen