Specialized hardware accelerators such as GPUs and TPUs have dramatically sped up neural network training. Not all operations in a training pipeline can run on accelerators, however, and the earlier stages of the pipeline are particularly prone to becoming bottlenecks.
A team of researchers at Google Brain recently proposed a “data echoing” technique that keeps accelerators busy while these time-consuming upstream stages catch up. Introduced in the paper Faster Neural Network Training with Data Echoing, the technique reuses intermediate outputs from earlier pipeline stages so that idle accelerator capacity can be reclaimed.
“Data echoing can speed up training whenever computation upstream from accelerators dominates training time… Rather than waiting for more data to become available, we simply utilize data that is already available to keep the accelerators busy,” reads a Google blog post explaining the technique.
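The condition in the quote can be illustrated with a rough timing model. The sketch below is an assumption made for illustration, not something taken from the blog post: it supposes that the upstream stages and the accelerator run in parallel, that producing one batch upstream takes t_upstream seconds, and that one accelerator step takes t_accel seconds. The function name and parameters are hypothetical.

```python
def accelerator_utilization(t_upstream: float, t_accel: float, echo_factor: int = 1) -> float:
    """Fraction of wall-clock time the accelerator is busy (simplified model).

    Assumes upstream and accelerator work overlap perfectly, so producing one
    fresh batch and training on it `echo_factor` times takes
    max(t_upstream, echo_factor * t_accel) seconds.
    """
    if echo_factor < 1:
        raise ValueError("echo_factor must be >= 1")
    busy_time = echo_factor * t_accel
    step_time = max(t_upstream, busy_time)
    return busy_time / step_time


# Hypothetical numbers: upstream is 4x slower than one accelerator step.
for e in (1, 2, 4, 8):
    print(e, accelerator_utilization(t_upstream=4.0, t_accel=1.0, echo_factor=e))
```

Under these assumptions the accelerator sits idle 75% of the time without echoing, and echoing up to four times fills that idle time at no extra wall-clock cost. The model says nothing about how useful the repeated data is for learning, which is what the experiments in the paper measure.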
The researchers take the training pipeline for ResNet-50 on ImageNet as a typical example of a large-scale computer vision workload. The stages that precede the mini-batch stochastic gradient descent (SGD) update, from Read and decode through Batch, cannot benefit from specialized hardware accelerators. To reclaim idle accelerator capacity, rather than waiting for more data, the proposed data echoing approach inserts a stage in the pipeline that repeats (echoes) data from the previous stage. “Once a practitioner identifies the largest bottleneck in the training pipeline, they can insert an echoing stage after it to reclaim idle accelerator capacity,” the researchers note.
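The idea of an echoing stage can be sketched in a few lines of plain Python. This is a minimal illustration and not the researchers' implementation: the echo function, the echo_factor parameter, and the slow_upstream stand-in are hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def echo(stage_output: Iterable[T], echo_factor: int) -> Iterator[T]:
    """Repeat each item produced by the previous stage `echo_factor` times."""
    if echo_factor < 1:
        raise ValueError("echo_factor must be >= 1")
    for item in stage_output:
        for _ in range(echo_factor):
            yield item


def slow_upstream(num_batches: int) -> Iterator[str]:
    """Stand-in for the bottleneck stage (e.g. reading and decoding data)."""
    for i in range(num_batches):
        yield f"batch-{i}"


# The downstream stage now receives two items for every one produced upstream.
echoed = list(echo(slow_upstream(3), echo_factor=2))
print(echoed)
```

Where the echoing stage sits determines what gets repeated: placed after batching, whole batches are reused as they are, while placing it before a later stage lets that stage process each repeated item again.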
The data echoing method was evaluated on two language modelling tasks, two image classification tasks, and one object detection task, using a range of datasets. In all cases it reduced the amount of upstream computation needed to reach a competitive out-of-sample error. For example, the researchers found that data echoing achieved a 3x training speedup with a ResNet-50 model trained on the ImageNet dataset.
The Google Brain team believes data echoing can prove a simple and effective strategy for speeding up training by increasing hardware utilization when the training pipeline has a bottleneck in upstream stages.
The paper Faster Neural Network Training with Data Echoing is on arXiv.
Author: Fangyu Cai | Editor: Michael Sarazen