The application of deep neural networks (DNNs) in AI can involve millions of data entries and complicated model training to achieve state-of-the-art performance. Finding ways to increase DNN training efficiency has become a critical challenge.
A group of researchers from Tencent Technology, the Chinese University of Hong Kong, and Nankai University recently proposed a new training method to tackle the challenge. They combined two commonly used techniques, Batch Normalization (BatchNorm) and Dropout, into an Independent Component (IC) layer that is inserted before each weight layer to make its inputs more independent.
The work “is based on an excellent idea that whitening the inputs of neural networks can achieve a fast convergence speed.” Whitening is a preprocessing technique that decorrelates data and standardizes its variance. Previous attempts to apply whitening at every activation layer were computationally expensive, which eventually led to the use of BatchNorm as an input normalization technique. BatchNorm has since drifted away from its original goal of whitening, however, and the new research proposes a way to refocus it on that goal.
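To make the whitening idea above concrete, here is a minimal NumPy sketch of ZCA whitening on a small synthetic dataset. It is an illustration only and is not taken from the paper: the function name `zca_whiten`, the `eps` value, and the toy `mixing` matrix are all assumptions chosen for the example.

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA-whiten x, an array of shape (samples, features).

    Hypothetical helper for illustration; eps is an assumed stabilizer.
    """
    # Center each feature so the covariance is computed around zero.
    x_centered = x - x.mean(axis=0)
    cov = x_centered.T @ x_centered / x.shape[0]
    # Eigendecomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rescale along each eigenvector, then rotate back to the original axes.
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return x_centered @ whitening

# Toy data: two features made correlated by an assumed mixing matrix.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8], [0.0, 0.6]])
x = rng.normal(size=(1000, 2)) @ mixing

x_white = zca_whiten(x)
# After whitening, the covariance is close to the identity matrix.
cov_white = x_white.T @ x_white / x_white.shape[0]
```

The eigendecomposition is the costly part: it grows quickly with the number of features, which is why repeating it at every layer of a wide network becomes expensive.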
Researchers combined BatchNorm with the regularization technique Dropout to construct independent activations for the neurons in each intermediate weight layer. To overcome the computational complexity involved in determining independent components, they used BatchNorm to replace ZCA (zero-phase component analysis) whitening, which serves as the first step of ICA (independent component analysis) methods but is computationally expensive.
High computation cost is hindering the development of wide neural networks, where an intermediate layer often contains many neurons. Researchers used Dropout to replace the rotation step in their novel IC layer. Dropout introduces independent random gates for the neurons in a layer, and it improved convergence speed when added to DNN training.
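The IC layer described above, BatchNorm followed by Dropout and placed in front of a weight layer, can be sketched in a few lines of NumPy. This is a simplified training-mode illustration under stated assumptions, not the authors' implementation: the name `ic_layer_train`, the drop probability of 0.1, the inverted-dropout rescaling, and the omission of learnable scale/shift parameters and running statistics are all choices made for the example.

```python
import numpy as np

def ic_layer_train(x, drop_prob=0.1, eps=1e-5, rng=None):
    """Training-mode forward pass of a simplified IC layer.

    Hypothetical sketch: batch normalization followed by dropout.
    x has shape (batch, features).
    """
    rng = np.random.default_rng() if rng is None else rng
    # BatchNorm step: standardize each feature over the batch
    # (no learnable scale/shift in this simplified version).
    x_norm = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    # Dropout step: an independent random gate per activation,
    # rescaled by keep_prob (inverted dropout, an assumption here).
    keep_prob = 1.0 - drop_prob
    gates = rng.random(x.shape) < keep_prob
    return x_norm * gates / keep_prob

rng = np.random.default_rng(0)
# Assumed toy setup: ReLU activations from a previous layer.
activations = np.maximum(rng.normal(size=(256, 8)), 0.0)
weights = rng.normal(size=(8, 4)) * 0.1

# The IC layer sits in front of the weight layer.
ic_out = ic_layer_train(activations, drop_prob=0.1, rng=rng)
next_pre_activation = ic_out @ weights
```

Both steps cost only a pass over the batch, in contrast to the matrix decomposition that full whitening requires at each layer.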
Evaluations performed on the CIFAR10/100 and ILSVRC2012 datasets showed that implementing the method improves the classification performance of networks in three aspects: “i) more stable training process, ii) faster convergence speed, and iii) better convergence limit.”
The researchers also said they are considering using more advanced normalization methods, such as layer normalization, instance normalization, and group normalization, in IC layers in the future.
The paper “Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks” is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen