A quirk of convolutional neural networks (CNNs) trained to recognize images is that they tend to over-rely on textural information at the expense of shape information. Now, researchers from the University of Toronto, Mila, Nvidia and Google Brain have proposed a new training scheme that targets this bias by gradually exposing textural information over the course of training. The associated paper, Curriculum By Texture, is currently under review for the 2020 International Conference on Machine Learning (ICML).
CNNs have been very successful in computer vision (CV) tasks such as image classification and segmentation. Much of their effectiveness stems from an inductive bias towards spatial equivariance, which lets the same learned filters detect a feature anywhere in an image. CNNs also show an inductive bias towards textural information, however, because texture is high-frequency and easy to latch onto; this can lead them to neglect the sparser but often more useful shape information when making predictions. To balance the network's focus between texture and shape, the researchers used a curriculum learning scheme that constrains textural information at the beginning of training, forcing the networks to rely on low-frequency information for optimization. Gradually reintroducing textural information as training progresses is shown to improve CNN performance.
Specifically, the researchers applied a Gaussian kernel to the output of each convolutional layer and controlled how much textural information passes through by annealing the standard deviation of the Gaussian kernels during training: a large standard deviation blurs away high-frequency texture early on, and shrinking it gradually restores detail. Without adding any trainable parameters, the researchers were able to improve the performance of CNNs on standard vision tasks. The method is also generic and applicable to any CNN-based architecture.
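The core mechanism can be illustrated with a minimal NumPy sketch: build a Gaussian kernel from a standard deviation sigma, smooth a feature map with it, and shrink sigma over training epochs. The function names and the linear annealing schedule here are hypothetical illustrations, not taken from the paper; a real implementation would apply the blur per-channel inside the network (e.g. as a depthwise convolution).

```python
import numpy as np

def gaussian_kernel(sigma, size=5):
    # 2-D Gaussian kernel, normalized so its entries sum to 1
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def smooth(feature_map, sigma, size=5):
    # Naive "same"-size correlation of one 2-D feature map with the
    # Gaussian kernel, using zero padding at the borders.
    k = gaussian_kernel(sigma, size)
    pad = size // 2
    padded = np.pad(feature_map, pad)
    out = np.empty_like(feature_map, dtype=float)
    H, W = feature_map.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + size, j:j + size] * k).sum()
    return out

def annealed_sigma(epoch, total_epochs, sigma0=1.0):
    # Hypothetical linear schedule: sigma starts at sigma0 (strong blur,
    # little texture visible) and decays toward ~0 as training proceeds.
    return sigma0 * max(1.0 - epoch / total_epochs, 1e-3)
```

Early in training the large sigma suppresses high-frequency texture in the feature maps, so the network must fit coarse, shape-like structure first; as sigma shrinks, fine texture is gradually reintroduced.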
Researchers conducted experiments on ImageNet and other image datasets to test the effect and feasibility of the proposed method, evaluating models with and without the new feature.
The results show that models trained with the new method consistently outperform their conventionally trained counterparts on ImageNet. The method also proves effective on non-ImageNet datasets, and delivers superior performance on segmentation and object detection tasks.
The researchers conclude that the Curriculum By Texture approach trains CNNs that perform better on image classification tasks, generalize better when used as feature extractors on unseen datasets, and adapt more readily to different downstream tasks.
The proposed method demonstrates that simply altering a training procedure can yield better performance without changing model architecture, and it applies broadly across CNNs. The authors believe future research could explore the fundamental causes of texture bias, learning the smoothing kernels automatically, and possibly applying a similar technique to self-attention.
The paper Curriculum By Texture is on arXiv.
Author: Reina Qi Wan | Editor: Michael Sarazen