Inductive bias is a critical factor in the application of machine learning to a given task, and has produced strong image recognition results in convolutional networks. Recent advancements have shifted machine learning from the traditional approach of leveraging expert knowledge as an explicit inductive bias to using general-purpose methods that learn biases from data.
While researchers have focused on learning convolution-like structures from scratch to move forward in this regard, they face a dilemma due to a limited understanding of the inductive bias that gives rise to convolutions. How can inductive bias be reduced without hurting model efficiency? Is it possible to keep only the core bias and still deliver high performance?
Google Senior Research Scientist Behnam Neyshabur recently offered his insights on the topic in the paper Towards Learning Convolutions from Scratch.
In machine learning, inductive bias refers to the assumptions a learning algorithm uses to predict outputs for inputs it has not previously encountered. Because reducing inductive bias requires more data, more computation, and larger models, Neyshabur was inspired to consider replacing convolutional networks with fully-connected networks of the same expressive capacity. To isolate the effect of convolutions in modern architectures from the influence of other components such as pooling and residual connections, he introduced shallow (s-conv) and deep (d-conv) all-convolutional networks, along with their locally-connected and fully-connected counterparts: two larger hypothesis classes that contain convolutional networks as a special case.
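The relationship between these hypothesis classes can be sketched in a few lines: a convolutional layer slides one shared filter over the input, while a locally-connected layer uses the same sliding-window connectivity but gives each output position its own filter (no weight sharing). The 1-D numpy example below is an illustrative simplification, not code from the paper:

```python
import numpy as np

def conv1d(x, w):
    """Standard 1-D 'valid' convolution: one shared filter w slides over x."""
    k = len(w)
    return np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])

def local1d(x, W):
    """Locally-connected layer: same sliding-window connectivity,
    but each output position i gets its own filter W[i] (no weight sharing)."""
    n_out, k = W.shape
    return np.array([x[i:i + k] @ W[i] for i in range(n_out)])

x = np.arange(6.0)
w = np.array([1.0, -1.0])     # one shared filter
W = np.tile(w, (5, 1))        # locally-connected weights, here all identical

# When every per-position filter equals the shared filter,
# the locally-connected layer reduces to the convolution:
print(conv1d(x, w))           # [-1. -1. -1. -1. -1.]
print(local1d(x, W))          # [-1. -1. -1. -1. -1.]
```

A fully-connected layer generalizes this further by dropping the sliding-window constraint entirely, which is why each class strictly contains the previous one.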
Through empirical experiments on s-conv and d-conv along with their locally connected and fully-connected counterparts, Neyshabur further examined the role of depth, local connectivity, and weight sharing in convolutions, observing:
- Local connectivity appears to have the greatest influence on performance.
- The main benefit of depth appears to be efficiency in terms of memory and computation. Consequently, training shallow architectures with many more parameters for a long time compensates for most of the performance lost due to lack of depth.
- The benefit of depth diminishes even further without weight sharing.
Based on the assumption that simpler hypotheses are more likely to be true, minimum description length (MDL) is one of the common inductive biases in machine learning algorithms. Neyshabur turned to MDL as a guiding principle for finding architectures with small description lengths. He proposed β-lasso, a simple variant of the lasso algorithm that, "when applied on fully-connected networks for image classification tasks, learns architectures with local connections." The fully-connected networks trained with β-lasso achieved state-of-the-art accuracy for fully-connected networks on CIFAR-10 (85.19 percent), CIFAR-100 (59.56 percent), and SVHN (94.07 percent).
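The idea behind β-lasso can be sketched as an SGD step on an l1-regularized loss followed by a more aggressive pruning threshold scaled by β. The update below is a minimal reading of the algorithm; the specific hyperparameter values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def beta_lasso_step(w, grad, lr=0.1, lam=1e-4, beta=50.0):
    """One sketched beta-lasso update:
    1) an SGD step on the loss plus lam * ||w||_1 (the lasso part),
    2) zeroing any weight whose magnitude falls below beta * lam,
       a harsher threshold that encourages sparse, local connectivity."""
    w = w - lr * (grad + lam * np.sign(w))   # gradient step on l1-regularized loss
    w[np.abs(w) < beta * lam] = 0.0          # prune small weights
    return w

# Toy example: with a zero gradient, only the near-zero weights are pruned.
w = np.array([0.5, 0.004, -0.3, 0.001])
w = beta_lasso_step(w, np.zeros_like(w))
print(w)  # large weights survive, tiny ones are set exactly to zero
```

Setting β = 1 would recover ordinary lasso-style thresholding; larger β prunes more aggressively, which is what lets the trained fully-connected network end up with convolution-like local connections.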
Unlike previous work that suggested fully-connected networks can only deliver subpar performance, Neyshabur's experiments showed a significant performance jump. "The results are on par with the reported performance of convnets around year 2013. Also, unlike convolutional networks, these results are invariant to permuting pixels!" Neyshabur tweeted.
The paper Towards Learning Convolutions from Scratch is available on arXiv.
Reporter: Fangyu Cai | Editor: Michael Sarazen