The performance of deep neural networks (DNNs) relies heavily on their structures, and designing a good structure (aka architecture) tends to require extensive effort from human experts. The idea of an automatic structure-learning algorithm that can achieve performance on par with the best human-designed structures is thus increasingly appealing to machine learning researchers.
In the paper Learning Structures for Deep Neural Networks, a team from OneFlow and Microsoft explores unsupervised structure learning. Leveraging the efficient coding principle, information theory and computational neuroscience, the team designs a structure-learning method that does not require labelled data, and demonstrates empirically that larger output entropy in a deep neural network leads to better performance.
The researchers start with the assumption that the optimal structure of neural networks can be derived from the input features without labels. Their study probes whether it is possible to learn good DNN network structures from scratch in a fully automatic fashion, and what would be a principled way to reach this end.
The team borrows a principle from the study of biological nervous systems, the efficient coding principle, which posits that a good brain structure “forms an efficient internal representation of external environments.” They apply this principle to DNN architecture, proposing that the structure of a well-designed network should match the statistical structure of its input signals.
The efficient coding principle suggests that the mutual information between a model’s inputs and outputs should be maximized, and the team supports this with a theoretical foundation based on Bayesian optimal classification. Specifically, they show that a softmax linear classifier as the top layer of a neural network, together with independence among the nodes in the top hidden layer, constitutes a sufficient condition for that classifier to act as a Bayesian optimal classifier. This theoretical foundation not only backs up the efficient coding principle but also provides a way to determine the depth of a DNN.
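One way to see why output entropy matters here is the standard decomposition of mutual information. The sketch below uses generic symbols (X for the inputs, Y for the outputs) that are not taken from the paper, and assumes a discrete output for simplicity:

```latex
I(X; Y) = H(Y) - H(Y \mid X)
```

If the network is a deterministic, noise-free mapping Y = f(X), then H(Y | X) = 0 and I(X; Y) = H(Y), so under this simplifying assumption maximizing mutual information amounts to maximizing the entropy of the outputs.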
The team then investigates how to leverage the efficient coding principle in the design of a structure-learning algorithm, and shows that sparse coding can implement the principle under the assumption of zero-peaked and heavy-tailed prior distributions. This suggests that an effective structure learning algorithm can be designed based on global group sparse coding.
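To make the link between sparse coding and a zero-peaked, heavy-tailed prior concrete, here is a minimal sketch of ordinary L1-regularized sparse coding solved with ISTA. The L1 penalty corresponds to a Laplace prior on the codes under MAP estimation. This is a generic textbook formulation, not the paper's global group sparse coding; the function names, the random dictionary D, and the values of lam and n_iter are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm: shrink each entry toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(x, D, a, lam):
    # 0.5 * ||x - D a||^2 + lam * ||a||_1
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def ista_sparse_code(x, D, lam=0.1, n_iter=200):
    # Step size 1/L, where L is the largest eigenvalue of D^T D.
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)                   # gradient of the quadratic term
        a = soft_threshold(a - grad / L, lam / L)  # gradient step, then shrinkage
    return a

# Hypothetical toy data: a random dictionary and a signal built from 3 atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)                     # unit-norm dictionary atoms
a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
x = D @ a_true
a_hat = ista_sparse_code(x, D, lam=0.05, n_iter=500)
```

The shrinkage step is what drives small coefficients to exactly zero, which is the behaviour a zero-peaked prior encourages.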
The proposed algorithm, structure learning with sparse coding, learns a structure layer by layer in a bottom-up manner. The raw features form layer one; given a predefined number of nodes in layer two, the algorithm learns the connections between these two layers, and then repeats the procedure for each subsequent layer.
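The idea of learning connections between two layers can be illustrated with a connectivity mask. The sketch below assumes a dictionary D of shape (inputs, hidden nodes) has already been obtained from some sparse coding step, and keeps only its largest-magnitude entries as connections. The thresholding rule, the density parameter, and the function names are hypothetical choices for illustration; the article does not describe the paper's exact procedure.

```python
import numpy as np

def connection_mask(D, density=0.1):
    # Keep the largest-magnitude entries of D so that roughly a `density`
    # fraction of all possible input-to-hidden connections remain.
    k = max(1, int(round(density * D.size)))
    thresh = np.sort(np.abs(D), axis=None)[-k]
    return np.abs(D) >= thresh

def masked_forward(x, W, mask):
    # Linear layer restricted to the learned connections, followed by ReLU.
    return np.maximum((W * mask).T @ x, 0.0)

# Hypothetical stand-in for a learned dictionary: 20 inputs, 50 hidden nodes.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
mask = connection_mask(D, density=0.1)

W = rng.standard_normal((20, 50)) * 0.1   # trainable weights, masked in use
h = masked_forward(rng.standard_normal(20), W, mask)
```

In a bottom-up scheme, the activations h would then serve as the inputs from which the next layer's connections are derived in the same way.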
The researchers also describe how the proposed algorithm can learn inter-layer connections, handle invariance, and determine DNN depth. Finally, they conduct extensive experiments on the popular CIFAR-10 dataset to evaluate the classification accuracy of their structure-learning method, the role of inter-layer connections, and the roles of structure masks and network depth.
The results show that a learned-structure single-layer network achieves an accuracy of 63.0 percent, outperforming the single-layer baseline of 60.4 percent. In the inter-layer connection density experiment, the structures generated by the sparse coding approach outperform random structures and, at the same density level, consistently outperform the sparsifying restricted Boltzmann machine (RBM) baseline. In the team’s evaluation of structure masks, the structure prior provided by sparse coding is seen to improve performance. The network depth experiment, meanwhile, empirically justifies the proposed approach for determining DNN depth via coding efficiency.
Overall, the research demonstrates the efficient coding principle’s effectiveness for unsupervised structure learning, and shows that the proposed global sparse coding-based structure-learning algorithms can achieve performance comparable with the best human-designed structures.
The paper Learning Structures for Deep Neural Networks is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang