The empirical success of deep neural networks (DNNs) has inspired the machine learning research community to initiate theoretical studies of DNN learning, optimization and generalization. A crucial related research area explores how to design provably time- and sample-efficient learning algorithms for neural networks — a challenge that remains unsolved even in the simplest case of depth-2 feedforward neural networks.
To advance research in this field, a team from Google Research and Northwestern University has introduced a set of polynomial time- and sample-efficient algorithms for learning an unknown depth-2 feedforward neural network with general rectified linear unit (ReLU) activations. Their recent paper, Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations, explores the plausibility of such efficient algorithms for learning ReLU networks.
The team considers the supervised learning problem with inputs drawn from a standard Gaussian distribution and labels generated by a neural network with element-wise ReLU activation functions. The tensor used in the work is formed by taking the weighted average of a score function evaluated on each data point. Notably, the team observes that a good approximation to the network can be achieved by analyzing multiple higher-order tensors.
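This score-function construction can be illustrated with a small NumPy sketch. The dimensions, weights, and the choice of the third Hermite tensor as the score function below are illustrative assumptions, not values from the paper: labels come from a random depth-2 ReLU network on standard Gaussian inputs, and the tensor is the empirical weighted average of the score function over the data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 3, 50_000  # input dim, hidden units, samples (illustrative sizes)

# Unknown depth-2 network: y = sum_i a_i * ReLU(w_i . x + b_i), with biases b_i
W = rng.normal(size=(k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.normal(size=k)
b = rng.normal(size=k)

X = rng.normal(size=(n, d))            # standard Gaussian inputs
y = np.maximum(X @ W.T + b, 0.0) @ a   # labels from the unknown network

# Score function: third Hermite tensor
#   H3(x)_{jkl} = x_j x_k x_l - x_j d_kl - x_k d_jl - x_l d_jk
# The estimated tensor is the y-weighted average of H3(x) over the data.
I = np.eye(d)
m1 = (y @ X) / n                                        # E[y * x]
T_raw = np.einsum('n,nj,nk,nl->jkl', y, X, X, X) / n    # E[y * x ox x ox x]
T_hat = (T_raw
         - np.einsum('j,kl->jkl', m1, I)
         - np.einsum('k,jl->jkl', m1, I)
         - np.einsum('l,jk->jkl', m1, I))

# By Stein-type identities, T_hat approximates a symmetric low-rank tensor
# whose components align with the hidden weight vectors w_i.
```

The resulting `T_hat` is a symmetric d×d×d tensor; higher-order analogues follow the same pattern with higher Hermite tensors as the score function.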
Previous work in this area has assumed that the bias terms of the ReLU units are zero. Departing from this traditional approach, the proposed algorithms work by robustly decomposing multiple higher-order tensors arising from the Hermite expansion of the network's function, and their performance confirms that polynomial-time algorithms can be designed even in the presence of bias terms in the ReLU units.
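To give a flavor of what decomposing such a tensor involves, here is a toy sketch of the classical tensor power method on a synthetic symmetric rank-2 tensor. This is a textbook technique, not the paper's robust decomposition algorithm, and all sizes and weights below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6

# Synthetic rank-2 symmetric tensor T = sum_i lam_i * v_i ox v_i ox v_i
# with orthonormal components v_i (rows of V).
V = np.linalg.qr(rng.normal(size=(d, 2)))[0].T
lam = np.array([3.0, 1.5])
T = np.einsum('i,ij,ik,il->jkl', lam, V, V, V)

def top_component(T, iters=100, seed=0):
    """Tensor power iteration: repeatedly map u <- T(I, u, u) and normalize."""
    u = np.random.default_rng(seed).normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(iters):
        u = np.einsum('jkl,k,l->j', T, u, u)
        u /= np.linalg.norm(u)
    return u

u = top_component(T)
# After convergence, u aligns with one of the planted components v_i.
```

In the orthogonal case, deflating (subtracting the recovered rank-1 term and repeating) recovers the remaining components; the paper's contribution lies in doing this robustly for the non-orthogonal, noisy tensors that arise from ReLU networks with bias.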
The team uses a smoothed analysis framework — a beyond-worst-case-analysis paradigm — to explore and explain the practical success of various algorithms, and to establish beyond-worst-case guarantees and identifiability of the network parameters under minimal assumptions.
Overall, this theoretical study presents polynomial-time algorithms for learning depth-2 neural networks with general ReLU activations and non-zero bias terms, provides provable guarantees under mild nondegeneracy conditions, and confirms the existence of time- and sample-efficient learning algorithms for such networks. The team suggests further research could be done to provide similar guarantees for networks of higher depth.
The paper Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang