Speaking at the London Mathematical Society in 1947, Alan Turing seemed to anticipate the current state of machine learning research: “What we want is a machine that can learn from experience . . . like a pupil who had learnt much from his master, but had added much more by his own work.”
Although neural networks (NNs) have demonstrated impressive learning power in recent years, they still fail to outperform human-designed learning algorithms. A question naturally arises: Can NNs be made to discover efficient learning algorithms on their own?
In the new paper Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms, a research team from Microsoft and Harvard University demonstrates that NNs can discover succinct learning algorithms on their own in polynomial time and presents an architecture that combines recurrent weight-sharing between layers and convolutional weight-sharing to reduce models’ parameter size from even trillions of nodes down to a constant.
The team’s proposed neural network architecture comprises a dense first layer of size linear in m (the number of samples) and d (the dimension of the input). This layer’s output is fed into an RCNN (with recurrent weight-sharing across depth and convolutional weight-sharing across width), and the RCNN’s final outputs are then passed through a pixel-wise NN and summed to produce a scalar prediction.
The team’s key contribution is the design of this simple recurrent convolutional (RCNN) architecture, which combines recurrent weight-sharing across layers and convolutional weight-sharing within each layer and reduces the number of weights in the convolutional filter to a few — even a constant — while maintaining the weight functions to determine activations for a very wide and deep network.
Overall, the study demonstrates that a simple NN architecture can effectively achieve Turing-optimality — wherein it learns as well as any bounded learning algorithm. The researchers believe reducing the size of the dense parameters to depend on the algorithm’s memory usage instead of the training sample size and using stochastic gradient descent (SGD) beyond memorization could make the architecture even more concise and natural. In future work, they plan to explore other combinations of architectures, initializations, and learning rates to improve understanding of which are Turing-optimal.
The paper Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.