The remarkable successes of contemporary large neural networks in generalizing to new data and tasks have been attributed to their ability to implicitly memorize complex training patterns. Boosting model size has proven an effective approach for enabling such memorization, but this can also dramatically increase training and serving costs. Might there be a way to improve memorization and generalization while maintaining a relatively small model size?
A Stanford University research team tackles this question in the new paper ResMem: Learn What You Can and Memorize the Rest, proposing ResMem, a residual-memorization algorithm that improves the generalization ability of smaller neural network models by performing explicit memorization via a separate k-nearest neighbour component.
The team summarizes their study’s main contributions as follows:
- We propose residual-memorization (ResMem), a two-stage learning algorithm that combines a base prediction model with a nearest neighbour regressor.
- We empirically demonstrate that ResMem improves test performance of neural networks, particularly when the training set is extremely large.
- We theoretically analyze the rate of convergence of ResMem on a stylized linear regression problem, and show that it can improve upon the base prediction model.
Prior studies have suggested that memorization is sufficient and can even be necessary for effective generalization in neural network models. Motivated by this research direction, the ResMem algorithm is designed to improve the generalization performance of small models via a novel explicit memorization approach.
The team first trains a standard neural network and fits the model’s residuals with a soft k-nearest neighbour regressor (rkNN). The final output is then calculated as the sum of the original model’s and the rkNN’s predictions.
The team’s empirical study compared ResMem with a DeepNet baseline on vision (image classification on CIFAR100 and ImageNet) and natural language processing (autoregressive language modelling) tasks. ResMem’s test set generalization abilities surpassed all benchmarks in the evaluations. The researchers also note that as the number of samples approaches infinity, ResMem returns a more favourable test risk than the base predictor.
The paper ResMem: Learn What You Can and Memorize the Rest is on arXiv.
Author: Hecate He | Editor: Michael Sarazen
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.