As artificial neural networks continue expanding in size, machine learning researchers are increasingly keen to find ways to compress them while incurring minimal performance trade-offs. Although standard pruning techniques can reduce large-scale networks’ parameter counts without sacrificing their predictive accuracy, this approach requires repeated rounds of computationally expensive retraining.
In 2019, MIT researchers Frankle and Carbin won the ICLR Best Paper Award with The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, which proposed that dense, randomly initialized, feed-forward networks contain subnetworks (winning tickets) that, when trained in isolation, can reach test accuracy comparable to the original network in a similar number of iterations. How to find these winning lottery tickets most effectively, however, remains an open question.
A research team from Carnegie Mellon University, MBZUAI, Petuum, Inc. and the University of Wisconsin-Madison tackles this problem in their new paper Rare Gems: Finding Lottery Tickets at Initialization, proposing GEM-MINER, an algorithm that finds lottery tickets at initialization that are trainable to accuracy comparable to or better than that of iterative magnitude pruning (IMP), at speeds up to 19x faster.
GEM-MINER is designed to find rare gems: sparse subnetworks that have non-trivial accuracy before any weight training and that can be finetuned to reach accuracy close to that of the original, fully trained dense network. GEM-MINER uses a form of backpropagation in which each random weight is associated with a normalized score, and these normalized scores serve as the optimization variables for computing the supermask, i.e. the pruning pattern of the network at initialization. In each iteration, GEM-MINER samples a batch of training data and performs backpropagation on the loss computed with the effective (masked) weights, updating the scores rather than the weights to automatically find a sparse subnetwork.
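
To make the score-and-supermask idea concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the toy linear model, the squared-error loss, the rounding threshold of 0.5, the straight-through gradient, the learning rate and the function names effective_weights and score_step are all hypothetical choices made for this example. The only elements taken from the description above are that the random weights stay fixed, that each weight has a score in a normalized range, and that the scores are what backpropagation updates.

```python
import numpy as np

def effective_weights(weights, scores):
    """Return (masked weights, binary mask); the mask rounds scores in [0, 1]."""
    mask = (scores >= 0.5).astype(weights.dtype)  # assumed rounding rule
    return weights * mask, mask

def score_step(weights, scores, x, y, lr=0.1):
    """One gradient step on the scores; the random weights are never updated."""
    w_eff, _ = effective_weights(weights, scores)
    residual = x @ w_eff - y                  # prediction error, shape (batch,)
    grad_w_eff = x.T @ residual / len(y)      # gradient of 0.5 * mean squared error
    # Assumed straight-through estimator: treat d(mask)/d(scores) as 1,
    # so the score gradient is the effective-weight gradient times the weight.
    grad_scores = grad_w_eff * weights
    # Keep the scores normalized to [0, 1] after the update.
    return np.clip(scores - lr * grad_scores, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=16)             # frozen random initialization
    scores = rng.uniform(size=16)             # one score per weight
    x = rng.normal(size=(64, 16))
    # Toy targets produced by a hidden sparse subnetwork of the same weights.
    y = x @ (weights * (rng.random(16) < 0.5))
    for _ in range(200):
        idx = rng.choice(64, size=16, replace=False)  # sample a training batch
        scores = score_step(weights, scores, x[idx], y[idx])
    _, mask = effective_weights(weights, scores)
    print("fraction of weights kept:", mask.mean())
```

In this sketch the learned mask is the only thing that changes between iterations, which is what separates searching for a subnetwork at initialization from ordinary weight training. How the actual algorithm rounds scores or controls the final sparsity level is not covered here.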
The team evaluated GEM-MINER on CIFAR-10 image classification against baselines that included dense weight training and four pruning algorithms (IMP, Learning Rate Rewinding, Edge-Popup and Smart-Ratio).
In the experiments, GEM-MINER outperformed all baselines, reaching high accuracy even in the early training stages. When finetuned, GEM-MINER outperformed IMP with warmup training while running up to 19x faster.
The researchers say their work resolves the open question of pruning at initialization, finding lottery tickets at initialization that have non-trivial accuracy even before finetuning and accuracy rivalling prune-after-train methods after finetuning.
Author: Hecate He | Editor: Michael Sarazen