AI Machine Learning & Data Science Research

Gem-Miner: Finding Lottery Tickets at Initialization and Bettering All Baselines at 19x Faster Speeds

In the new paper Rare Gems: Finding Lottery Tickets at Initialization, a research team from Carnegie Mellon University, MBZUAI, Petuum Inc., and the University of Wisconsin-Madison proposes GEM-MINER, an algorithm that finds sparse subnetworks at initialization that are trainable to accuracy comparable to or better than iterative magnitude pruning (IMP) with warm-up.

As artificial neural networks continue expanding in size, machine learning researchers are increasingly keen to find ways to compress them while incurring minimal performance trade-offs. Although standard pruning techniques can reduce large-scale networks’ parameter counts without sacrificing their predictive accuracy, this approach requires repeated rounds of computationally expensive retraining.

In 2019, MIT researchers Frankle & Carbin won the ICLR Best Paper Award with The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, which proposed that dense, randomly initialized, feed-forward networks contain subnetworks (winning tickets) that, when trained in isolation, can reach test accuracy comparable to the original network in a similar number of iterations. How to most effectively find these winning lottery tickets, however, remains an open question.

A research team from Carnegie Mellon University, MBZUAI, Petuum Inc., and the University of Wisconsin-Madison tackles this problem in their new paper Rare Gems: Finding Lottery Tickets at Initialization, proposing GEM-MINER, an algorithm that finds lottery tickets at initialization that are trainable to accuracy comparable to or better than iterative magnitude pruning (IMP), at speeds up to 19x faster.

GEM-MINER is designed to find rare gems: sparse subnetworks with non-trivial accuracy before any finetuning that can then be finetuned to accuracy close to that of the original fully trained dense network. GEM-MINER associates each random weight with a normalized score and uses these scores as the optimization variables for computing the supermask, i.e. the pruning pattern applied to the network at initialization. In each iteration, GEM-MINER samples a batch of training data and backpropagates the loss computed with the effective (masked) weights, automatically discovering a subnetwork of optimal sparsity.
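To make the supermask idea concrete, below is a minimal PyTorch-style sketch of score-based supermask training under simplifying assumptions: the random weights are frozen, each weight carries a trainable normalized score, the forward pass binarizes the scores with a straight-through estimator, and only the scores are updated by backpropagation. The names (SupermaskLinear, BinarizeSTE) and the synthetic data loader are illustrative rather than taken from the authors' released code, and the full GEM-MINER method controls sparsity in ways this sketch omits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Round scores to a {0, 1} mask in the forward pass; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores):
        return (scores >= 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator


class SupermaskLinear(nn.Linear):
    """Linear layer whose random weights stay frozen; only per-weight scores are trained."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        self.weight.requires_grad = False  # keep the random initialization fixed
        if self.bias is not None:
            self.bias.requires_grad = False
        self.scores = nn.Parameter(torch.rand_like(self.weight))  # normalized scores in [0, 1]

    def forward(self, x):
        mask = BinarizeSTE.apply(self.scores.clamp(0.0, 1.0))  # supermask (pruning pattern)
        return F.linear(x, self.weight * mask, self.bias)      # effective weights = weight * mask


# Mask-finding loop: backpropagate the loss of the effective (masked) weights into the scores.
model = nn.Sequential(SupermaskLinear(784, 256), nn.ReLU(), SupermaskLinear(256, 10))
score_params = [p for name, p in model.named_parameters() if name.endswith("scores")]
optimizer = torch.optim.SGD(score_params, lr=0.1, momentum=0.9)

# Synthetic stand-in for a real training DataLoader.
loader = [(torch.randn(64, 784), torch.randint(0, 10, (64,))) for _ in range(10)]

for x, y in loader:
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this simplified view, the binarized scores at the end of the mask-finding phase give the pruning pattern, and the surviving weights are then finetuned as usual.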

The team evaluated GEM-MINER on CIFAR-10 image classification against baselines that included dense weight training and four pruning algorithms (IMP, Learning Rate Rewinding, Edge-Popup and Smart-Ratio).

In the experiments, the proposed GEM-MINER surpassed all baselines, reaching high accuracy even in the early training stages. When finetuned, GEM-MINER outperformed IMP with warm-up while running up to 19x faster.

The researchers say their work resolves the open question of pruning at initialization, finding lottery tickets at initialization that achieve non-trivial accuracy even before finetuning and, after finetuning, accuracy rivalling that of prune-after-train methods.

The code is available on the project’s GitHub. The paper Rare Gems: Finding Lottery Tickets at Initialization is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
