The current conventional wisdom on deep neural networks (DNNs) is that, in most cases, simply scaling up a model’s parameters and adopting computationally intensive architectures will result in large performance improvements. Although this scaling strategy has proven successful in research labs, real-world industrial deployments introduce a number of complications, as developers often need to repeatedly train a DNN, transmit it to different devices, and ensure it can perform under various hardware constraints with minimal accuracy loss.
The research community has thus become increasingly interested in reducing the on-device storage size of such models while also improving their run-time performance. Explorations in this area have tended to follow one of two avenues: reducing model size via compression techniques, or using model pruning to reduce the computational burden.
In the new paper LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification, a team from the University of Maryland and Google Research proposes a way to “bridge the gap” between the two approaches with LilNetX, an end-to-end trainable technique for neural networks that jointly optimizes model parameters for accuracy, model size on the disk, and computation on any given task.
The team summarizes their main contributions as follows:
- “We introduce LilNetX, an algorithm to jointly perform model compression as well as structured and unstructured sparsification for direct computational gains in network inference. Our algorithm can be trained end-to-end using a single joint optimization objective and does not require post-hoc training or post-processing.”
- “With extensive ablation studies and results, we show the effectiveness of our approach while outperforming existing approaches in both model compression and pruning in most networks and dataset setups.”
The researchers consider the task of classification using a convolutional neural network (CNN), where the goal is to train a CNN that is jointly optimized to: 1) maximize classification accuracy, 2) minimize the number of bits required to store the model on disk, and 3) minimize the computational cost of inference. The team notes that their approach can also be extended to other tasks such as object detection or generative modelling, and that, to the best of their knowledge, theirs is the first study to show it is possible to jointly optimize models in terms of compression and structured sparsity.
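As a rough illustration of what such a joint objective can look like, the three goals can be folded into a single loss with two trade-off weights. The symbols below (the weights, the bit-cost term and the compute-cost term) are illustrative assumptions, not notation taken from the paper:

```latex
\min_{\theta} \;
  \underbrace{\mathcal{L}_{\mathrm{cls}}(\theta)}_{\text{classification loss}}
  \;+\; \lambda_{\mathrm{size}} \,
  \underbrace{R_{\mathrm{bits}}(\theta)}_{\text{bits to store the model}}
  \;+\; \lambda_{\mathrm{comp}} \,
  \underbrace{R_{\mathrm{comp}}(\theta)}_{\text{cost of inference}}
```

Larger values of the two weights push training toward smaller and cheaper models at the expense of accuracy, while setting both to zero recovers ordinary accuracy-only training.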
The team adopts the idea of reparameterized quantization (Oktay et al.) to perform model compression, penalizing the entropy of weights that are quantized in a reparameterized latent space. This approach substantially reduces the effective model size on disk. They also introduce key design changes to reparameterized quantization to encourage structured and unstructured parameter sparsity in the model and enable trade-offs between model compression rates and accuracy, such that the full dense model is no longer needed at inference time. The team also employs priors to increase structured sparsity in the parameter space and reduce computation, and dubs their resulting model LilNetX (Lightweight Networks with EXtreme Model Compression and Structured Sparsification).
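The two ingredients described above, an entropy penalty on quantized weights and a prior that favours whole groups of zeros, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions and not the authors' implementation: the rounding quantizer, the empirical entropy estimate, the group-lasso-style penalty (a generic stand-in for the paper's priors) and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def quantize(latent):
    # Round each latent weight to the nearest integer (illustrative quantizer).
    return np.rint(latent).astype(np.int64)

def entropy_bits(q):
    # Empirical entropy of the quantized values, in bits per weight.
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def group_penalty(latent):
    # Group-lasso-style penalty: sum of L2 norms, one per output filter.
    # (Assumed stand-in for a structured-sparsity prior.)
    flat = latent.reshape(latent.shape[0], -1)
    return float(np.sqrt((flat ** 2).sum(axis=1)).sum())

def structured_sparsity(q):
    # Fraction of output filters whose quantized weights are all zero.
    flat = q.reshape(q.shape[0], -1)
    return float((flat == 0).all(axis=1).mean())

# Toy conv weight: 4 output filters, 2 input channels, 3x3 kernels.
latent = np.zeros((4, 2, 3, 3))
latent[0] = 3.0  # only the first filter is non-zero
q = quantize(latent)

print("bits per weight:", entropy_bits(q))
print("group penalty:", group_penalty(latent))
print("structured sparsity:", structured_sparsity(q))
```

A weight tensor dominated by zeros has low entropy and is therefore cheap to store, and filters that are entirely zero can be dropped from the network. In actual training the rounding step would need a differentiable surrogate, which this sketch omits.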
In their empirical study, the team applied their approach to three network architectures (VGG-16, ResNet and MobileNet-V2) on three datasets (CIFAR-10, CIFAR-100 and ImageNet).
The results show that, compared to existing state-of-the-art model compression methods, LilNetX achieves up to a 50 percent smaller model size and 98 percent model sparsity on ResNet-20 while retaining the same accuracy on the CIFAR-10 dataset. The approach also yields a 35 percent smaller model size and 42 percent structured sparsity on ResNet-50 trained on ImageNet.
LilNetX’s state-of-the-art performance in simultaneous compression and FLOPs reduction translates directly to inference speedups and validates the method’s ability to jointly optimize DNNs in terms of compression (to reduce memory requirements) and structured sparsity (to reduce computation).
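To see why structured sparsity in particular reduces computation, consider the standard multiply-accumulate (MAC) count of a dense convolution layer. The sketch below uses made-up layer dimensions, not figures from the paper, and the function name is hypothetical:

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    # Multiply-accumulate count of a dense k x k convolution layer.
    return h_out * w_out * c_in * c_out * k * k

# Hypothetical layer: 32x32 output map, 64 input channels, 64 filters, 3x3 kernels.
dense = conv_macs(32, 32, 64, 64, 3)

# Structured sparsity: half of the output filters are entirely zero and removed.
pruned = conv_macs(32, 32, 64, 32, 3)

reduction = 1 - pruned / dense
print(dense, pruned, reduction)
```

Removing whole filters shrinks the layer itself and also reduces the number of input channels seen by the following layer, so the savings show up on ordinary dense hardware. Scattered individual zeros, by contrast, typically need sparse kernels before they yield a speedup.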
Author: Hecate He | Editor: Michael Sarazen