The high computation and storage demands of large-scale DNNs such as ResNet make it difficult to deploy broad, real-time, high-accuracy applications on mobile devices. A common approach to this problem is model compression through DNN weight pruning. Researchers from Northeastern University, DiDi AI Labs, Syracuse University and the University of Michigan recently proposed an automatic structured pruning framework, AutoCompress, which adopts the 2018 ADMM-based weight pruning algorithm and outperforms previous automatic model compression methods while maintaining high accuracy.
This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-the-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop an effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique, which has an underlying incompatibility with the target pruning problem. (arXiv)
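To make the notion of structured pruning concrete, the sketch below zeroes out whole convolutional filters with the smallest L2 norms. This is a generic filter-pruning illustration under assumed NumPy-style weight layout, not the paper's ADMM-based algorithm; the function name and pruning criterion are assumptions for illustration.

```python
import numpy as np

def prune_filters(weights, prune_rate):
    """Structured (filter) pruning sketch: zero out the conv filters
    with the smallest L2 norms. `weights` has shape
    (out_channels, in_channels, kH, kW); `prune_rate` is the fraction
    of filters to remove. Illustrative only, not the ADMM method."""
    norms = np.sqrt((weights ** 2).reshape(weights.shape[0], -1).sum(axis=1))
    n_prune = int(prune_rate * weights.shape[0])
    pruned = weights.copy()
    if n_prune > 0:
        idx = np.argsort(norms)[:n_prune]  # filters with smallest norms
        pruned[idx] = 0.0
    return pruned

# Example: prune half of 8 random 3x3 filters
w = np.random.randn(8, 4, 3, 3)
w_pruned = prune_filters(w, 0.5)
zeroed = int((np.abs(w_pruned).reshape(8, -1).sum(axis=1) == 0).sum())
```

Because entire filters are removed rather than individual weights, the resulting sparsity pattern maps directly to smaller dense computations, which is what yields real inference speedup on mobile hardware.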
Synced invited Prof. Caiwen Ding, an assistant professor in the Department of Computer Science & Engineering at the University of Connecticut, to share his thoughts on the paper AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates.
How would you describe AutoCompress?
As the model size of deep neural networks (DNNs) grows, it poses challenges for memory storage and computational speed. Various model compression techniques have therefore been proposed to reduce DNN model size and accelerate inference. However, applying these techniques typically demands significant human effort. As an automatic structured pruning framework, AutoCompress achieves ultra-high pruning rates while outperforming prior deep reinforcement learning techniques. The resulting inference speedup makes AutoCompress favourable for many smartphone-related machine learning applications.
Why does this research matter?
The proposed AutoCompress effectively reduces both the FLOPs and the storage of DNNs. Moreover, the automatic determination process can identify redundant weights without any human expertise. Experimental results demonstrate that AutoCompress achieves state-of-the-art (SOTA) results on popular backbone networks, outperforming prior work on automatic model compression by up to 33× in pruning rate under the same accuracy.
What impact might this work bring to the field?
With the emerging development of edge AI, the demand for high-performance deep models is increasing. The high computational and storage requirements of large-scale DNNs make it hard to achieve real-time performance when running DNNs on edge devices. Because pruning a neural network involves a large number of flexible hyperparameters, an automatic hyperparameter determination process is necessary. Motivated by the concept of AutoML, this work proposes an automatic structured pruning framework, AutoCompress, which achieves SOTA compression rates for backbone networks. Another interesting finding of this work is that heuristic search outperforms the DRL-based pruning approach. The approach could also be extended to other applications such as object detection, natural language processing and other tasks.
Can you identify any bottlenecks in the research?
The proposed AutoCompress achieves promising compression rates on backbone networks; other tasks such as object detection could be evaluated to demonstrate the generality of the proposed approach.
Can you predict any potential future developments related to this research?
Future work could apply different pruning schemes within the automatic pruning process. Moreover, other heuristic search approaches, such as genetic and evolutionary algorithms, could be tested to verify whether the improvement from the search still holds.
The paper AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates is on arXiv.
Caiwen Ding is an assistant professor in the Department of Computer Science & Engineering at the University of Connecticut. He received his PhD degree from Northeastern University (NEU) in 2019, supervised by Dr. Yanzhi Wang. His research interests include machine learning and deep neural network systems; computer architecture; non-von Neumann and neuromorphic computing; and efficient computing for cyber-physical and embedded systems. His work has been published in high-impact conferences (e.g., AAAI, ASPLOS, ISCA, MICRO, HPCA, FPGA, DAC, DATE).
Synced Insight Partner Program
The Synced Insight Partner Program is an invitation-only program that brings together influential organizations, companies, academic experts and industry leaders to share professional experiences and insights through interviews and public speaking engagements, etc. Synced invites all industry experts, professionals, analysts, and others working in AI technologies and machine learning to participate.
Simply Apply for the Synced Insight Partner Program and let us know about yourself and your focus in AI. We will give you a response once your application is approved.