Facebook announced today that it is open-sourcing QNNPACK, a high-performance kernel library optimized for mobile AI.
Mobile devices have only a tiny fraction of the computing power of data center servers, so running today’s compute-hungry AI applications on them means getting as much performance as possible out of the hardware. QNNPACK speeds up many of the operations that advanced neural network architectures rely on when running on mobile devices, such as depthwise convolutions.
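For readers unfamiliar with the term, a depthwise convolution filters each input channel with its own small kernel instead of mixing all channels together, which is what makes it cheap enough for phones. The following is a minimal pure-Python sketch of the idea, assuming stride 1 and no padding; the function name and data layout are illustrative and are not QNNPACK's API.

```python
from typing import List

Image = List[List[List[float]]]  # indexed as [channel][row][column]


def depthwise_conv2d(inputs: Image, kernels: Image) -> Image:
    """Apply one 2D kernel per channel (stride 1, no padding)."""
    if len(inputs) != len(kernels):
        raise ValueError("need exactly one kernel per input channel")
    outputs = []
    for channel, kernel in zip(inputs, kernels):
        kh, kw = len(kernel), len(kernel[0])
        out_h = len(channel) - kh + 1
        out_w = len(channel[0]) - kw + 1
        out = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):
            for x in range(out_w):
                # Each output value only ever reads from its own channel.
                out[y][x] = sum(
                    channel[y + i][x + j] * kernel[i][j]
                    for i in range(kh)
                    for j in range(kw)
                )
        outputs.append(out)
    return outputs
```

Because every channel is processed independently, the work grows with the number of channels rather than with the product of input and output channels, as it would in a standard convolution.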
Facebook product manager Joseph Spisak explained in a tweet: “If you were always wondering why camera effects, style transfer, etc. can run at real time on your phone, the answer is that, like all of deep learning, they are underpinned by linear algebra operations such as matrix multiplications…. QNNPACK uses lower precision (8bit) operations to make this possible.”
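To make the idea of 8-bit matrix multiplication concrete, here is a minimal pure-Python sketch of one common scheme: map floats to unsigned 8-bit integers with a scale and zero point, multiply using integer arithmetic only, then rescale the result. This is an illustration under those assumptions, not QNNPACK's actual implementation, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
QMatrix = List[List[int]]


def quantize(matrix: Matrix) -> Tuple[QMatrix, float, int]:
    """Map floats to integers in [0, 255] using a scale and a zero point."""
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero matrix
    zero_point = max(0, min(255, round(-lo / scale)))
    q = [
        [max(0, min(255, round(x / scale) + zero_point)) for x in row]
        for row in matrix
    ]
    return q, scale, zero_point


def quantized_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Approximate a @ b using 8-bit inputs and integer accumulation."""
    qa, scale_a, zp_a = quantize(a)
    qb, scale_b, zp_b = quantize(b)
    inner = len(qb)
    result = []
    for i in range(len(qa)):
        row = []
        for j in range(len(qb[0])):
            # The accumulator is a plain integer (32-bit on real hardware).
            acc = sum(
                (qa[i][k] - zp_a) * (qb[k][j] - zp_b) for k in range(inner)
            )
            # A single floating-point rescale per output element.
            row.append(acc * scale_a * scale_b)
        result.append(row)
    return result
```

The result is only an approximation of the full-precision product, with an error that depends on the range of the input values; the benefit is that the inner loop touches a quarter of the memory of 32-bit floats and uses only integer arithmetic.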
QNNPACK (Quantized Neural Networks PACKage) has been integrated into numerous Facebook apps and deployed on billions of devices. It supports advanced computer vision tasks, such as running Mask R-CNN and DensePose on mobile phones in real time and classifying images in under 100 ms on performance-limited mobile devices. On benchmarks such as MobileNetV2, QNNPACK outperforms the current best implementations by 2x.
Facebook open-sourced QNNPACK to provide comprehensive support for quantized inference as part of the PyTorch 1.0 platform. QNNPACK can be used immediately through the Caffe2 model representation, and Facebook is developing a utility to export models from PyTorch’s Python front end to that graph representation. The company believes QNNPACK’s usefulness is not limited to mobile AI and that it has potential applications on other platforms as well.
Facebook recently released PyTorch 1.0 for research and production. It combines the flexibility of PyTorch with Caffe2 to provide a fast, seamless path from research prototyping to a broad range of AI projects.
Facebook also announced it is open-sourcing its MobileNetV2 model, a state-of-the-art architecture for mobile visual tasks, which achieves 1.3 percent higher top-1 accuracy than the corresponding TensorFlow model.
Author: Herin Zhao | Editor: Michael Sarzen