As AI systems increasingly move from the cloud to devices, identifying suitable neural network backbones for mobile deployment has become an active research area. While reducing floating-point operations (FLOPs) and parameter counts has produced efficient mobile architectures with high accuracy, these metrics do not tell the whole story: factors such as memory access cost and degree of parallelism can still add substantial latency during inference.
In the new paper An Improved One Millisecond Mobile Backbone, an Apple research team presents MobileOne, a novel and efficient neural network backbone for mobile devices that cuts inference time to under one millisecond on an iPhone 12 and reaches 75.9 percent top-1 accuracy on ImageNet.
The team summarizes their main contributions as:
- We introduce MobileOne, a novel architecture that runs within 1 ms on a mobile device and achieves state-of-the-art accuracy on image classification within efficient model architectures. The performance of our model also generalizes to desktop CPUs.
- We analyze performance bottlenecks in activations and branching that incur high latency costs on mobile in recent efficient networks.
- We analyze the effects of train-time re-parameterizable branches and dynamic relaxation of regularization in training. In combination, they help alleviate optimization bottlenecks encountered when training small models.
- We show that our model generalizes well to other tasks — object detection and semantic segmentation — while outperforming previous state-of-the-art efficient models.
The paper first introduces MobileOne’s architectural blocks, which factorize convolutional layers into depthwise and pointwise layers. The basic block builds on Google’s MobileNet-V1 block — a 3×3 depthwise convolution followed by a 1×1 pointwise convolution — and adds over-parameterization branches during training to improve model performance.
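To see why this factorization is efficient, it helps to compare parameter counts for a standard convolution versus a depthwise-separable one. The sketch below is illustrative only — the layer sizes are made up for the example and are not taken from the paper:

```python
# Parameter counts for a standard conv vs. a depthwise-separable conv
# (k x k depthwise followed by 1x1 pointwise); bias terms omitted.
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 64 -> 128 channels (not from the paper)
std = standard_conv_params(3, 64, 128)   # 73,728 parameters
sep = separable_conv_params(3, 64, 128)  # 576 + 8,192 = 8,768 parameters
print(std, sep, round(std / sep, 1))     # roughly an 8.4x reduction
```

The same ratio applies to FLOPs, which is why this factorization underpins most mobile backbones.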
MobileOne uses a depth scaling approach similar to MobileNet-V2, with shallower early stages where the input resolution is larger, since these layers are significantly slower. Because the train-time branches are folded into a plain, single-branch structure at inference time, the model incurs no extra data-movement costs. This allows the researchers to scale model parameters aggressively compared to multi-branched architectures without introducing significant latency costs.
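The branch-folding idea rests on a simple fact: convolution is linear in its kernel, so a 3×3 branch, a 1×1 branch (zero-padded to 3×3), and an identity skip (a delta kernel) can be summed into one 3×3 kernel. The single-channel sketch below illustrates this; the kernel values are made up, and batch-norm folding is omitted for brevity:

```python
def conv3x3(x, k):
    """3x3 'same' convolution (zero-padded) on a single-channel 2D grid."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += x[ii][jj] * k[di][dj]
    return out

# Train-time branches (illustrative values, not from the paper):
k3  = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.0], [0.1, 0.2, 0.1]]  # 3x3 branch
k1  = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]  # 1x1 branch, padded
kid = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # identity skip

x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

# Multi-branch output: run each branch, then sum the results
b3, b1, bi = conv3x3(x, k3), conv3x3(x, k1), conv3x3(x, kid)
multi = [[b3[i][j] + b1[i][j] + bi[i][j] for j in range(3)] for i in range(3)]

# Re-parameterized: fold the kernels first, then run a single convolution
merged = [[k3[i][j] + k1[i][j] + kid[i][j] for j in range(3)] for i in range(3)]
single = conv3x3(x, merged)

# The two paths produce identical outputs (up to float rounding)
assert all(abs(multi[i][j] - single[i][j]) < 1e-9
           for i in range(3) for j in range(3))
```

At inference the folded kernel replaces all three branches, so only one convolution — and one pass over the activations — is needed.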
The team evaluated MobileOne on the ImageNet benchmark using mobile devices. In the tests, the MobileOne-S1 variant achieved an inference time of under one millisecond on an iPhone 12 while scoring 75.9 percent top-1 accuracy. The researchers also demonstrated MobileOne’s versatility on other computer vision tasks, successfully applying it as a backbone feature extractor for a single-shot object detector and in a Deeplab V3 segmentation network.
Overall, the study validates the proposed MobileOne as an efficient, general-purpose backbone that achieves state-of-the-art results compared to existing efficient architectures while being many times faster on mobile devices.
The paper An Improved One Millisecond Mobile Backbone is on arXiv.
Author: Hecate He | Editor: Michael Sarazen