AI is now integrated into countless scenarios, from tiny drones to huge cloud platforms. Every hardware platform is ideally paired with a tailored AI model that meets its requirements in terms of performance, efficiency, size, latency, etc. However, even a single model architecture needs tweaking when applied to different hardware, and this requires researchers to spend time and money training each variant independently.
Popular solutions today include either designing models specialized for mobile devices or pruning a large network by removing redundant units, also known as model compression. A group of MIT researchers (Han Cai, Chuang Gan and Song Han) have introduced a “Once for All” (OFA) network that achieves the same or a better level of accuracy than state-of-the-art AutoML methods on ImageNet, with a significant speedup in training time.
A major innovation of the OFA network is that researchers don’t need to design and train a model for each scenario. Instead, they can directly search for an optimal subnetwork within the OFA network.
Researchers first defined the objective as obtaining the weights of a network such that each subnetwork can still achieve the same level of accuracy as a network trained independently with the same architectural configuration (depth, width, kernel size, and resolution). The OFA network supports a much larger search space (10^19 subnetworks) than previous AutoML methods.
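As a rough check on the size of that search space, the count can be worked out directly. The sketch below assumes a configuration that this article does not spell out: five units, each using 2, 3 or 4 layers, with every layer choosing one of three kernel sizes and one of three width ratios. All constant names and values are illustrative assumptions. Under them, the count comes to roughly 2 x 10^19 subnetworks, before counting input resolutions.

```python
# Assumed (illustrative) search space; these values are not given in the article.
NUM_UNITS = 5
DEPTH_CHOICES = (2, 3, 4)     # layers per unit
KERNEL_CHOICES = (3, 5, 7)    # kernel size per layer
WIDTH_CHOICES = (3, 4, 6)     # width ratio per layer


def count_subnetworks():
    """Count distinct subnetworks under the assumed choices above."""
    # Each layer independently picks a kernel size and a width ratio.
    per_layer = len(KERNEL_CHOICES) * len(WIDTH_CHOICES)
    # A unit with d layers has per_layer ** d variants; sum over allowed depths.
    per_unit = sum(per_layer ** depth for depth in DEPTH_CHOICES)
    # Units are chosen independently of one another.
    return per_unit ** NUM_UNITS


total = count_subnetworks()
print(f"{total:.2e} subnetworks")
```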
To efficiently train such a gigantic OFA network, researchers propose a progressive shrinking algorithm: they first train the full neural network with the maximum architecture under elastic resolution, then fine-tune it to support subnetworks ranging from large to small.
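The idea of progressive shrinking can be sketched as a training schedule that opens up one architectural dimension at a time. The code below is a simplified illustration, not the researchers' implementation: the choice sets, the ordering of phases (kernel size, then depth, then width), the resolution range, and the names `sample_config`, `progressive_shrinking` and `train_step` are all assumptions made for this example. It also samples a single value per dimension rather than per layer, and leaves the actual weight update to a caller-supplied function.

```python
import random

# Hypothetical choice sets; the article does not list the actual values.
FULL = {"kernel": 7, "depth": 4, "width": 6}
CHOICES = {"kernel": (7, 5, 3), "depth": (4, 3, 2), "width": (6, 4, 3)}
RESOLUTIONS = tuple(range(128, 225, 4))  # elastic resolution, assumed range

# Each phase makes one more dimension elastic (assumed ordering).
PHASES = [(), ("kernel",), ("kernel", "depth"), ("kernel", "depth", "width")]


def sample_config(elastic_dims, rng):
    """Sample a subnetwork: elastic dimensions vary, the rest stay at maximum."""
    config = dict(FULL)
    for dim in elastic_dims:
        config[dim] = rng.choice(CHOICES[dim])
    # Resolution is elastic in every phase, including the first.
    config["resolution"] = rng.choice(RESOLUTIONS)
    return config


def progressive_shrinking(steps_per_phase, rng, train_step):
    """Run every phase in order, calling train_step on each sampled config."""
    for elastic_dims in PHASES:
        for _ in range(steps_per_phase):
            train_step(sample_config(elastic_dims, rng))


if __name__ == "__main__":
    seen = []
    # list.append stands in for a real weight update on the shared network.
    progressive_shrinking(3, random.Random(0), seen.append)
    for cfg in seen:
        print(cfg)
```

In the first phase only the largest architecture is trained; later phases keep sampling large configurations while gradually admitting smaller ones.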
Unlike most AutoML methods that employ search algorithms to find subnetworks, researchers randomly sampled a subset of subnetworks from OFA networks to build their accuracy and latency tables. This enabled them to directly query the table given a specific hardware platform to find a corresponding subnetwork. The cost of querying tables is negligible, thereby avoiding the linear growth of the total cost in other methods.
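A minimal sketch of the table-based lookup described above might look like the following. Everything here is hypothetical: `fake_measure` is a made-up stand-in for evaluating a subnetwork's accuracy and measuring its latency on a real device, and the configuration choices are assumed rather than taken from the paper. The point is only that, once the table exists, finding a subnetwork for a new latency budget is a cheap filter-and-max over stored rows.

```python
import random


def sample_subnetwork(rng):
    """Randomly pick a (kernel, depth, width) configuration (assumed choices)."""
    return (rng.choice((3, 5, 7)), rng.choice((2, 3, 4)), rng.choice((3, 4, 6)))


def fake_measure(config):
    """Placeholder for real accuracy/latency measurement; numbers are invented."""
    kernel, depth, width = config
    size = kernel * depth * width
    latency_ms = 5.0 + 0.5 * size
    accuracy = 60.0 + 20.0 * size / (7 * 4 * 6)
    return accuracy, latency_ms


def build_table(num_samples, rng):
    """Sample subnetworks once and record (accuracy, latency) for each."""
    table = {}
    for _ in range(num_samples):
        config = sample_subnetwork(rng)
        table[config] = fake_measure(config)
    return table


def query(table, latency_budget_ms):
    """Return the most accurate stored config within the budget, or None."""
    feasible = [
        (acc, cfg) for cfg, (acc, lat) in table.items() if lat <= latency_budget_ms
    ]
    if not feasible:
        return None
    return max(feasible)[1]


if __name__ == "__main__":
    table = build_table(200, random.Random(0))
    print(query(table, latency_budget_ms=30.0))
```

A separate table would be built per hardware platform, since latency depends on the device while the sampled subnetworks can be reused.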
OFA networks trained on ImageNet scored Top-1 accuracy comparable to independent models, though the OFA one-time training cost was roughly 12 times higher than that of the independent models. Researchers suggest this high one-time cost could be amortized across additional deployment scenarios. They also demonstrated the importance of the progressive shrinking algorithm, as OFA subnetwork accuracy dropped by two percent without it.
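The amortization argument is simple arithmetic. As a back-of-the-envelope sketch, take the cost of training one independent model as 1 unit and the OFA one-time cost as 12 units, following the "roughly 12 times" figure above, and assume the per-scenario table lookup costs nothing. The function names and the zero lookup cost are assumptions for illustration only.

```python
def total_cost_independent(num_scenarios, cost_per_model=1.0):
    """Total cost when every scenario trains its own model (linear growth)."""
    return num_scenarios * cost_per_model


def total_cost_ofa(num_scenarios, one_time_cost=12.0, cost_per_query=0.0):
    """Total cost with one shared OFA network plus a cheap lookup per scenario."""
    return one_time_cost + num_scenarios * cost_per_query


def first_scenario_count_where_ofa_wins(limit=1000):
    """Smallest number of scenarios at which OFA is strictly cheaper."""
    for n in range(1, limit + 1):
        if total_cost_ofa(n) < total_cost_independent(n):
            return n
    return None


if __name__ == "__main__":
    for n in (1, 12, 13, 40):
        print(n, total_cost_independent(n), total_cost_ofa(n))
    print("OFA cheaper from", first_scenario_count_where_ofa_wins(), "scenarios")
```

Under these assumptions the two approaches break even at 12 deployment scenarios, and every scenario beyond that is effectively free for OFA.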
More importantly, in comparison with other state-of-the-art neural architecture search methods on a Samsung Note 8, OFA training was 14 times faster than ProxylessNAS, 16 times faster than FBNet, and 1,142 times faster than MnasNet when the number of deployment scenarios was 40. The OFA networks also achieved slightly better accuracy under similar latency.
“One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them in the Land of Mordor where the Shadows lie.” The famous inscription from the movie The Lord of the Rings hints at the incomparable power of the ring to win dominion in that fictional world. Might the “Once For All” network similarly prevail over all others, taking a huge leap in transforming how machine learning models are deployed across different hardware?
The paper Once for All: Train One Network and Specialize it for Efficient Deployment is available on arXiv.
Journalist: Tony Peng | Editor: Michael Sarazen