Neural Architecture Search (NAS) is an important but compute-intensive process that requires huge datasets. Identifying promising architectures in complex search spaces while minimizing resource demands has become a major challenge for researchers and enterprises alike.
Chinese AI unicorn Megvii Technology has proposed a new single-path, one-shot NAS design approach that makes various applications more convenient and achieves state-of-the-art performance on the large-scale ImageNet dataset.
Megvii’s innovation improves on one-shot NAS, an increasingly popular approach with advantages in both nested and joint NAS optimization. While one-shot approaches provide flexibility and effectiveness comparable to cutting-edge weight-sharing models such as ENAS, BSN, and FBNet, they fail to achieve competitive performance because the weights in the supernet, which subsumes all architectures and is trained only once, are deeply coupled.
Megvii researchers first reviewed general NAS approaches, which aim to solve two problems: (1) weight optimization and (2) architecture optimization. Standard methods for solving them, however, required an enormous amount of compute and relied on complex supernets or reinforcement learning controllers.
After one-shot approaches were proposed, the supernet weights were optimized once, and the architecture search was then performed as a separate step using those shared weights.
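In the notation commonly used for one-shot NAS, the two steps can be sketched as follows. The symbols are assumptions introduced here for illustration: $\mathcal{A}$ is the search space encoded by the supernet, $W$ the supernet weights, $\mathcal{N}(a, w)$ a network with architecture $a$ and weights $w$, and $W_{\mathcal{A}}(a)$ the weights that architecture $a$ inherits from the trained supernet.

```latex
% Step 1: train the supernet weights once on the training set.
\[
W_{\mathcal{A}} = \operatorname*{argmin}_{W} \; \mathcal{L}_{\text{train}}\bigl(\mathcal{N}(\mathcal{A}, W)\bigr)
\]

% Step 2: search for the architecture with the best validation accuracy,
% evaluating each candidate with its inherited weights.
\[
a^{*} = \operatorname*{argmax}_{a \in \mathcal{A}} \; \text{ACC}_{\text{val}}\bigl(\mathcal{N}(a, W_{\mathcal{A}}(a))\bigr)
\]
```

The key point is that the second step needs no further training: every candidate is scored with weights taken directly from the supernet.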
While one-shot approaches achieve weight sharing efficiently, the researchers determined that the recent “path dropout” weight-decoupling strategy made parameter tuning difficult. They took the opposite approach to reducing weight coupling in the supernet: simplifying the search space to the extreme, so that it contains only single-path architectures.
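To make the single-path idea concrete, the following is a minimal sketch in plain Python, not Megvii's implementation. It assumes a toy search space in which each layer offers a fixed number of candidate operations; the names `NUM_LAYERS`, `NUM_CHOICES`, `build_supernet`, and `path_weights` are hypothetical and exist only for illustration. An architecture is one choice per layer, so it activates exactly one weight slot in each layer of the supernet.

```python
# Toy single-path search space (all sizes and names are illustrative assumptions).
NUM_LAYERS = 4    # hypothetical supernet depth
NUM_CHOICES = 3   # hypothetical number of candidate operations per layer


def build_supernet(num_layers, num_choices):
    """Create one weight slot per (layer, choice) pair.

    A real supernet would hold tensors; a float stands in for them here.
    """
    return {
        (layer, choice): 0.0
        for layer in range(num_layers)
        for choice in range(num_choices)
    }


def path_weights(arch):
    """Return the (layer, choice) keys used by a single-path architecture."""
    return [(layer, choice) for layer, choice in enumerate(arch)]


supernet = build_supernet(NUM_LAYERS, NUM_CHOICES)
arch = (2, 0, 1, 1)            # one choice index per layer
active = path_weights(arch)    # exactly one weight slot per layer

print(len(supernet))  # 12 slots in total
print(active)         # [(0, 2), (1, 0), (2, 1), (3, 1)]
```

In this toy model, two architectures share a weight slot only where they pick the same operation in the same layer, so updating one path leaves the other choices in each layer untouched.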
Other contributions from the paper:
- Supernet training should be stochastic, so that all architectures have their weights optimized simultaneously;
- The researchers used uniform sampling, a hyperparameter-free method, to treat all architectures equally during training;
- The team applied an evolutionary algorithm to efficiently search for the best-performing architectures without any fine-tuning.
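The points above can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions rather than the paper's code: `sample_uniform`, `count_updates`, `evolve`, `toy_score`, and `TARGET` are hypothetical names, the population size, generation count, and mutation probability are arbitrary, and `toy_score` merely stands in for validation accuracy measured with weights inherited from the supernet. The sketch also omits the latency constraint.

```python
import random


def sample_uniform(num_layers, num_choices, rng):
    """Pick one candidate operation per layer, each with equal probability."""
    return tuple(rng.randrange(num_choices) for _ in range(num_layers))


def count_updates(num_layers, num_choices, steps, rng):
    """Simulate stochastic supernet training.

    Each step samples one single-path architecture and "updates" only the
    weights on that path; here we just count how often each slot is touched.
    """
    updates = {}
    for _ in range(steps):
        arch = sample_uniform(num_layers, num_choices, rng)
        for layer, choice in enumerate(arch):
            updates[(layer, choice)] = updates.get((layer, choice), 0) + 1
    return updates


def evolve(score, num_layers, num_choices, rng,
           pop_size=20, top_k=5, generations=10, mutate_prob=0.1):
    """Evolutionary search with crossover and mutation; parents are kept."""
    population = [sample_uniform(num_layers, num_choices, rng)
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=score, reverse=True)[:top_k]
        children = []
        while len(children) < pop_size - top_k:
            if rng.random() < 0.5:
                # Crossover: take each layer's choice from one of two parents.
                a, b = rng.sample(parents, 2)
                child = tuple(rng.choice(pair) for pair in zip(a, b))
            else:
                # Mutation: resample each layer's choice with small probability.
                parent = rng.choice(parents)
                child = tuple(
                    rng.randrange(num_choices) if rng.random() < mutate_prob else gene
                    for gene in parent
                )
            children.append(child)
        population = parents + children
    return max(population, key=score)


# Hypothetical stand-in for validation accuracy with inherited weights:
# the score is the number of layers that match an arbitrary target path.
TARGET = (1, 2, 0, 1, 2, 0)


def toy_score(arch):
    return sum(a == t for a, t in zip(arch, TARGET))


updates = count_updates(num_layers=6, num_choices=3, steps=3000,
                        rng=random.Random(0))
best = evolve(toy_score, num_layers=6, num_choices=3, rng=random.Random(0))
```

Because every layer's choice is drawn with equal probability, each weight slot receives roughly the same number of updates, and no sampling hyperparameter has to be tuned. The search step only scores candidates; it never retrains them.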
Megvii Technology evaluated the performance of its single-path one-shot NAS approach against state-of-the-art weight-sharing approaches on various tasks, concluding it is “easiest to train, occupies the smallest memory, best satisfies the architecture (latency) constraint, and easily supports the large dataset.”
The paper Single Path One-Shot Neural Architecture Search with Uniform Sampling is on arXiv.
Author: Robert Tian | Editor: Michael Sarazen