It is no secret that deep neural networks (DNNs) can achieve state-of-the-art performance on a wide range of complicated tasks. DNN models such as BigGAN, BERT, and GPT-2 have demonstrated the high potential of deep learning. Deploying DNNs on mobile devices, consumer devices, drones, and vehicles, however, remains a bottleneck for researchers. For such practical, on-device scenarios, DNNs must have a smaller footprint.
The requirement for smaller DNNs has pushed researchers in two opposite directions: either hand-craft DNNs through design principles, or rely entirely on automated network architecture search. Now, researchers from the University of Waterloo Artificial Intelligence Institute and startup DarwinAI have designed and introduced efficient and compact DNNs which take “full advantage of human experience and creativity with the meticulousness and raw speed of a machine.”
The team created a family of deep convolutional neural networks called AttoNets to demonstrate how the new human-machine collaborative design approach can be used. In this design, each AttoNet combines a human-specified high-level network architecture with modules shaped through machine-driven design exploration.
Researchers used two different common visual perception tasks in edge and mobile scenarios to evaluate the efficacy of the family of AttoNets:
- Object recognition on ImageNet50
Researchers observed that compared with three state-of-the-art deep neural networks designed for edge and mobile applications (MobileNet-V1, MobileNet-V2, and ShuffleNet-V2), AttoNet-D was 10 times smaller than MobileNet-V1 and 4 times smaller than ShuffleNet-V2, with higher accuracy than both. “Which is quite a nice improvement given that both of these networks are state-of-the-art in the realm of edge and mobile deep learning,” first author and DarwinAI Co-Founder Alexander Wong told Synced. Wong conducted the study with co-authors Zhong Qiu Lin and Brendan Chwyl. AttoNet-B was 8.5 percent more accurate than MobileNet-V1 while remaining smaller and faster.
- Instance segmentation and object detection on Pascal VOC 2012 segmentation dataset
Unlike object recognition, instance segmentation and object detection focus on segmenting the objects in a scene, detecting bounding boxes around them, and categorizing each segment. For this task, the team introduced Atto-MaskRCNN, which required five times fewer multiply-add operations than a ResNet-50-based Mask R-CNN.
Two phases of the human-machine collaborative design strategy
The first phase focuses on principled network design prototyping; the second on machine-driven design exploration.
The first stage begins with the researchers composing a human-specified design prototype for visual perception in edge and mobile scenarios. Here they chose to leverage a number of human-driven design principles aimed at modeling accuracy rather than efficiency. The team argued that, compared with humans, “machines are considerably more capable at low-level design exploration,” so at this stage it is better to devote human effort to building a high-level network infrastructure for higher modeling accuracy.
The emphasis on producing high-performance deep convolutional neural networks for performing visual perception led the researchers to adopt two human-driven design principles:
- greatly increase the depth of the underlying deep network architecture
- incorporate shortcut connections within the network architecture.
The first principle enables the learning of deeper feature embeddings, while the second results in a residual architecture that “makes it easier for iterative optimization methods.”
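The idea behind shortcut connections can be sketched in a few lines: the input bypasses the learned transformation and is added back to its output, so the network only has to learn a residual. The sketch below is purely illustrative (real AttoNet modules are machine-designed and far more elaborate), and all function names here are our own.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU activation."""
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Toy residual block: the block computes a residual F(x) with a
    two-layer transformation and outputs relu(x + F(x)), so the input
    flows through a shortcut connection around the transformation."""
    fx = relu(x @ w1) @ w2   # learned transformation F(x)
    return relu(x + fx)      # shortcut: add the input back

# With zero weights, F(x) = 0, so the block reduces to the identity
# (for non-negative inputs) -- which is why shortcuts ease optimization:
# "doing nothing" is trivially representable.
x = np.array([[1.0, 2.0, 3.0]])
w_zero = np.zeros((3, 3))
y = residual_block(x, w_zero, w_zero)
print(np.allclose(y, x))  # True
```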
The next stage uses machine-driven design exploration to explore module-level macro-architecture and micro-architecture designs. The process is guided by the “invisible human hand,” i.e., the human-specified initial design prototype. With the human-specified design requirements and constraints taken into account, the resulting deep neural networks can be “well-suited for on-device object recognition for edge and mobile scenarios.”
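At a very high level, constraint-guided design exploration can be thought of as searching a space of fine-grained module configurations and keeping the best candidate that satisfies a human-specified budget. The toy sketch below is an assumption-laden illustration of that idea only: the actual method in the paper (generative synthesis) is far more sophisticated, and the scoring and cost functions here are invented stand-ins.

```python
import itertools

def parameter_count(widths):
    """Rough parameter estimate for a chain of fully connected modules
    (illustrative cost model, not the paper's)."""
    return sum(a * b for a, b in zip(widths, widths[1:]))

def proxy_accuracy(widths):
    """Stand-in for validation accuracy (assumption: wider modules
    score higher, with diminishing returns)."""
    return sum(w ** 0.5 for w in widths)

def explore(candidate_widths, budget):
    """Enumerate three-module chains and return the highest-scoring
    configuration whose estimated size fits the budget."""
    best, best_score = None, float("-inf")
    for widths in itertools.product(candidate_widths, repeat=3):
        if parameter_count(widths) <= budget:
            score = proxy_accuracy(widths)
            if score > best_score:
                best, best_score = widths, score
    return best

# Usage: search module widths from a small menu under a 10,000-parameter
# budget; the search trades width in the middle for width at the ends.
print(explore([16, 32, 64, 128], budget=10_000))  # (128, 32, 128)
```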
As a result of this module-level macro-architecture and micro-architecture design, the team introduced a family of AttoNets, each possessing a unique 69-layer deep convolutional neural network architecture. From AttoNet-A to AttoNet-D, researchers observed a diversity that can only be achieved through fine-grained machine-driven design exploration. For example, from AttoNet-A to AttoNet-D the overall architectural complexity decreases in stages, which is “consistent with the progressive strategy taken in the machine-driven design exploration phase.”
Researchers suggest the human-machine collaborative design strategy can be applied to a broader range of tasks, including video action recognition, video pose estimation, image captioning, image super-resolution, and image generation.
The paper AttoNets: Compact and Efficient Deep Neural Networks for the Edge via Human-Machine Collaborative Design is on arXiv.
Journalist: Fangyu Cai | Editor: Michael Sarazen