While it is generally accepted that network depth is responsible for the high performance of today’s deep learning (DL) models, adding depth also brings downsides such as increased latency and computational burdens, which can bottleneck progress in DL. Is it possible to achieve similarly high performance without deep networks?
In the new paper Non-deep Networks, a research team from Princeton University and Intel Labs argues that it is, proposing ParNet (Parallel Networks), a novel non-deep architecture that achieves performance competitive with its state-of-the-art deep counterparts.
The team summarizes their study’s contributions as:
- We show, for the first time, that a neural network with a depth of only 12 can achieve high performance on very competitive benchmarks (80.7% on ImageNet, 96% on CIFAR10, 81% on CIFAR100).
- We show how parallel structures in ParNet can be utilized for fast, low-latency inference.
- We study the scaling rules for ParNet and demonstrate effective scaling with constant low depth.
The main design feature of ParNet is its use of parallel subnetworks or substructures (referred to as “streams” in the paper) that process features at different resolutions. The features from the different streams are fused at a later stage in the network, and the fused features are used for downstream tasks. This approach enables ParNet to function effectively with a network depth of only 12 layers, orders of magnitude lower than ResNet models, for example, which in extreme cases can include up to 1000 layers.
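To see why parallel streams keep depth low, it can help to treat a network as a directed graph and measure depth as the longest path from input to output. The sketch below is a toy illustration only: the three streams of four layers each, the node names, and the single `fuse` node are hypothetical and do not reproduce ParNet's actual layout (in particular, every stream here is fed directly from the input, which is a simplification).

```python
def build(edge_pairs):
    """Turn (src, dst) pairs into an adjacency dict."""
    edges = {}
    for src, dst in edge_pairs:
        edges.setdefault(src, []).append(dst)
    return edges


def chain(names):
    """Edges that connect the given nodes one after another."""
    return list(zip(names, names[1:]))


def longest_path(edges, source, sink):
    """Number of nodes on the longest source-to-sink path in a DAG."""
    memo = {}

    def visit(node):
        if node == sink:
            return 1
        if node not in memo:
            memo[node] = 1 + max(visit(nxt) for nxt in edges[node])
        return memo[node]

    return visit(source)


# Hypothetical toy layout: 3 streams with 4 layers each (not ParNet's real sizes).
streams = [[f"s{i}_l{j}" for j in range(4)] for i in range(3)]

# Parallel arrangement: every stream runs side by side and feeds one fusion node.
parallel_pairs = []
for stream in streams:
    parallel_pairs += [("input", stream[0])] + chain(stream) + [(stream[-1], "fuse")]
parallel = build(parallel_pairs)

# Sequential arrangement: the same 12 layers stacked one after another.
flat = ["input"] + [name for stream in streams for name in stream] + ["fuse"]
sequential = build(chain(flat))

parallel_depth = longest_path(parallel, "input", "fuse")
sequential_depth = longest_path(sequential, "input", "fuse")
print(parallel_depth, sequential_depth)
```

Both graphs contain the same 12 layers, but the longest path in the parallel arrangement only passes through one stream, so adding more streams widens the network without lengthening that path.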
A key ParNet component is the RepVGG-SSE block, a modified RepVGG block with a purpose-built Skip-Squeeze-Excitation (SSE) module. ParNet also contains a downsampling block that reduces resolution and increases width to enable multi-scale processing, and a fusion block that combines information from multiple resolutions.
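The article does not describe the internals of the Skip-Squeeze-Excitation module, so the following is only a minimal sketch of generic squeeze-and-excitation style gating, which the module's name suggests it builds on. The function name `sse_gate`, the single linear layer, and the `weights` and `bias` parameters are assumptions made for illustration; how the gate is wired into the RepVGG block is not shown here.

```python
import math


def sse_gate(features, weights, bias):
    """Rescale each channel by a learned gate (illustrative, not the paper's code).

    features: list of C channels, each an H x W nested list of floats.
    weights:  C x C matrix of a single linear layer (hypothetical parameters).
    bias:     length-C vector (hypothetical parameters).
    """
    # Squeeze: global average pooling reduces each channel to one number.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]

    # Excite: one linear layer followed by a sigmoid gives a gate in (0, 1).
    gates = []
    for w_row, b in zip(weights, bias):
        z = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(features, gates)
    ]


# Two 2x2 channels; zero weights and biases make every gate exactly 0.5.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]

gated = sse_gate(features, weights, bias)
print(gated)
```

The gate depends only on channel-wide averages, so it adds global context to each channel without stacking additional convolutional layers, which is consistent with the goal of keeping depth low.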
In their empirical study, the team compared the proposed ParNet with state-of-the-art deep neural network baselines such as ResNet110 and DenseNet on large-scale visual recognition benchmarks that included ImageNet, CIFAR and MS-COCO.
The results show that a ParNet with a depth of just 12 layers was able to achieve top-1 accuracies of over 80 percent on ImageNet, 96 percent on CIFAR10, and 81 percent on CIFAR100. The team also demonstrated a detection network with a 12-layer backbone that achieved an average precision of 48 percent on MS-COCO, a large-scale object detection, segmentation and captioning dataset.
Overall, the study provides the first empirical proof that non-deep networks can perform competitively with their deep counterparts on large-scale visual recognition benchmarks. The team hopes their work can contribute to the development of neural networks that are a better fit for future multi-chip processors.
Author: Hecate He | Editor: Michael Sarazen