Google Brain announced this week that it is open-sourcing its object detector EfficientDet, which achieves SOTA performance while requiring significantly less compute. Researchers led by Quoc V. Le introduced the computer vision model in November 2019.
EfficientDet follows a one-stage detector design and uses the small, fast EfficientNet CNN (which Google introduced in May 2019) as its backbone. The research team identified two main challenges: how to make multi-scale feature fusion more efficient, and how to scale the model effectively.
Previous studies have explored various pathways for combining multi-scale features, such as top-down feature pyramid networks (FPN), the additional bottom-up aggregation path in PANet, and neural architecture search (NAS) in NAS-FPN. The Google Brain researchers proposed bidirectional cross-scale connections with three optimizations: (1) they removed nodes with only one input, which contribute little to feature fusion; (2) they added an extra edge from the original input to the output node on the same level to fuse more features at little extra cost; (3) they treated each pair of top-down and bottom-up paths as a single feature network layer and repeated that layer multiple times to enable more high-level feature fusion.
In addition, the researchers noticed that although input features at different resolutions contribute unequally to the fused output, previous feature fusion methods treated all inputs equally. They therefore added a learnable weight to each input so that the network can learn these differences, and normalized the weights with fast normalized fusion, a cheaper alternative to softmax. Combining these optimizations yields the weighted bi-directional feature pyramid network (BiFPN).
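As a rough illustration of the weighted fusion idea, the following Python sketch normalizes one non-negative weight per input and takes the weighted sum. It is a simplified stand-in rather than the released implementation: the function name `fast_normalized_fusion` is hypothetical, feature maps are represented as flat lists of floats assumed to be already resized to a common resolution, and the weights are plain numbers rather than parameters learned during training.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equal-length feature vectors with normalized, non-negative weights.

    `features` is a list of flat lists of floats (an assumed stand-in for
    feature maps at one resolution); `weights` holds one scalar per input.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    # ReLU keeps each weight non-negative, so normalized weights fall in [0, 1].
    relu = [max(0.0, float(w)) for w in weights]
    total = eps + sum(relu)  # eps avoids division by zero
    norm = [w / total for w in relu]
    # Weighted element-wise sum across the inputs.
    fused = [sum(n * f[i] for n, f in zip(norm, features))
             for i in range(len(features[0]))]
    return fused, norm


fused, norm = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
print(norm)   # roughly [0.25, 0.75]
print(fused)  # roughly [2.5, 3.5]
```

Because the normalization is a simple division rather than an exponential, an input with a larger weight dominates the output without the cost of computing a softmax over the weights.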
Current approaches to scaling up baseline detectors tend to scale only a single dimension, such as a bigger backbone network or larger input images. Inspired by image classification methods that jointly scale network width, depth, and input resolution, and because grid search over all dimensions is prohibitively expensive for object detection, the researchers used a single compound coefficient with separate equations to scale the backbone network, BiFPN network, box/class prediction network, and input image resolution.
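To make the compound scaling idea concrete, here is a minimal Python sketch in which one coefficient `phi` drives the BiFPN width, BiFPN depth, prediction head depth, and input resolution together. The function name `scale_detector`, the parameter names, and every default constant are assumptions chosen for illustration; they follow the general shape of such rules (exponential width growth, linear depth and resolution growth) and should be checked against the paper before being treated as the published values. Backbone scaling is left out because it reuses the EfficientNet family.

```python
import math


def scale_detector(phi, base_width=64, width_growth=1.35,
                   base_bifpn_depth=3, base_head_depth=3,
                   base_resolution=512, resolution_step=128):
    """Derive detector dimensions from one compound coefficient `phi`.

    All default constants are illustrative assumptions, not verified values.
    """
    if phi < 0:
        raise ValueError("phi must be non-negative")
    return {
        # Channel count grows exponentially with phi.
        "bifpn_width": int(round(base_width * width_growth ** phi)),
        # BiFPN layer count grows linearly with phi.
        "bifpn_depth": base_bifpn_depth + phi,
        # Box/class head depth grows more slowly, one layer per three steps.
        "head_depth": base_head_depth + math.floor(phi / 3),
        # Input resolution grows linearly in fixed steps.
        "resolution": base_resolution + phi * resolution_step,
    }


for phi in range(4):
    print(phi, scale_detector(phi))
```

The point of the design is that a single knob replaces a grid search over four separate dimensions: picking a larger `phi` yields a consistently larger detector.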
EfficientDet was evaluated on the COCO 2017 detection dataset alongside other object detectors and achieved SOTA accuracy at a much lower computation cost, consistently outperforming other models while using up to 28x fewer FLOPs and 8x fewer parameters.
Author: Reina Qi Wan | Editor: Michael Sarazen