Google Brain Open-Sources EfficientDet: SOTA Performance, 28x Fewer Flops

Google Brain announced this week that it is open-sourcing its object detector EfficientDet, which achieves SOTA performance while requiring significantly less compute. Researchers led by Quoc V. Le introduced the computer vision model in November 2019.

Figure: *Computation differences between EfficientDet and SOTA models*

EfficientDet follows a one-stage detector design and uses the small, fast EfficientNet CNN, which Google introduced in May 2019, as its backbone network. The research team identified two main challenges: making multi-scale feature fusion more efficient, and scaling the model effectively.

Figure: *EfficientDet architecture*

Previous studies have explored various pathways for combining multi-scale features, such as the top-down feature pyramid network (FPN), the additional bottom-up aggregation path in PANet, and neural architecture search (NAS) in NAS-FPN. The Google Brain researchers proposed bidirectional cross-scale connections and improved their efficiency with three optimizations: (1) they removed nodes with only one input, since a node that fuses nothing contributes little to the feature network; (2) they added an extra edge from the original input to the output node on the same level, fusing more features at little added cost; (3) they treated each pair of top-down and bottom-up paths as a single feature network layer and repeated that layer multiple times to enable more high-level feature fusion.
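The connection pattern described above can be sketched in a few lines of Python. This is a toy illustration, not the released implementation: each pyramid level is reduced to a single float so that resizing between levels can be skipped, and `fuse`, `bifpn_layer`, and `bifpn` are hypothetical names. The placeholder `fuse` takes a plain mean, whereas the real network applies learned weights followed by a convolution.

```python
def fuse(*inputs):
    """Placeholder fusion (assumption): unweighted mean of the inputs."""
    return sum(inputs) / len(inputs)


def bifpn_layer(p):
    """One bidirectional layer over levels ordered finest (index 0) to coarsest.

    Each level is a single float, so no up/down-sampling is needed here.
    """
    n = len(p)
    if n < 2:
        raise ValueError("need at least two pyramid levels")

    # Top-down pass. Intermediate nodes exist only for the middle levels:
    # the coarsest level has a single input in this pass, so its node is
    # dropped and td[n - 1] simply stays the raw input.
    td = list(p)
    for i in range(n - 2, 0, -1):
        td[i] = fuse(p[i], td[i + 1])

    # Bottom-up pass. Middle levels fuse three inputs; p[i] is the extra
    # edge from the original input to the output on the same level.
    out = [0.0] * n
    out[0] = fuse(p[0], td[1])
    for i in range(1, n - 1):
        out[i] = fuse(p[i], td[i], out[i - 1])
    out[n - 1] = fuse(p[n - 1], out[n - 2])
    return out


def bifpn(p, repeats=3):
    """Stack the same bidirectional layer several times."""
    for _ in range(repeats):
        p = bifpn_layer(p)
    return p


# Five levels; only the coarsest level carries signal.
levels = [0.0, 0.0, 0.0, 0.0, 8.0]
one_pass = bifpn_layer(levels)
```

After a single layer, the signal that started at the coarsest level has already reached the finest one (`one_pass[0]` is non-zero), which is the point of running the top-down and bottom-up paths together.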

Figure: *Comparison of four feature network designs*

In addition, the researchers noticed that previous feature fusion methods treated all inputs equally, even though input features at different resolutions usually contribute unequally to the output. They therefore added a learnable weight to each input and adopted fast normalized fusion to normalize those weights, letting the network learn how much each input matters. Combining the bidirectional cross-scale connections with this weighted fusion yields the weighted bi-directional feature pyramid network (BiFPN).
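A minimal sketch of fast normalized fusion follows, assuming the form given in the paper: each output is the sum of inputs scaled by w_i / (eps + sum of all w_j), with a ReLU keeping every weight non-negative. The function name, the plain-list representation of features, and the example weights are illustrative assumptions; in the real model the weights are trained and the inputs are feature maps already resized to a common resolution.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors using one scalar weight per input.

    Computes O = sum_i (w_i / (eps + sum_j w_j)) * I_i.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")

    # ReLU on the weights keeps them non-negative, so each normalized
    # weight lies between 0 and 1 without the cost of a softmax.
    clipped = [max(0.0, w) for w in weights]
    denom = eps + sum(clipped)  # eps avoids division by zero

    fused = [0.0] * len(features[0])
    for feat, w in zip(features, clipped):
        for k, value in enumerate(feat):
            fused[k] += (w / denom) * value
    return fused


# Two inputs at the same level, flattened to short vectors for illustration.
p_in = [1.0, 2.0, 3.0]
p_td = [3.0, 2.0, 1.0]
# Hypothetical learned weights: the first input counts three times as much.
fused = fast_normalized_fusion([p_in, p_td], weights=[3.0, 1.0])
```

With weights of 3 and 1, the fused vector sits three quarters of the way toward the first input, minus a tiny shrinkage from `eps`. A negative weight is clipped to zero, so that input is ignored rather than subtracted.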

Current approaches for scaling up baseline detectors tend to scale only a single dimension, such as a bigger backbone network or larger input images. Inspired by image classification methods that jointly scale network width, depth, and input resolution, and because a grid search over all dimensions is prohibitively expensive for object detection, the researchers used a single compound coefficient and developed separate equations to scale up the backbone network, BiFPN network, box/class prediction network, and input image resolution.
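To make the idea of a single compound coefficient concrete, here is a small sketch that derives every dimension from one integer `phi`. The constants are assumptions recalled from one revision of the paper (BiFPN width 64 x 1.35^phi, BiFPN depth 3 + phi, prediction-head depth 3 + floor(phi / 3), input resolution 512 + 128 x phi) and may differ between paper versions. The helper name `efficientdet_config` is hypothetical, and the released configurations round channel counts and hand-adjust the largest models, so the raw values here will not match them exactly.

```python
def efficientdet_config(phi):
    """Derive detector dimensions from one compound coefficient (sketch)."""
    if not isinstance(phi, int) or phi < 0:
        raise ValueError("phi must be a non-negative integer")

    return {
        # Backbone reuses the matching EfficientNet scale (assumed mapping).
        "backbone": f"EfficientNet-B{phi}",
        # BiFPN width grows exponentially; raw, unrounded channel count.
        "bifpn_width": 64 * (1.35 ** phi),
        # BiFPN depth grows linearly.
        "bifpn_depth": 3 + phi,
        # Box/class head depth grows by one layer every three steps.
        "head_depth": 3 + phi // 3,
        # Input resolution grows linearly in steps of 128 pixels.
        "input_resolution": 512 + 128 * phi,
    }


base = efficientdet_config(0)
larger = efficientdet_config(3)
```

Under these assumed equations, moving from `phi = 0` to `phi = 3` raises the input resolution from 512 to 896, doubles the BiFPN depth from 3 to 6, and adds one layer to the prediction heads, all from changing a single number instead of searching over each dimension separately.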

Figure: *Scaling configs*

EfficientDet was evaluated on the COCO 2017 detection dataset alongside other object detectors. It achieved SOTA accuracy at a much lower computation cost, consistently outperforming other models while using up to 28x fewer FLOPs and 8x fewer parameters.

Figure: *Performance of EfficientDet and other object detectors on COCO*

The EfficientDet code and instructions are available on Google’s GitHub, and the associated updated paper, EfficientDet: Scalable and Efficient Object Detection, is on arXiv.

Author: Reina Qi Wan | Editor: Michael Sarazen
