The 2016 introduction of YOLO (You Only Look Once), a state-of-the-art real-time object detection system, was a milestone in object detection research and led to better, faster and more accurate computer vision algorithms. Unfortunately, two months ago the father of YOLO, Joseph Redmon, announced he was leaving the field of computer vision due to concerns regarding the possible negative impact of his work. Redmon’s withdrawal triggered online debates and raised an important question: would there still be YOLO updates in the future?
Well, to the relief of many in the computer vision (CV) community, the answer is yes! The official YOLO GitHub account released an updated YOLO Version 4 last Friday.
The YOLO v4 release lists three authors: Alexey Bochkovskiy, the Russian developer who built the YOLO Windows version, Chien-Yao Wang, and Hong-Yuan Mark Liao.
Compared with the previous YOLOv3, YOLOv4 has the following advantages:
- It is an efficient and powerful object detection model that enables anyone with a 1080 Ti or 2080 Ti GPU to train a super fast and accurate object detector.
- The influence of state-of-the-art “Bag-of-Freebies” and “Bag-of-Specials” object detection methods during detector training has been verified.
- The modified state-of-the-art methods, including CBN (Cross-iteration batch normalization), PAN (Path aggregation network), etc., are now more efficient and suitable for single GPU training.
The authors used and combined the following new features to make their design suitable for efficient training and detection:
- Weighted-Residual-Connections (WRC)
- Cross-Stage-Partial-Connections (CSP), a new backbone that enhances the learning capability of CNNs
- Cross mini-Batch Normalization (CmBN), a modified version of CBN that assumes a batch contains four mini-batches
- Self-Adversarial Training (SAT), a new data augmentation technique that operates in two forward-backward stages
- Mish activation, a novel self-regularized, non-monotonic neural activation function
- Mosaic data augmentation, a new method that mixes four training images into one instead of using a single image
- DropBlock regularization, a better regularization method for CNNs
- CIoU loss, which achieves better convergence speed and accuracy on the bounding box (BBox) regression problem
In experiments, YOLOv4 obtained an AP value of 43.5 percent (65.7 percent AP50) on the MS COCO dataset and achieved a real-time speed of ~65 FPS on a Tesla V100, outperforming the fastest and most accurate existing detectors in terms of both speed and accuracy. YOLOv4 is twice as fast as EfficientDet with comparable performance, and compared with YOLOv3, its AP and FPS are higher by 10 percent and 12 percent, respectively.
YOLOv4’s excellent speed and accuracy, along with the well-written paper, are a great contribution to both engineering and academia. The update also illustrates an encouraging aspect of open source development: even though the father of YOLO has stepped away from model updates, others can maintain and continue to advance the powerful tools on which we increasingly rely.
The source code is on the project GitHub. The paper YOLOv4: Optimal Speed and Accuracy of Object Detection is on arXiv.
Author: Hecate He | Editor: Michael Sarazen