In a new paper Follow Anything: Open-set detection, tracking, and following in real-time, a research team from MIT and Harvard University presents the follow anything system (FAn), an open-set real-time any object following framework that can detect, segment, track, and follow any object, and is able to adapt to new objects using text, images, or click queries.
In a new paper Scaling Open-Vocabulary Object Detection, a DeepMind research team introduces OWLv2 model, an optimized architecture with improved training efficiency and applies and OWL-ST self-training recipe to the proposed OWLv2 to substantially improves detection performance, achieving state-of-the-art result on open-vocabulary detection task.
In the new paper DETRs Beat YOLOs on Real-Time Object Detection, a Baidu Inc. research team presents Real-Time Detection Transformer (RT-DETR), a real-time end-to-end object detector that leverages a hybrid encoder and novel IoU-aware query selection to address inference speed delay issues. RT-DETR outperforms YOLO object detectors in both accuracy and speed.
In the new paper YOLOv7: Trainable Bag-Of-Freebies Sets New State-Of-The-Art for Real-Time Object Detectors, an Academia Sinica research team releases YOLOv7. This latest YOLO version introduces novel “extend” and “compound scaling” methods that effectively utilize parameters and computation; and surpasses all known real-time object detectors in speed and accuracy.
Current state-of-the-art convolutional architectures for object detection tasks are human-designed. In a recent paper, Google Brain researchers leveraged the advantages of Neural Architecture Search (NAS) to propose NAS-FPN, a new automatic search method for feature pyramid architecture.