DeepMind Unlocks Web-Scale Training for Open-World Detection

In a new paper, Scaling Open-Vocabulary Object Detection, a DeepMind research team introduces OWLv2, an optimized architecture with improved training efficiency, and applies the OWL-ST self-training recipe to it, substantially improving detection performance and achieving state-of-the-art results on open-vocabulary detection.

Open-vocabulary object detection plays a crucial role in numerous real-world computer vision tasks. However, due to the scarcity of detection training data and the fragility of pre-trained representations, the performance of trained models often falls short, revealing a lack of scaling potential.

Although the scarcity of detection data can potentially be addressed by using Web image-text pairs as a form of weak supervision, this approach has not yet been applied to detection at scales comparable to image-level pretraining.

To address this issue, the DeepMind research team introduces the OWLv2 model in their latest paper, Scaling Open-Vocabulary Object Detection. The optimized architecture improves training efficiency, and applying the OWL-ST self-training recipe to it substantially improves detection performance, achieving state-of-the-art results on the open-vocabulary detection task.

The goal of this work is to optimize the label space, annotation filtering, and training efficiency of the self-training approach for open-vocabulary detection, yielding strong and scalable open-vocabulary performance with limited labeled data.

The proposed self-training approach is simple and consists of three steps: 1) the team uses an existing open-vocabulary detector, OWL-ViT CLIP-L/14, to annotate all images in WebLI, a large Web image-text dataset, with bounding box pseudo-annotations; 2) they then self-train a new detector on these pseudo-annotations; 3) finally, they fine-tune the self-trained model on human-annotated detection data, which further improves detection performance.
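The pseudo-annotation and filtering idea can be pictured with a small sketch. The Python below is illustrative only: the `Box` record, the `filter_pseudo_boxes` helper, and the 0.3 score threshold are assumptions made for this example, not details taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """One pseudo-annotation: a text query, a box, and the detector's confidence."""
    query: str
    xyxy: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    score: float


def filter_pseudo_boxes(
    predictions: Dict[str, List[Box]],
    min_score: float = 0.3,  # hypothetical threshold, chosen only for illustration
) -> Dict[str, List[Box]]:
    """Keep confident boxes and drop images that are left with no boxes."""
    kept: Dict[str, List[Box]] = {}
    for image_id, boxes in predictions.items():
        confident = [b for b in boxes if b.score >= min_score]
        if confident:
            kept[image_id] = confident
    return kept


# Toy detector output for two web images (made-up values).
raw = {
    "img_001": [
        Box("a red bicycle", (10, 20, 200, 180), 0.82),
        Box("bicycle", (12, 22, 198, 176), 0.12),
    ],
    "img_002": [Box("dog", (0, 0, 50, 50), 0.05)],
}
pseudo_labels = filter_pseudo_boxes(raw)
```

In this toy run only `img_001` survives, with its single confident box. A new detector would then be trained on `pseudo_labels` in place of human annotations.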

In particular, the researchers use a variant of the OWL-ViT architecture to train better detectors. In this architecture, they use contrastively trained image-text models to initialize the image and text encoders, while the detection heads are randomly initialized.
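As a rough illustration of this initialization scheme, the sketch below copies encoder weights from a pretrained model and draws the detection-head weights at random. The parameter names (`image_encoder`, `box_head`, and so on), the flat-list weight format, and the 0.02 standard deviation are all hypothetical choices made for the example.

```python
import random
from typing import Dict, List


def init_detector_params(
    pretrained: Dict[str, List[float]],
    head_sizes: Dict[str, int],
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Copy encoder weights from a pretrained model; draw head weights at random."""
    rng = random.Random(seed)
    # Encoders start from the contrastively trained weights (copied, not shared).
    params = {name: list(weights) for name, weights in pretrained.items()}
    for name, size in head_sizes.items():
        # Small Gaussian init; the 0.02 scale is an arbitrary illustrative choice.
        params[name] = [rng.gauss(0.0, 0.02) for _ in range(size)]
    return params


# Stand-ins for the weights of a contrastively trained image-text model.
pretrained = {"image_encoder": [0.5, -0.1, 0.3], "text_encoder": [0.2, 0.7]}
params = init_detector_params(pretrained, {"box_head": 4, "class_head": 4})
```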

In the training stage, they use the same losses as OWL-ViT and, as in OWL-ViT, augment the queries with “pseudo-negatives”. They optimize training efficiency to maximize the number of images seen for a given amount of compute, and they adopt previously proposed practices for large-scale Transformer training. Together, these changes mean the resulting OWLv2 model reduces training FLOPs by approximately 50% and speeds up training throughput by 2× compared to the original OWL-ViT.
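One way to picture query augmentation with pseudo-negatives is to pad each image's positive text queries with randomly sampled queries that are assumed not to apply to it. The helper below is a minimal sketch under that assumption; `add_pseudo_negatives`, the toy vocabulary, and the sampling strategy are illustrative and not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple


def add_pseudo_negatives(
    positives: Sequence[str],
    vocabulary: Sequence[str],
    num_negatives: int,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Return queries plus a label per query (1 = positive, 0 = pseudo-negative)."""
    rng = random.Random(seed)
    positive_set = set(positives)
    # Only queries that are not already positives may serve as negatives.
    candidates = [q for q in vocabulary if q not in positive_set]
    negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
    queries = list(positives) + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return queries, labels


vocab = ["cat", "dog", "bicycle", "traffic light", "teapot", "kite"]
queries, labels = add_pseudo_negatives(["dog", "kite"], vocab, num_negatives=3)
```

The positives stay at the front of `queries`, followed by three sampled negatives, so the classification loss sees both kinds of query for every image.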

In their empirical study, the team compared the proposed approach with previous state-of-the-art open-vocabulary detectors. OWL-ST improves AP on LVIS rare classes from 31.2% to 44.6% (a relative improvement of about 43%), and combining the OWL-ST recipe with the OWLv2 architecture achieves new state-of-the-art performance.

Overall, the proposed OWL-ST recipe delivers significant improvements in detection performance using weak supervision from large-scale web data, unlocking web-scale training for open-world localization.

The paper Scaling Open-Vocabulary Object Detection is on arXiv.


Author: Hecate He | Editor: Chain Zhang


