
Facebook AI’s DETR Applies Transformers to CV Tasks

Facebook this week released Detection Transformers (DETR), a new approach for object detection and panoptic segmentation tasks that uses a completely different architecture than previous object detection systems.

The transformer is a deep learning architecture that has gained popularity in recent years, particularly for problems involving sequential data, such as natural language processing (NLP) tasks like language modelling and machine translation. Transformers have also been extended to tasks such as speech recognition, symbolic mathematics, and reinforcement learning.

With DETR, Facebook aims to push this ‘Transformer revolution’ into the computer vision field.

“We present a new method that views object detection as a direct set prediction problem,” explains the Facebook research team. “Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task.”

DETR comprises a set-based global loss that forces unique predictions via bipartite matching and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, it can reason about the relations of the objects and the global image context to directly output the final set of predictions in parallel.
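The bipartite matching step pairs each ground-truth object with exactly one prediction before the loss is computed, which is what forces unique predictions. DETR itself solves this with the Hungarian algorithm (via SciPy's `linear_sum_assignment`); the stdlib-only sketch below uses brute-force enumeration instead, which gives the same answer on tiny inputs and is purely illustrative — the cost values are made up.

```python
from itertools import permutations

def match_predictions(cost):
    """Bipartite matching by brute force: assign each ground-truth
    target to a distinct prediction so the total matching cost is
    minimal. cost[i][j] = cost of matching prediction i to target j.
    (DETR uses the Hungarian algorithm for the same result in
    polynomial time; enumeration suffices for an illustration.)"""
    n_targets = len(cost[0])
    best_total, best_perm = None, None
    for perm in permutations(range(len(cost)), n_targets):
        # perm[j] = index of the prediction matched to target j
        total = sum(cost[pred][tgt] for tgt, pred in enumerate(perm))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

# Toy example: three predictions, two ground-truth objects.
cost = [[0.9, 0.1],
        [0.2, 0.8],
        [0.5, 0.5]]
total, assignment = match_predictions(cost)
print(assignment)  # (1, 0): prediction 1 matches target 0, prediction 0 matches target 1
```

Predictions left unmatched (here, prediction 2) are trained toward a special "no object" class, which is how DETR avoids the non-maximum suppression step of conventional detectors.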

Unlike many other modern detectors, the new model is conceptually simple and does not require a specialized library. When tested on the COCO object detection data set, DETR matches the performance of previous SOTA methods such as the Faster R-CNN baseline.
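That conceptual simplicity can be seen in a toy PyTorch sketch of the DETR-style prediction head below. This is not Facebook's model — the layer sizes and query count are arbitrary assumptions, and a real CNN backbone would supply the image features — but it shows the core idea: a standard transformer decodes a fixed set of learned object queries in parallel into class and box predictions.

```python
import torch
import torch.nn as nn

class MiniDETR(nn.Module):
    """Toy DETR-style head (illustrative only). Takes pre-extracted
    image features of shape (batch, seq_len, d_model) in place of a
    CNN backbone's flattened feature map."""
    def __init__(self, d_model=64, num_queries=10, num_classes=5):
        super().__init__()
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        # Learned object queries: one slot per potential detection.
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 for "no object"
        self.box_head = nn.Linear(d_model, 4)  # (cx, cy, w, h), normalized

    def forward(self, features):
        b = features.size(0)
        tgt = self.queries.unsqueeze(0).expand(b, -1, -1)
        hs = self.transformer(features, tgt)  # decode all queries in parallel
        return self.class_head(hs), self.box_head(hs).sigmoid()

model = MiniDETR()
feats = torch.randn(2, 49, 64)       # e.g. a flattened 7x7 feature map, batch of 2
logits, boxes = model(feats)
print(logits.shape, boxes.shape)     # torch.Size([2, 10, 6]) torch.Size([2, 10, 4])
```

Every forward pass emits exactly `num_queries` candidate detections per image; the set-based matching loss decides during training which of them should fire and which should predict "no object".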

It’s been over four years since Faster R-CNN was proposed as a SOTA approach to object detection, and newer methods such as ResNeSt have since achieved far better results. DETR’s novelty therefore lies primarily in matching the results of an optimized Faster R-CNN with a much simpler architecture.


And although DETR achieves significantly better performance on large objects than Faster R-CNN, it still struggles with small objects, a shortcoming the researchers plan to address in future work.

DETR’s design is not only straightforward to implement but can also be easily extended to panoptic segmentation with competitive results, the researchers say. The team hopes that applying transformers to object detection tasks will help improve the interpretability of computer vision models.

The paper End-to-End Object Detection with Transformers is on arXiv.


Journalist: Yuan Yuan | Editor: Michael Sarazen
