
MIT & Harvard’s Open-Source FAn System Enables Real-Time Detection, Tracking, and Following of Any Object

In a new paper Follow Anything: Open-set detection, tracking, and following in real-time, a research team from MIT and Harvard University presents the follow anything system (FAn), an open-set real-time any object following framework that can detect, segment, track, and follow any object, and is able to adapt to new objects using text, images, or click queries.

Detecting and tracking objects is crucial for robotics use cases. However, existing robotic systems for object following suffer from two notable limitations: they adapt poorly to new objects, since they are closed-set and can only handle a fixed set of object categories; and they are not user-friendly, as target objects are often unintuitive for end users to specify.

The team summarizes the key characteristics of the proposed FAn as follows:

  1. An open-set, multimodal approach to detect, segment, track, and follow any object in real-time.
  2. A unified system that is easily deployed on a robotic platform (in their work, a micro aerial vehicle).
  3. Built with re-detection mechanisms that account for scenarios where the object of interest is occluded or tracking is lost.

The team defines the open-vocabulary object-following task as follows: given a robotic system equipped with an onboard camera and an object of interest, the goal is to detect that object and compute robot controls that keep it within the field of view of the onboard camera.
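To make the task definition concrete, a minimal sketch of such a following controller might use a simple proportional law on the tracked object's offset from the image center. The function name, gain, and output commands below are illustrative assumptions, not details from the paper:

```python
def servo_command(bbox_center, frame_size, gain=0.5):
    """Toy proportional controller: steer so the tracked object's
    bounding-box center stays centered in the camera frame.
    All names and gains here are illustrative, not from the FAn paper."""
    cx, cy = bbox_center
    w, h = frame_size
    # Normalized offset of the object from the image center, in [-1, 1].
    err_x = (cx - w / 2) / (w / 2)
    err_y = (cy - h / 2) / (h / 2)
    # Map the image-space error to yaw and pitch rate commands.
    yaw_rate = gain * err_x
    pitch_rate = gain * err_y
    return yaw_rate, pitch_rate
```

When the object is centered, the error and thus the commanded rates are zero; as it drifts toward an image edge, the commands grow proportionally, turning the camera back toward it.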

To achieve this goal, FAn combines state-of-the-art ViT models, optimizes them for real-time processing, and unifies them into a single system. Specifically, the researchers leverage the Segment Anything Model (SAM) for segmentation, use DINO and CLIP to efficiently learn visual concepts from natural language, and design a lightweight detection and semantic segmentation scheme. They also leverage the (Seg)AOT and SiamMask models for real-time tracking, and introduce a lightweight visual servoing controller for object following.
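The detect-track-re-detect structure described above can be sketched as a single control loop. The callables `detect`, `track`, and `send_command` below are hypothetical stand-ins for the SAM/DINO/CLIP-based detector, the (Seg)AOT or SiamMask tracker, and the servoing controller; they are not FAn's actual API:

```python
def follow_loop(frames, detect, track, send_command, query):
    """Sketch of an open-set follow loop: detect once from a query,
    then track frame-to-frame, re-detecting whenever tracking is lost.
    The callables are hypothetical stand-ins, not FAn's real interfaces."""
    target = None
    commands = []
    for frame in frames:
        if target is None:
            # Open-set detection: match segments against a text/image/click query.
            target = detect(frame, query)
            if target is None:
                continue  # nothing matched yet; try the next frame
        else:
            # Fast per-frame tracking of the previously found target.
            target = track(frame, target)
            if target is None:
                continue  # lost (e.g. occlusion) -> re-detect on the next frame
        commands.append(send_command(target))
    return commands
```

The key design point this sketch captures is the re-detection fallback: expensive open-set detection runs only when there is no current target, while cheap tracking handles every other frame, which is what makes real-time operation feasible.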

Finally, the team conducted zero-shot detection, tracking, and following experiments on different objects. The results verify that FAn can seamlessly follow objects of interest in real time.

Overall, the proposed FAn provides an end-to-end solution for following any object: it is open-set, supports multimodal queries, runs in real time, and adapts to new environments. The team has also open-sourced the system to benefit a wide range of real-world applications.

The code is available on the project’s GitHub. The paper Follow Anything: Open-set detection, tracking, and following in real-time is on arXiv.


Author: Hecate He | Editor: Chain Zhang


