AI Machine Learning & Data Science Research

FastSAM Drastically Reduces Cost to Provide Real-Time Solution for Segment Anything Model

In the new paper Fast Segment Anything, a research team from the Chinese Academy of Sciences, the University of Chinese Academy of Sciences, Objecteye Inc. and Wuhan AI Research presents FastSAM, a real-time solution for the segment anything task that achieves performance comparable to SAM while drastically reducing computational demands.

The recently proposed Segment Anything Model (SAM) has become a landmark foundation model in computer vision, owing to its capability to segment any object within a given image. Despite its crucial role as a foundation for various high-level vision tasks, it demands substantial computational resources, which has become a bottleneck for real-world deployment, particularly in real-time applications.


The team summarizes their main contributions as follows:

  1. A novel, real-time CNN-based solution for the Segment Anything task is introduced, which significantly reduces computational demands while maintaining competitive performance.
  2. This work presents the first study of applying a CNN detector to the segment anything task, offering insights into the potential of lightweight CNN models in complex vision tasks.
  3. A comparative evaluation between the proposed method and SAM on multiple benchmarks provides insights into the strengths and weaknesses of the approach in the segment anything domain.

FastSAM consists of two stages: All-Instance Segmentation (AIS) and Prompt-Guided Selection (PGS). In the AIS stage, the team uses YOLOv8-seg to segment all objects in an image; in the PGS stage, the provided prompts are used to isolate the specific objects of interest.
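To make the AIS stage concrete, here is a minimal sketch of running YOLOv8-seg to obtain all candidate masks. It assumes the Ultralytics YOLOv8 package and an illustrative `yolov8x-seg.pt` checkpoint; the confidence and IoU thresholds are placeholders, not the authors' exact settings.

```python
# Minimal sketch of the All-Instance Segmentation (AIS) stage.
# Assumes the Ultralytics YOLOv8 package; the checkpoint name and
# thresholds are illustrative, not the authors' exact configuration.
from ultralytics import YOLO

model = YOLO("yolov8x-seg.pt")  # YOLOv8 instance-segmentation model


def segment_everything(image_path: str):
    """Run instance segmentation and return all predicted masks."""
    results = model(image_path, conf=0.25, iou=0.7)[0]
    if results.masks is None:
        return []
    # results.masks.data is an (N, H, W) tensor of binary masks
    return results.masks.data.cpu().numpy()


masks = segment_everything("example.jpg")
print(f"Segmented {len(masks)} candidate objects")
```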

Specifically, the PGS stage supports point prompts, box prompts, and text prompts. A point prompt selects the mask in which the point is located; a box prompt selects the mask with the highest Intersection over Union (IoU) with the given box; and for a text prompt, text embeddings are extracted with the CLIP model and the mask whose image embedding is most similar to the text embedding is selected.
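The following sketch illustrates how each prompt type could pick a mask from the AIS output. The helper functions, the masked-crop scoring strategy, and the use of the openai/CLIP package are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of Prompt-Guided Selection (PGS) over AIS masks.
# `masks` is an (N, H, W) array of binary masks from the AIS stage;
# CLIP usage follows the openai/CLIP package, but this is not the
# authors' exact pipeline.
import numpy as np
import torch
import clip
from PIL import Image


def select_by_point(masks, x, y):
    """Point prompt: return the mask that contains the point (x, y)."""
    for m in masks:
        if m[y, x] > 0:
            return m
    return None


def select_by_box(masks, box):
    """Box prompt: return the mask with the highest IoU with the box."""
    x0, y0, x1, y1 = box
    box_mask = np.zeros_like(masks[0])
    box_mask[y0:y1, x0:x1] = 1
    ious = [(m * box_mask).sum() / ((m + box_mask) > 0).sum() for m in masks]
    return masks[int(np.argmax(ious))]


def select_by_text(masks, image, text, device="cpu"):
    """Text prompt: return the mask whose crop best matches the text in CLIP space."""
    model, preprocess = clip.load("ViT-B/32", device=device)
    tokens = clip.tokenize([text]).to(device)
    img = np.array(image)
    scores = []
    with torch.no_grad():
        text_feat = model.encode_text(tokens)
        text_feat /= text_feat.norm(dim=-1, keepdim=True)
        for m in masks:
            crop = img * m[..., None].astype(img.dtype)  # zero out background
            crop_feat = model.encode_image(
                preprocess(Image.fromarray(crop)).unsqueeze(0).to(device))
            crop_feat /= crop_feat.norm(dim=-1, keepdim=True)
            scores.append((crop_feat @ text_feat.T).item())
    return masks[int(np.argmax(scores))]
```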

Together, these two stages enable FastSAM to select specific objects of interest from a segmented image robustly and efficiently, allowing it to accomplish the segment anything task in real time.

In their empirical study, the team compared FastSAM with SAM on four zero-shot tasks: edge detection, object proposal generation, instance segmentation, and object localization with text prompts. The results show that FastSAM runs 50 times faster than SAM (ViT-H) while handling multiple downstream tasks well in real time.

The paper Fast Segment Anything is available on arXiv.


Author: Hecate He | Editor: Chain Zhang


