Differentiating a target foreground subject from its background is a fundamental computer vision task with widespread applications in image editing and composition. Basic segmentation approaches that rely on binary pixel classification do not account for the varying opacity of pixels along foreground/background boundaries, resulting in hard, unnaturally contrastive edges around the foreground subject. Although recent deep learning-based natural image matting techniques significantly improve fine-grained detail in these areas by estimating the per-pixel opacity of the target foreground, they rely on user-supplied trimaps as an auxiliary input, which limits their real-world applicability.
In the new paper PP-Matting: High-Accuracy Natural Image Matting, a Baidu research team proposes PP-Matting, a trimap-free architecture that combines a high-resolution detail branch and a semantic context branch to achieve state-of-the-art performance on natural image matting tasks.

The researchers summarize their main contributions as:
- We propose PP-Matting, a high-accuracy matting network that takes a single image as input without any auxiliary information. The whole network can easily be trained end to end.
- We propose a two-branch architecture that extracts detail and semantic features efficiently in parallel. With a guidance flow mechanism, the proper interaction of the two branches helps the network achieve better semantic-aware detail prediction.
- We evaluate PP-Matting on the Composition-1k and Distinctions-646 datasets. The results demonstrate the superiority of PP-Matting over other methods. A further experiment on human matting also shows its outstanding performance in practical applications.
In an input image comprising a target foreground subject and a background, the colour of each pixel is modelled as a linear combination of foreground and background colours, weighted by an alpha matte that defines each pixel's relative opacity. Conventional image matting also requires a user-supplied trimap: a rough segmentation that divides the image into foreground, background, and transition regions. Once the user has specified the foreground/background semantic context, the model can focus on the transition region to more accurately predict the matte. The team notes, however, that many users struggle to create a trimap, and that trimaps are simply infeasible in cases such as live video.
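Concretely, for each pixel i the observed colour is I_i = α_i F_i + (1 − α_i) B_i, where α_i ∈ [0, 1] is the alpha matte value, F_i the foreground colour, and B_i the background colour. Below is a minimal NumPy sketch of this compositing equation (the function name and array layout are illustrative, not taken from the paper):

```python
import numpy as np

def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Compose an image via the matting equation I = alpha*F + (1 - alpha)*B.

    fg, bg: HxWx3 float arrays in [0, 1] (foreground and background colours).
    alpha:  HxW float array in [0, 1] (per-pixel opacity of the foreground).
    """
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over RGB
    return a * fg + (1.0 - a) * bg
```

Matting inverts this relationship: given only the composed image I, the model must recover the alpha matte (and implicitly the foreground), which is ill-posed without the semantic hint a trimap provides.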

A key challenge in building a trimap-free approach is extracting accurate semantic context from an image. The proposed PP-Matting architecture comprises two branches: a high-resolution detail branch (HRDB) that captures fine-grained details such as human hair, and a semantic context branch (SCB). Because the absence of a trimap deprives the model of explicit semantic context, detail prediction can suffer from foreground-background ambiguity. The SCB is designed to address this and ensure the semantic correctness of the details. PP-Matting's final alpha matte is then produced by fusing the HRDB's detail prediction with the SCB's semantic map via a guidance flow mechanism; a simplified sketch of this two-branch pattern follows.
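The actual PP-Matting implementation (built on PaddlePaddle in the PaddleSeg repo) is considerably deeper and applies guidance flow at multiple scales; the sketch below is a hypothetical, heavily simplified PyTorch rendering of the general two-branch pattern, with all layer sizes, names, and the fusion rule chosen purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchMatting(nn.Module):
    """Illustrative two-branch matting skeleton (not the actual PP-Matting code).

    A semantic branch predicts a coarse foreground/background/transition map
    from downsampled features, like a soft trimap; a high-resolution detail
    branch predicts fine-grained alpha, guided by the semantic map. The two
    outputs are fused into the final alpha matte.
    """

    def __init__(self, ch: int = 32):
        super().__init__()
        # Shared stem standing in for a real encoder backbone.
        self.stem = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        # Semantic context branch: downsample, then predict a 3-class map
        # (foreground / background / transition).
        self.semantic = nn.Sequential(
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 1),
        )
        # High-resolution detail branch: keeps full resolution and takes the
        # upsampled semantic probabilities as guidance (a crude stand-in for
        # the paper's guidance flow mechanism).
        self.detail = nn.Sequential(
            nn.Conv2d(ch + 3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 1),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feat = self.stem(img)
        sem_logits = self.semantic(feat)              # coarse, half resolution
        sem = F.interpolate(sem_logits, size=img.shape[2:],
                            mode="bilinear", align_corners=False)
        sem_prob = sem.softmax(dim=1)                 # fg / bg / transition
        detail = torch.sigmoid(self.detail(torch.cat([feat, sem_prob], dim=1)))
        # Fusion: trust the semantic prediction where it is confident and use
        # the detail prediction inside the transition region.
        fg, transition = sem_prob[:, 0:1], sem_prob[:, 2:3]
        return (fg + transition * detail).clamp(0, 1)

if __name__ == "__main__":
    model = TwoBranchMatting()
    alpha = model(torch.rand(1, 3, 64, 64))  # (1, 1, 64, 64) matte in [0, 1]
    print(alpha.shape)
```

The design point this sketch tries to convey is the one the paper emphasizes: the semantic branch supplies the foreground/background context that a trimap would otherwise provide, so the detail branch only has to resolve opacity in the ambiguous transition region.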

The team compared the proposed PP-Matting against previous top-performing methods (ClosedForm, KNN Matting, etc.) on the Composition-1k and Distinctions-646 datasets. In the evaluations, PP-Matting outperformed the other methods and achieved performance competitive with DIM, a trimap-based deep learning method. Overall, the results show that the novel interaction of PP-Matting's two branches enables it to achieve state-of-the-art performance in alpha matte prediction.
The code and pretrained models will be available at PaddleSeg. The paper PP-Matting: High-Accuracy Natural Image Matting is on arXiv.
Author: Hecate He | Editor: Michael Sarazen

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.