Differentiating a target foreground subject from its background is a fundamental computer vision task with widespread applications in image editing and composition. Basic segmentation approaches that treat the task as binary pixel classification ignore the varying opacity of pixels along foreground/background edges, producing hard, unnaturally contrastive edges around the foreground subject. Recent deep learning-based natural image matting techniques significantly improve fine-grained detail in these areas by estimating the per-pixel opacity of the target foreground, but they rely on user-supplied trimaps as an auxiliary input, which limits their real-world applicability.
In the new paper PP-Matting: High-Accuracy Natural Image Matting, a Baidu research team proposes PP-Matting, a trimap-free architecture that combines a high-resolution detail branch and a semantic context branch to achieve state-of-the-art performance on natural image matting tasks.
The researchers summarize their main contributions as:
- We propose PP-Matting, a high-accuracy matting network, which takes a single image as input without any auxiliary information. The whole network can be trained easily in an end-to-end way.
- We propose a two-branch architecture that extracts detail and semantic features efficiently in parallel. With a guidance flow mechanism, the proper interaction of the two branches helps the network achieve better semantic-aware detail prediction.
- We evaluate PP-Matting on the Composition-1k and Distinctions-646 datasets. The results demonstrate the superiority of PP-Matting over other methods. Another experiment on human matting also shows its outstanding performance in practical applications.
In an input image comprising a target foreground subject and a background, the colour of each pixel is formulated as a linear combination of foreground and background colours, with an alpha matte defining each pixel's relative opacity. Conventional image matting also requires a user-supplied trimap, a rough segmentation that divides the image into foreground, background, and transition regions. Once the user has specified this foreground/background semantic context, the model can focus on the transition region to more accurately predict the matte. The team notes, however, that many users struggle to create a trimap, and that trimaps are infeasible in cases such as live video.
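The linear combination above is the standard compositing equation, I = αF + (1 − α)B, where α ∈ [0, 1] is the per-pixel opacity given by the alpha matte. A minimal NumPy sketch (the image values below are illustrative, not from the paper):

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend foreground over background: I = alpha*F + (1 - alpha)*B.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1]
    alpha: float array of shape (H, W) in [0, 1], the alpha matte
    """
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over RGB
    return a * foreground + (1.0 - a) * background

# Tiny example: a 1x2 image, fully opaque pixel then a half-transparent one.
fg = np.array([[[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])  # red foreground
bg = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])  # blue background
alpha = np.array([[1.0, 0.5]])
out = composite(fg, bg, alpha)  # second pixel blends to [0.5, 0.0, 0.5]
```

Matting inverts this relation: given only I, estimate α (and implicitly F and B) per pixel, which is why edge pixels with fractional opacity are handled so much better than with a hard 0/1 mask.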
A key challenge in building a trimap-free approach is extracting accurate semantic context from the image alone. The proposed PP-Matting architecture comprises two branches: a high-resolution detail branch (HRDB) that captures fine-grained details such as human hair, and a semantic context branch (SCB). Because the absence of a trimap deprives the model of semantic context, detail prediction can suffer from foreground-background ambiguity. The SCB is designed to address this and ensure the semantic correctness of the details. PP-Matting's final alpha matte is then produced by fusing the HRDB's detail prediction with the SCB's semantic map via a guidance flow mechanism.
The team compared the proposed PP-Matting to previous top-performing methods (ClosedForm, KNN Matting, etc.) on the Composition-1k and Distinctions-646 datasets. In the evaluations, PP-Matting outperformed the other trimap-free methods and achieved performance competitive with DIM, a trimap-based deep learning method. Overall, the results show that the novel interaction of PP-Matting's two branches enables it to achieve state-of-the-art performance in alpha matte prediction.
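Matting benchmarks such as Composition-1k typically score predictions with per-pixel error metrics on the alpha matte; two of the most commonly reported are the sum of absolute differences (SAD) and mean squared error (MSE). A minimal sketch, assuming mattes are float arrays in [0, 1] (exact scaling conventions, e.g. reporting SAD in thousands, vary between papers):

```python
import numpy as np

def sad(pred_alpha, gt_alpha):
    """Sum of absolute differences between predicted and ground-truth mattes,
    divided by 1000 as is common in matting papers (an assumed convention)."""
    return np.abs(pred_alpha - gt_alpha).sum() / 1000.0

def mse(pred_alpha, gt_alpha):
    """Mean squared error over all pixels."""
    return np.mean((pred_alpha - gt_alpha) ** 2)
```

Lower is better for both; papers in this area usually also report gradient (Grad) and connectivity (Conn) errors, which are more involved to compute.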
Author: Hecate He | Editor: Michael Sarazen