With videotelephony the new normal, many are experimenting with virtual backgrounds. Real-time matting works well when a person is positioned in front of a green screen, but who has one of those? Moreover, do we really need them anymore?
No, say researchers from the City University of Hong Kong Department of Computer Science and SenseTime. In the new paper Is a Green Screen Really Necessary for Real-Time Human Matting?, the team proposes a lightweight matting objective decomposition network (MODNet) that performs smooth, real-time human matting from a single input image against diverse and dynamic backgrounds.
High-quality extraction of humans from natural images is a critical component for applications such as mixed reality, smart composition, photo editing and movie re-creation.
Green screens enable high-quality alpha mattes for easy real-time extraction of people from images or videos. Without a green screen, matting methods mostly rely on a predefined trimap as input to a natural matting algorithm. A trimap roughly segments an image into a definite (fully opaque) foreground, a definite (fully transparent) background, and unknown (mixed-opacity) regions in between. Annotating trimaps by hand is costly, while generating them with depth cameras yields low precision, and neither approach works in real time.
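For intuition, here is a minimal sketch, using numpy and OpenCV, of how a rough trimap can be derived from a binary segmentation mask and how an alpha matte drives the standard compositing equation I = αF + (1 − α)B. The helper names and the band width are illustrative choices, not code from the paper.

```python
import cv2
import numpy as np

def make_trimap(mask: np.ndarray, band: int = 10) -> np.ndarray:
    """Derive a rough trimap from a binary (0/255) foreground mask:
    255 = definite foreground, 0 = definite background,
    128 = unknown boundary band in between."""
    kernel = np.ones((3, 3), np.uint8)
    fg = cv2.erode(mask, kernel, iterations=band)             # sure foreground
    unknown = cv2.dilate(mask, kernel, iterations=band) - fg  # boundary band
    trimap = fg.copy()
    trimap[unknown > 0] = 128
    return trimap

def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Matting equation I = alpha * F + (1 - alpha) * B, alpha in [0, 255]."""
    a = alpha.astype(np.float32)[..., None] / 255.0
    return (a * fg + (1.0 - a) * bg).astype(np.uint8)
```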
The difficulty of obtaining trimaps has led to research efforts, such as the proposed MODNet, that avoid their use altogether. The team notes, however, that dropping the trimap ramps up the complexity of high-quality image matting, “as semantic estimation will be necessary (to locate the foreground) before predicting a precise alpha matte.”
The researchers note that neural networks are better at learning a set of simple objectives than a single complex one, so they decomposed human matting into three correlated sub-tasks in the proposed lightweight network MODNet. Given an input RGB image, MODNet predicts human semantics, boundary details and a final alpha matte through three interdependent architecture branches.
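The decomposition is easier to picture in code. Below is a schematic PyTorch sketch of the three-branch idea; the tiny shared encoder and single-convolution heads are placeholder stand-ins for illustration only, not MODNet’s actual architecture (which builds on a MobileNetV2 backbone).

```python
import torch
import torch.nn as nn

class ThreeBranchMatting(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        # Shared encoder (stand-in): downsamples the input by 4x.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Branch 1: coarse human semantics (where is the person?).
        self.semantic_head = nn.Conv2d(ch, 1, 3, padding=1)
        # Branch 2: boundary details (hair, fingers) around the semantics.
        self.detail_head = nn.Conv2d(ch, 1, 3, padding=1)
        # Branch 3: fuse semantics and details into the final alpha matte.
        self.fusion_head = nn.Conv2d(2, 1, 3, padding=1)
        self.up = nn.Upsample(scale_factor=4, mode="bilinear",
                              align_corners=False)

    def forward(self, image: torch.Tensor):
        feat = self.encoder(image)
        semantics = torch.sigmoid(self.up(self.semantic_head(feat)))
        details = torch.sigmoid(self.up(self.detail_head(feat)))
        alpha = torch.sigmoid(
            self.fusion_head(torch.cat([semantics, details], dim=1)))
        # Each output is supervised by its own, simpler sub-objective.
        return semantics, details, alpha
```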
To reduce artifacts in the predicted alpha matte, the researchers designed a self-supervised sub-objectives consistency (SOC) strategy that applies consistency constraints between the predictions of the sub-objectives. For real-time human matting on video, MODNet adds a one-frame delay (OFD) post-processing trick that smooths the predicted alpha mattes across the sequence.
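The OFD heuristic treats flicker as a one-frame outlier: if a pixel’s alpha value agrees in frames t − 1 and t + 1 but jumps in frame t, the frame-t value is replaced by the neighbours’ average. Here is a numpy sketch of that rule; the threshold eps is an illustrative choice, not a value from the paper.

```python
import numpy as np

def ofd_smooth(alphas: list, eps: float = 0.1) -> list:
    """Smooth a sequence of predicted alpha mattes (floats in [0, 1])."""
    out = [a.copy() for a in alphas]
    for t in range(1, len(alphas) - 1):
        prev, cur, nxt = alphas[t - 1], alphas[t], alphas[t + 1]
        flicker = (
            (np.abs(prev - nxt) <= eps)   # neighbouring frames agree...
            & (np.abs(cur - prev) > eps)  # ...but frame t disagrees with both
            & (np.abs(cur - nxt) > eps)
        )
        out[t][flicker] = 0.5 * (prev[flicker] + nxt[flicker])
    return out
```

Because fixing frame t requires seeing frame t + 1, the correction arrives one frame late, which is where the trick gets its name.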
The researchers say existing trimap-free methods tend to overfit the training set and perform poorly on real-world data, where scenes are more complex. They addressed this by building a more comprehensive benchmark, PHM-100 (Photographic Human Matting), comprising 100 finely annotated portrait images with various backgrounds, and used Mean Squared Error (MSE) and Mean Absolute Difference (MAD) as quantitative metrics. Although the trimap-free MODNet was surpassed by trimap-based methods such as DIM, it greatly outperformed other trimap-free models on both MSE and MAD. The researchers also note that when they adapted MODNet to take a trimap as input, it outperformed the trimap-based DIM.
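Both metrics are simple pixel-wise comparisons between the predicted matte and the ground-truth annotation; lower is better for both. A minimal numpy sketch, assuming alpha values normalized to [0, 1]:

```python
import numpy as np

def mse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Squared Error between predicted and ground-truth alpha mattes."""
    return float(np.mean((pred - gt) ** 2))

def mad(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Absolute Difference between predicted and ground-truth mattes."""
    return float(np.mean(np.abs(pred - gt)))
```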
Green screens were actually blue when their chroma-key effect was introduced in the 1940 British film The Thief of Bagdad. The movie wowed audiences and won Academy Awards for Cinematography, Art Direction and Special Effects. Eighty years is a terrific run, but MODNet’s high-quality extraction of humans with fine details suggests the good old green screen’s days may be numbered.
The paper Is a Green Screen Really Necessary for Real-Time Human Matting? is on arXiv. The code, pretrained model and validation benchmark will be made available on the project’s GitHub.
Reporter: Fangyu Cai | Editor: Michael Sarazen; Yuan Yuan

We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
Thinking of contributing to Synced Review? Synced’s new column Share My Research welcomes scholars to share their own research breakthroughs with global AI enthusiasts.
