Recent AI research on speech separation has explored associating lip motion in videos with audio, but this approach degrades when speakers’ lips are occluded, as they often are in busy multi-speaker environments.
VOGUE is an AI-powered optimization method that deforms garments to fit a given body shape while preserving pattern and material details, delivering state-of-the-art photorealistic, high-resolution try-on images.
In the new paper Canonical Capsules: Unsupervised Capsules in Canonical Pose, Turing Award laureate Geoffrey Hinton and a team of researchers propose a capsule-based architecture for unsupervised learning on 3D point clouds.
The approach dramatically reduces bandwidth requirements by transmitting only a keypoint representation of faces and reconstructing the source video on the receiver side, using generative adversarial networks (GANs) to synthesize the talking heads.
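To see why sending keypoints instead of pixels cuts bandwidth so sharply, a back-of-the-envelope comparison helps. The frame size and keypoint count below are illustrative assumptions, not figures from the announcement:

```python
# Rough bandwidth comparison: raw video frame vs. sparse keypoint representation.
# All numbers are illustrative assumptions, not figures from the paper.

def raw_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Uncompressed RGB frame size in bytes."""
    return width * height * channels

def keypoint_bytes(num_keypoints: int, floats_per_point: int = 2,
                   bytes_per_float: int = 4) -> int:
    """Size of a keypoint representation: (x, y) floats per point."""
    return num_keypoints * floats_per_point * bytes_per_float

raw = raw_frame_bytes(256, 256)   # 196,608 bytes per uncompressed frame
kp = keypoint_bytes(10)           # 80 bytes for 10 keypoints
print(f"compression factor: {raw / kp:.0f}x")
```

Even before video codecs enter the picture, the sparse representation is orders of magnitude smaller per frame, which is what makes GAN-based reconstruction on the receiver side attractive for low-bandwidth video calls.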
Researchers from the City University of Hong Kong and SenseTime propose a lightweight matting objective decomposition network (MODNet) that performs real-time human matting from a single input image, handling diverse and dynamic backgrounds smoothly.
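Once a matting network predicts a per-pixel alpha matte, swapping in a new background is standard alpha compositing. The sketch below is a generic NumPy illustration of that final step, not MODNet's actual API; the array shapes are assumptions:

```python
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray,
              alpha: np.ndarray) -> np.ndarray:
    """Blend foreground over background using a per-pixel alpha matte.

    foreground, background: (H, W, 3) float arrays in [0, 1].
    alpha: (H, W) matte in [0, 1], e.g. predicted by a matting network.
    """
    a = alpha[..., None]                  # add channel axis for broadcasting
    return a * foreground + (1.0 - a) * background

# Toy example: opaque left half (keeps foreground), transparent right half.
fg = np.ones((2, 4, 3))                   # white "person" foreground
bg = np.zeros((2, 4, 3))                  # black replacement background
alpha = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]], dtype=float)
out = composite(fg, bg, alpha)
```

The hard part MODNet addresses is producing a high-quality `alpha` in real time from a single RGB image; the compositing itself is this one line of arithmetic.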
“Our research provides enriched AR user experiences by enabling a more fine-grained visual recognition feature in AR, which is desirable in a wide range of application scenarios including technical support,” IBM researchers say.
Google AI has announced a new audiovisual speech enhancement feature in YouTube Stories (iOS) that enables creators to make better selfie videos by automatically enhancing their voices and reducing noise.