VisualVoice Uses Facial Appearance to Boost SOTA in Speech Separation

Recent AI research on speech separation has explored associating lip motion in video with audio, but this approach struggles when speakers’ lips are occluded, as they often are in busy multi-speaker environments.