Eye contact is one of the most widely used forms of nonverbal communication, relaying signals that strongly influence contextual understanding and social behaviour in general. Although 2D video conferencing has become a convenient and effective way to enable face-to-face interactions, for example for remote workers, the flatness of the medium cannot deliver the full 3D immersion found in the real world. It may seem we cannot have both 2D convenience and 3D immersion, as current telepresence systems do not accurately convey eye contact or related social gaze signals.
In a bid to push common video conferencing toward a more real-world experience, researchers from Facebook Reality Labs have developed a virtual telepresence system that uses photorealistic avatars to capture and convey the gaze and eye contact signals of real-world social interactions.
Tracking human faces and eyes in VR is challenging due to headset occlusion and other factors that make it difficult to find correspondences with images captured from the headset. Traditional methods can generate highly realistic avatar renderings, but depend heavily on accurate estimates of the geometry and reflectance properties of the eyes. Recently developed Deep Appearance Models (essentially a data-driven rendering pipeline that learns joint representations of facial geometry and appearance from a multiview capture setup) can reduce the need for highly accurate geometry. Such models, however, generalize poorly to viewpoints, gaze directions, and gaze-expression combinations not seen during training.
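The core idea of a deep appearance model is an encoder-decoder that compresses facial geometry (mesh vertices) and appearance (texture) into one shared latent code, then reconstructs both, with the texture conditioned on viewpoint. The sketch below illustrates this structure only; the dimensions, the linear "networks," and all variable names are illustrative assumptions, not the paper's actual architecture, which uses deep neural networks at much larger scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny dimensions for illustration; real systems use thousands
# of mesh vertices and megapixel textures.
N_VERTS, TEX_DIM, LATENT, VIEW_DIM = 30, 64, 8, 3

# Linear maps stand in for the deep encoder/decoder networks that jointly
# model geometry (vertex positions) and appearance (texture).
W_enc = rng.normal(size=(LATENT, 3 * N_VERTS + TEX_DIM)) * 0.1
W_dec = rng.normal(size=(3 * N_VERTS + TEX_DIM, LATENT + VIEW_DIM)) * 0.1

def encode(vertices, texture):
    """Compress geometry and appearance into one joint latent code."""
    x = np.concatenate([vertices.ravel(), texture])
    return W_enc @ x

def decode(z, view_dir):
    """Reconstruct geometry and a view-conditioned texture from the code."""
    y = W_dec @ np.concatenate([z, view_dir])
    return y[:3 * N_VERTS].reshape(N_VERTS, 3), y[3 * N_VERTS:]

verts = rng.normal(size=(N_VERTS, 3))
tex = rng.normal(size=TEX_DIM)
z = encode(verts, tex)
new_verts, new_tex = decode(z, np.array([0.0, 0.0, 1.0]))
print(new_verts.shape, new_tex.shape)  # -> (30, 3) (64,)
```

Because geometry and appearance share one latent code, the decoder learns correlations between them, which is what lets such models relax the requirement for highly accurate geometry estimates.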
In the new work, the researchers improve on current deep appearance model methods by employing a novel joint eye and face appearance model which not only accurately reproduces conditions seen during training, but can do the same for unseen or rare conditions.
The proposed model first introduces gaze-conditioning in the deep appearance model to allow independent control of gaze and expression. The researchers then employ an explicit eyeball model (EEM) to jointly learn a model of facial and periocular appearance and to better capture eye geometry and motion. Finally, the system refines its output with inverse rendering to match captured images.
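The gaze-conditioning step above can be sketched as a decoder that takes the gaze direction as a separate input alongside the expression code, so the same expression can be rendered with any gaze. This is a minimal illustrative sketch under assumed names and sizes, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: an 8-d expression code, 2 gaze angles (yaw, pitch),
# and a 16-d stand-in for the rendered output.
LATENT, GAZE_DIM, OUT_DIM = 8, 2, 16

# A linear map stands in for the gaze-conditioned deep decoder.
W = rng.normal(size=(OUT_DIM, LATENT + GAZE_DIM))

def decode_face(expression_code, gaze_angles):
    """Gaze-conditioned decoding: expression and gaze are separate inputs,
    so each can be varied independently of the other."""
    return W @ np.concatenate([expression_code, gaze_angles])

expr = rng.normal(size=LATENT)
look_left = decode_face(expr, np.array([-0.3, 0.0]))
look_right = decode_face(expr, np.array([0.3, 0.0]))
# Same expression code, different gaze -> different rendered outputs.
assert not np.allclose(look_left, look_right)
```

Conditioning on an explicit gaze input is what allows the model to synthesize gaze-expression combinations never observed together during training, since the decoder treats them as independent factors.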
The results show that the proposed method generalizes well across various degrees of eye openness, different identities, and combinations of eye and lower facial expressions.
The paper The Eyes Have It: An Integrated Eye and Face Model for Photorealistic Facial Animation is on arXiv.
Author: Hecate He | Editor: Michael Sarazen