A new study proposes a real-time, six-degrees-of-freedom 3D face pose estimation technique that works without face detection or landmark localization. In the paper img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation, a research team from the University of Notre Dame and Facebook AI details their easily trained, R-CNN-based model.
Six degrees of freedom (6DoF) refers to the freedom of movement of a rigid body in 3D space. Whereas three degrees of freedom (3DoF) covers only rotation (pitch, yaw, and roll), 6DoF face pose estimation adds translation along three axes: forward/backward, up/down, and left/right. The proposed method directly estimates the 6DoF 3D pose of every face in an image, even in very crowded scenes, effectively skipping the face detection step.
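To make the 3DoF/6DoF distinction concrete, here is a minimal sketch in plain Python of a 6DoF pose acting on a 3D point: three rotation angles build a rotation matrix R, and three translation values form a vector t, so a point x maps to R x + t. The angle convention (pitch about x, yaw about y, roll about z, composed as Rz·Ry·Rx) and the function names are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_euler(pitch: float, yaw: float, roll: float) -> Mat3:
    """Build a 3x3 rotation matrix from angles in radians.

    Assumed convention: pitch about x, yaw about y, roll about z,
    composed as R = Rz(roll) @ Ry(yaw) @ Rx(pitch).
    """
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))


def apply_6dof(point: Vec3, pose: Tuple[float, float, float, float, float, float]) -> Vec3:
    """Apply a 6DoF pose (pitch, yaw, roll, tx, ty, tz) to a point: x' = R x + t.

    A 3DoF (rotation-only) pose is the special case tx = ty = tz = 0.
    """
    pitch, yaw, roll, tx, ty, tz = pose
    r = rotation_from_euler(pitch, yaw, roll)
    x, y, z = point
    return (
        r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
        r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
        r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
    )
```

For example, `apply_6dof((1.0, 0.0, 0.0), (0.0, math.pi / 2, 0.0, 0.0, 0.0, 5.0))` first turns the point 90 degrees about the y axis and then pushes it 5 units along z, giving approximately (0, 0, 4). Dropping the last three values leaves only the rotation, which is all a 3DoF estimator recovers.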
Conventional face analysis pipelines first use a face detector to place a bounding box around each face in a photo. The next step is typically facial landmark detection, in which the model localizes specific facial features such as the eye centers and the tip of the nose. This two-step process works well for many face-based reasoning tasks but is computationally expensive, especially with state-of-the-art (SOTA) models. Moreover, because landmark detectors tend to be optimized for specific face detectors, they must be re-optimized whenever the face detector is updated.
The Notre Dame and Facebook researchers do away with face detection and landmark localization altogether. “We observe that estimating the 6DoF rigid transformation of a face is a simpler problem than facial landmark detection, often used for 3D face alignment. In addition, 6DoF offers more information than face bounding box labels,” they explain.
Given an image containing multiple faces, the proposed method first estimates the 6DoF pose of each face, that is, its 3D rotation and 3D translation. Because a 6DoF face pose can be converted into an extrinsic camera matrix that projects a 3D face onto the 2D image plane, the predicted poses can also be used to obtain accurate 2D face bounding boxes. Face detection thus becomes a byproduct of pose estimation, with minimal computational overhead.
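The pose-to-box step can be sketched in a few lines of plain Python: build the extrinsic matrix [R | t] from the pose, project a set of 3D face points through a pinhole camera, and take the extent of the projected points as the box. This is an illustration under stated assumptions, not the authors' code: the rotation is assumed to be an axis-angle (rotation-vector) triple, the intrinsics fx, fy, cx, cy are hypothetical values, and the 3D reference points stand in for whatever 3D face shape is used.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def extrinsic_from_pose(pose: Sequence[float]) -> List[List[float]]:
    """Turn a 6DoF pose (rx, ry, rz, tx, ty, tz) into a 3x4 extrinsic matrix [R | t].

    Assumption: (rx, ry, rz) is an axis-angle rotation vector, converted with
    Rodrigues' formula R = cos(a) I + sin(a) [k]x + (1 - cos(a)) k k^T.
    """
    rx, ry, rz, tx, ty, tz = pose
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    else:
        kx, ky, kz = rx / theta, ry / theta, rz / theta
        c, s, v = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
        r = [
            [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
            [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
            [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
        ]
    return [r[0] + [tx], r[1] + [ty], r[2] + [tz]]


def project_points(points: Sequence[Point3], extrinsic: List[List[float]],
                   fx: float, fy: float, cx: float, cy: float) -> List[Tuple[float, float]]:
    """Pinhole projection: X_c = [R | t] X, then u = fx * X_c / Z_c + cx, v = fy * Y_c / Z_c + cy."""
    out = []
    for x, y, z in points:
        xc, yc, zc = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in extrinsic)
        out.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return out


def bbox_from_pose(points: Sequence[Point3], pose: Sequence[float],
                   fx: float, fy: float, cx: float, cy: float) -> Box:
    """Derive a 2D face box as the extent of the projected 3D reference points."""
    uv = project_points(points, extrinsic_from_pose(pose), fx, fy, cx, cy)
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    return (min(us), min(vs), max(us), max(vs))
```

With four hypothetical reference points at (±0.1, ±0.1, 0), no rotation, a depth of 1.0, and fx = fy = 500, cx = 320, cy = 240, `bbox_from_pose` returns roughly (270, 190, 370, 290); moving the face to a depth of 2.0 halves the box. Because the box is simply the extent of projected 3D points, projecting a different reference point set changes its size and shape without touching the pose estimate.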
Because the model is trained for 6DoF pose estimation rather than for face bounding box detection, the estimated poses can be used to align a 3D face shape to every face in the input image. Moreover, since the pose aligns a 3D shape of known geometry with a face region in the image, the size and shape of the generated bounding boxes can be adjusted to suit specific research needs.
The team built the img2pose model on a small, fast ResNet-18 backbone and trained it on the WIDER FACE training set using a combination of weakly supervised labels and human-annotated ground-truth pose labels. On two leading benchmarks, AFLW2000-3D and BIWI, img2pose outperformed SOTA face pose estimators while running in real time. It also surpassed models of comparable complexity on face detection, despite not being optimized on bounding box labels.
The team believes their direct multi-face approach is the first to estimate the 6DoF rigid transformation of 3D faces and align them to even the tiniest faces in an image without face detection or facial landmark localization. The researchers suggest the method might also be applied to improve accuracy in other tasks, such as object and keypoint detection.
The paper img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation is on arXiv, and the team plans to release the implementation on the project GitHub.
Reporter: Fangyu Cai | Editor: Michael Sarazen