Although mobile devices were not designed to run compute-heavy AI models, AI-powered features such as face detection, eye tracking, and voice recognition have been added to smartphones in recent years. Much of the compute for these services is done in the cloud, but ideally such applications would be light enough to run directly on the device without an Internet connection.
In this spirit of “smaller is better,” Shanghai-based developer “Linzai” (GitHub username @Linzaer) recently shared a new lightweight model that enables real-time face detection on smartphones. The “Ultra-Light-Fast-Generic-Face-Detector-1MB” is designed for general-purpose face detection on low-power computing devices and can be deployed on Android and iOS phones as well as on PCs (CPU and GPU). The project has garnered a whopping 3.3k stars and more than 600 forks on GitHub.
Facial recognition technology is widely applied in security monitoring, surveillance, human-computer interaction, entertainment, and other fields. Detecting human faces in digital images is the first step in facial recognition, and a face detection model is judged chiefly by how quickly and how accurately it finds them.
The Face-Detector-1MB stands out for its small size and low compute cost. The model's default FP32-precision weight file (.pth) is 1.1MB, and after int8 quantization for the inference framework the model shrinks to about 300KB. At an input resolution of 320×240, a forward pass requires only about 90 to 109 MFLOPs.
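The size figures follow from simple arithmetic: FP32 stores each weight in 4 bytes and int8 in 1, so quantization cuts weight storage roughly fourfold. The sketch below illustrates generic symmetric per-tensor int8 quantization in pure Python. It is an illustrative assumption, not the scheme used by the project's inference framework, and the example weights are made up.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    scale = max|w| / 127 and q = round(w / scale), clamped to [-127, 127].
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = (max(-127, min(127, round(w / scale))) for w in weights)
    return array("b", codes), scale  # "b" = signed 1-byte integers


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    # Hypothetical weights standing in for one layer of a small detector.
    fp32 = array("f", [0.42, -1.27, 0.003, 0.9, -0.35, 1.1])
    q, scale = quantize_int8(fp32)
    approx = dequantize_int8(q, scale)

    print("fp32 bytes:", fp32.itemsize * len(fp32))  # 4 bytes per weight
    print("int8 bytes:", q.itemsize * len(q))        # 1 byte per weight
    print("max abs error:", max(abs(a - b) for a, b in zip(fp32, approx)))

    # A 1.1MB FP32 file at 4 bytes per weight implies roughly 275k weights;
    # at 1 byte each that is ~275KB, near the reported ~300KB once
    # file-format overhead is included (a rough estimate only).
    print("rough int8 size (KB):", 1.1e6 / 4 / 1000)
```

Production toolchains typically add per-channel scales, calibration data, and int8 activations; this sketch covers weights only.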
The Face-Detector-1MB was trained on the WIDER FACE face detection benchmark, converted to the VOC annotation format. Released in 2015, WIDER FACE consists of 32,203 images with 393,703 labeled face bounding boxes that exhibit a high degree of variability in scale, pose, expression, occlusion, and illumination.
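WIDER FACE ships its ground truth as plain text (an image path, a face count, then one line per face beginning with x, y, width, height), while VOC-style tooling expects one XML file per image with corner coordinates. Below is a minimal conversion sketch that assumes that text layout. It is not the project's own conversion script. The "face" class name, the fixed depth of 3, and the caller-supplied image size are all assumptions.

```python
import xml.etree.ElementTree as ET


def parse_wider_annotations(lines):
    """Yield (image_path, [(x, y, w, h), ...]) from WIDER FACE ground-truth text.

    Assumed layout per image: a path line, a face-count line, then one line per
    face whose first four fields are x, y, w, h (attribute fields are ignored).
    """
    it = iter(line.strip() for line in lines if line.strip())
    for path in it:
        count = int(next(it))
        boxes = []
        # Assumption: an image with zero faces is still followed by one
        # placeholder line of zeros, so at least one line is consumed.
        for _ in range(max(count, 1)):
            fields = next(it).split()
            x, y, w, h = (int(v) for v in fields[:4])
            if count > 0 and w > 0 and h > 0:  # skip placeholders and degenerate boxes
                boxes.append((x, y, w, h))
        yield path, boxes


def to_voc_xml(path, boxes, width, height):
    """Build a VOC-style annotation string for one image."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = path.split("/")[-1]
    size = ET.SubElement(root, "size")
    for tag, val in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(val)
    for x, y, w, h in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = "face"
        ET.SubElement(obj, "difficult").text = "0"
        bnd = ET.SubElement(obj, "bndbox")
        # VOC stores corner coordinates; WIDER FACE stores top-left corner plus size.
        for tag, val in (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h)):
            ET.SubElement(bnd, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # A made-up record in the WIDER FACE text layout; path and image size are hypothetical.
    sample = ["0--Parade/example.jpg", "1", "100 120 40 52 0 0 0 0 0 0"]
    for path, boxes in parse_wider_annotations(sample):
        print(to_voc_xml(path, boxes, width=1024, height=768))
```

WIDER FACE annotations do not record image dimensions, so a real converter would read them from the image files.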
The 1MB model comes in two versions: version-slim, a simplified network that runs slightly faster, and version-RFB, which uses a modified RFB (Receptive Field Block) module for higher accuracy. The model was tested on Ubuntu 16.04, Windows 10, Python 3.6, PyTorch 1.2, and CUDA 10.0, among other environments.
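The RFB module, introduced in RFBNet, runs parallel convolution branches with different kernel sizes and dilation rates and merges their outputs, so each branch covers a different receptive field. The branch configurations below are illustrative assumptions loosely following that design, not the project's modified module. The helper computes the receptive field of a stride-1 stack, which grows by (kernel - 1) x dilation per layer.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 conv layers given as (kernel, dilation) pairs.

    Each layer widens the field by (kernel - 1) * dilation pixels.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf


# Illustrative branches (hypothetical, not the project's exact config):
# a 1x1 bottleneck, a plain conv, then a 3x3 conv whose dilation grows with kernel size.
BRANCHES = {
    "branch_a": [(1, 1), (3, 1)],
    "branch_b": [(1, 1), (3, 1), (3, 3)],
    "branch_c": [(1, 1), (5, 1), (3, 5)],
}

if __name__ == "__main__":
    for name, layers in BRANCHES.items():
        print(name, receptive_field(layers))  # branch_a 3, branch_b 9, branch_c 15
```

Dilation lets a branch see a wide area without adding parameters, which is why this kind of block suits a model with a tight size budget.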
Researchers compared the accuracy and speed of version-slim and version-RFB against other open-source lightweight face detection models such as Retinaface-Mobilenet-0.25 (MXNet). RetinaFace is a robust single-stage face detector that performs pixel-wise face localization across a range of face scales using joint extra-supervised and self-supervised multi-task learning. MobileNets are a class of convolutional neural networks designed by Google researchers. With its “mobile-first” MobileNet backbone, Retinaface-Mobilenet-0.25 is resource-friendly and can run face detection effectively on phones.
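MobileNets save compute mainly by replacing standard convolutions with depthwise separable ones (a per-channel depthwise convolution followed by a 1x1 pointwise convolution). The "0.25" in Retinaface-Mobilenet-0.25 is a width multiplier that thins every layer to a quarter of its channels. The sketch below counts multiply-adds for one hypothetical layer to show both effects. The layer sizes are made up.

```python
def standard_conv_cost(k, m, n, h, w):
    """Multiply-adds of a k x k standard conv, m -> n channels, on an h x w map (stride 1)."""
    return k * k * m * n * h * w


def separable_conv_cost(k, m, n, h, w):
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    return k * k * m * h * w + m * n * h * w


def scaled(channels, alpha):
    """Apply a MobileNet-style width multiplier, keeping at least one channel."""
    return max(1, round(channels * alpha))


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 128 -> 128 channels, 40x30 feature map.
    k, h, w = 3, 30, 40
    for alpha in (1.0, 0.25):
        m = n = scaled(128, alpha)
        std = standard_conv_cost(k, m, n, h, w)
        sep = separable_conv_cost(k, m, n, h, w)
        print(f"alpha={alpha}: standard={std:,} separable={sep:,} ratio={sep / std:.3f}")
```

The separable-to-standard cost ratio works out to 1/N + 1/k^2, where N is the number of output channels and k the kernel size, so a 3x3 separable layer costs roughly a ninth to an eighth of a standard one. The width multiplier then shrinks the dominant pointwise term by about alpha^2.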
Results posted on Linzai’s GitHub show that version-RFB outperformed Retinaface-Mobilenet-0.25 (MXNet) on the easy, medium, and hard subsets of WIDER FACE with single-scale 320×240 input, and on the medium and hard subsets with VGA (640×480) input.
There are, however, challenges. Getting the best results for each scene requires manually adjusting the input resolution to strike a balance between speed and accuracy. An excessively high input resolution improves the recall of small faces, but may also raise the false positive rate on large faces and slow down inference. An extremely small input resolution, meanwhile, speeds up inference but greatly reduces the recall of small faces.
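The resolution trade-off can be reasoned about with two rough numbers: how many pixels the smallest face of interest spans after resizing, and how compute grows with input size. The sketch below relies on several stated assumptions. Compute is taken to scale linearly with pixel count from the reported 109 MFLOPs upper figure at 320×240. The 10-pixel minimum face width, the 1080p source frame, and the candidate list are hypothetical values, not properties of the model.

```python
def scaled_face_width(face_px, src_width, dst_width):
    """Width in pixels of a face after resizing a frame from src_width to dst_width."""
    return face_px * dst_width / src_width


def estimated_mflops(dst_w, dst_h, base_mflops=109.0, base_w=320, base_h=240):
    """Rough compute estimate, assuming cost scales linearly with input pixel count."""
    return base_mflops * (dst_w * dst_h) / (base_w * base_h)


def pick_resolution(min_face_px, src_width, candidates, min_detectable_px=10.0):
    """Smallest candidate (w, h) at which the smallest face stays >= min_detectable_px wide.

    min_detectable_px is a hypothetical threshold, not a property of the model.
    Falls back to the largest candidate if none qualifies.
    """
    ordered = sorted(candidates, key=lambda wh: wh[0] * wh[1])
    for w, h in ordered:
        if scaled_face_width(min_face_px, src_width, w) >= min_detectable_px:
            return w, h
    return ordered[-1]


if __name__ == "__main__":
    src_w = 1920          # hypothetical 1080p camera frame width
    smallest_face = 40    # hypothetical smallest face of interest, in source pixels
    candidates = [(320, 240), (640, 480)]
    for w, h in candidates:
        print(f"{w}x{h}: face ~{scaled_face_width(smallest_face, src_w, w):.1f}px, "
              f"~{estimated_mflops(w, h):.0f} MFLOPs")
    print("chosen:", pick_resolution(smallest_face, src_w, candidates))
```

Doubling both sides of the input quadruples the pixel count, so under this linear assumption moving from 320×240 to VGA costs roughly four times the compute in exchange for doubling the pixel width of every face.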
Linzai is calling for more testing and optimization of the Face-Detector-1MB model. Additional information on the model and the test datasets is available on the project's GitHub page.
Journalist: Yuan Yuan | Editor: Michael Sarazen