In the filming of movies like Avatar or Pirates of the Caribbean, actors donned motion capture suits so that special cameras could record their movements, which were then processed and played back as animated 2D or 3D digital characters. Now, a Chinese AI startup called SenseTime is producing similar results using just a smartphone.
The technology, called SensePose, synchronizes users’ movements with virtual figures in real-time video. Unlike traditional motion capture approaches, which rely on suits together with infrared or structured-light cameras, SensePose needs only ambient light and an ordinary RGB camera.
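SenseTime has not published SensePose’s internals, but the general idea behind RGB-only motion synchronization — estimate 2D body keypoints from each camera frame, then retarget the joint angles onto a virtual character’s rig — can be sketched in a few lines. Everything below (the keypoint names, the `retarget` helper, the one-joint puppet rig) is a hypothetical toy illustration, not SenseTime’s code:

```python
import math

def joint_angle(a, b, c):
    """Interior angle at keypoint b (in degrees) formed by 2D keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

def retarget(keypoints, rig):
    """Copy an estimated joint angle onto the virtual puppet's rig."""
    rig["right_elbow"] = joint_angle(keypoints["right_shoulder"],
                                     keypoints["right_elbow"],
                                     keypoints["right_wrist"])
    return rig

# Example frame: an arm bent at roughly a right angle
keypoints = {
    "right_shoulder": (0.0, 0.0),
    "right_elbow":    (1.0, 0.0),
    "right_wrist":    (1.0, 1.0),
}
rig = retarget(keypoints, {"right_elbow": 0.0})
print(round(rig["right_elbow"]))  # 90
```

In a real pipeline the keypoints would come from a per-frame pose-estimation network rather than being hand-written, and the rig would cover the full body, but the retargeting step per joint is essentially this angle transfer.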
The motion synchronization technology’s debut drew crowds at the NVIDIA GPU Technology Conference (GTC) in San Jose in May, where visitors posed and gestured in front of a large camera-topped screen, their actions mirrored in real time by a virtual traditional Chinese puppet.
Founded in 2014, SenseTime applies deep learning to computer vision, replicating tasks performed by the human visual system. The Chinese AI startup recently closed a US$410 million Series B round and, according to its latest announcement, is valued at over US$1.47 billion.
SenseTime provides a range of software development kits and application programming interfaces for facial recognition and smart surveillance. Along with SensePose, the company also showcased two of its 2016 technologies at the GTC: a visual scenario analytics system called SenseVideo, and a facial recognition system called SenseFace.
SenseVideo can recognize the positions and attributes of humans, vehicles and other entities in video input, and is expected to be a powerful tool for building smart transportation, security and surveillance systems. Meanwhile, SenseFace is a millisecond-level facial detection system for mobile phones and personal computers, capable of handling varied expressions, angled or obstructed views and blurring, as well as complex scenes containing upwards of a hundred faces.
Li Xu, CEO of SenseTime, says its AI-empowered visual technology can significantly boost productivity. “There are 250 million security surveillance cameras around the world. Human eyes were all you could count on in the past, but AI-empowered computers can help us achieve superior results.”
Xu says SenseTime owes its success to research and development in deep learning and computer vision. In its first two years, the company built an R&D team of 200 scientists — some had been researchers at top US universities such as MIT and Stanford, while others came from tech giants like Google and Microsoft.
The investment in R&D quickly paid off: the team developed an advanced deep learning framework built on a 1,207-layer neural network, a depth the company credits for its exceptional accuracy. CUVideo, the prototype of SenseVideo, was built on this framework and won the Track 3C championship (object detection/tracking from videos with provided data) at the 2016 ILSVRC (ImageNet Large Scale Visual Recognition Challenge, often called the Olympics of computer vision).
“A technology company should be at the cutting edge of technology and push the envelope,” stresses Xu.
SenseTime’s research was soon transformed into marketable software solutions. For instance, SensePhoto can edit images according to users’ preferences and classify pictures by backgrounds, figures, and other characteristics. China’s top smartphone makers Huawei and Xiaomi have integrated SensePhoto into their photo album applications.
SenseTime’s bold advances in visual technologies have attracted over 300 Chinese companies from sectors such as security and surveillance, finance, education and robotics. Chinese e-commerce giant JD.com has installed SenseTime’s facial recognition technology in its products.
Although SenseTime has achieved impressive results in both research and application, the company is still striving to expand its computer vision technology beyond facial recognition and object detection. SensePose is the latest manifestation of this effort, and an interesting example of how users can themselves create innovative content. We can expect further game-changing innovations from this industry-leading company.
Author: Tony Peng | Editor: Michael Sarazen