Today any smartphone can generate 3D Photos, but the popular AI-powered effect is actually fairly new. It was back in 2018 that Facebook first introduced a machine learning-based 3D photo feature that allowed users to generate an immersive 3D image from normal 2D pictures. Leveraging the dual-lens “portrait mode” capabilities that had recently become available in smartphones, the feature quickly gained traction and began evolving.
This June, a research team from Virginia Tech, National Tsing Hua University and Facebook designed an algorithm that generates even more immersive 3D photos from a single RGB-D (colour and depth) image. And in August, Facebook democratized the technique with a novel system able to generate 3D photos even on low-end mobile phones or without an Internet connection.
Facebook isn’t the only tech giant using AI to generate 3D photos: in recent months, Google has introduced its own AI techniques for generating 3D photos from 2D images. Computer vision (CV) has achieved remarkable progress across many subfields and real-world applications, but few deliver an effect as visually striking to everyday users as 3D photo generation, which deepens the immersive experience of a captured moment.
Another common use of 3D images is in AR and VR applications, where they can provide viewers a unique lifelike experience. Today’s smartphone authentication systems also rely on 3D image capability. The FaceID process for example employs an array of flood illuminators, sensors and an infrared camera to generate the 3D facial depth map it uses for verification and unlock.
Synced has identified a few significant technical advancements in the 3D photo field that we believe may be of interest to our readers.
Smartphone Camera Updates
Google recently released a demo app that visualizes a real-time point cloud from uDepth, the stereo depth sensor built into the Pixel 4 and Pixel 4 XL Google smartphones. The real-time infrared (IR) active stereo depth sensor leverages machine learning techniques and supports various features for 3D photo generation. Depth sensing is a valuable tool for 3D photo generation since it can recover 3D information about a scene from 2D images.
Human brains perceive depth using offset image signals from the left and right eye. Computer vision systems, meanwhile, can reconstruct three-dimensional depth information from two distinct image views by correlating their pixels. Stereo camera systems reconstruct depth in this way, using parallax, the apparent shift in the position of objects when seen from different viewpoints. uDepth estimates this parallax computationally for each pixel in the image, from which real physical distances can be reconstructed.
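The parallax-to-distance relationship described above can be sketched with the standard rectified-stereo formula, depth = focal length × baseline / disparity. The snippet below is an illustrative sketch of that geometry, not Google's uDepth implementation, and the camera parameters are hypothetical values:

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert per-pixel stereo disparity (parallax) to metric depth.

    Standard rectified-stereo relation: depth = f * B / d.
    Illustrative sketch only -- not Google's uDepth pipeline.
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0  # zero disparity => point at infinity
    depth[valid] = focal_length_px * baseline_m / disparity_px[valid]
    return depth

# Hypothetical camera: 1000 px focal length, 1 cm lens baseline
disparity = np.array([[20.0, 10.0],
                      [5.0, 0.0]])
depth_m = disparity_to_depth(disparity, focal_length_px=1000.0, baseline_m=0.01)
# A 20 px disparity maps to 1000 * 0.01 / 20 = 0.5 m
```

Note that depth is inversely proportional to disparity: nearby objects shift a lot between the two views, while distant objects barely move, which is why stereo depth estimates degrade with range.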
Google’s latest Pixel 4 smartphones use the uDepth stereo depth sensor to enable their 3D photo effect, an approach that supersedes the dual-lens “portrait mode” technique behind Facebook’s 3D Photo feature debut in 2018.
Portrait Mode was first introduced in 2016, when the iPhone 7 Plus used the smartphone’s two lenses and Apple software to create an artificial bokeh effect, where the picture’s subject remains in focus but the background is blurred. The subsequent widespread adoption of portrait mode in smartphone cameras enabled 3D photos to flourish. A 2019 Facebook Tech blog post explains the evolution, “in recent years, many major smartphone makers have added a dual-lens portrait mode to the cameras in some of their phone models. This setup allows them to take stereo photos and compares the two images to create a rough depth map… Facebook then used the gradient to build a 3D model and layer on the color image. Together with AI systems working in the background, you get the amazing 3D photos…”
Improved AI Techniques
The team behind this June’s 3D photo paper noted how novel AI techniques and new algorithms have also played a critical role in the successful methods for creating 3D photos from 2D colour images. AI-powered image-rendering techniques such as inpainting occluded regions have opened opportunities to record and reproduce visual perception at near-lifelike fidelity in virtual reality.
The Facebook 3D photo feature works across a wide variety of image scenes, and can generate accurate and visually consistent surface textures rather than showing gaps or stretching. The researchers use a Layered Depth Image (LDI) technique with explicit pixel connectivity as the underlying representation. This LDI representation is more compact due to its sparsity, and can be converted into a lightweight mesh representation for rendering. Taking an RGB-D (colour and depth) image as input, the learning-based inpainting model synthesizes new colour or depth textures and structures into the occluded regions of the image in a spatial context-aware manner. Since the generated 3D photos can be rendered with motion parallax using standard graphics engines, the approach also works on resource-constrained devices.
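The pixel-connectivity idea behind the LDI representation can be illustrated with a much simpler stand-in: triangulating a depth map into a mesh while cutting edges at depth discontinuities, so foreground and background are not stitched together. This is a minimal sketch of the concept under that simplification, not the authors' actual code, and the `disc_threshold` parameter is a made-up name:

```python
import numpy as np

def depth_to_mesh(depth, disc_threshold=0.1):
    """Triangulate a depth map into a triangle mesh, connecting
    neighbouring pixels only where their depths agree -- a simplified
    stand-in for LDI-style explicit pixel connectivity."""
    h, w = depth.shape
    idx = np.arange(h * w).reshape(h, w)  # vertex index per pixel
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            quad = depth[y:y + 2, x:x + 2]
            # skip quads that straddle a depth discontinuity
            if quad.max() - quad.min() < disc_threshold:
                a, b = idx[y, x], idx[y, x + 1]
                c, d = idx[y + 1, x], idx[y + 1, x + 1]
                faces.append((a, b, c))
                faces.append((b, d, c))
    return faces

# Toy 2x3 depth map: left pixels at 1.0 m, right column jumps to 5.0 m
depth = np.array([[1.0, 1.0, 5.0],
                  [1.0, 1.0, 5.0]])
faces = depth_to_mesh(depth)
# Only the left 2x2 pixel block is triangulated; the jump to 5.0 is cut
```

The regions left unconnected by such cuts are exactly the occluded areas that the paper's learning-based inpainting model fills with new colour and depth content.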
Synthesizing novel views of a scene from a sparse set of image captures is a 3D technique that has seen recent breakthroughs and is a prerequisite to many applications in the growing augmented and virtual reality (AR/VR) market, which hit US$18.8 billion in 2020. The SOTA view-synthesis method Neural Radiance Fields (NeRF) can produce high-quality 3D representations of complex scenes with only unstructured image collections as input. A follow-up approach from Google researchers improves NeRF’s ability to model common real-world phenomena such as variable illumination conditions and transient occluders found in such uncontrolled images.
The race to generate 3D photos from 2D images using AI advancements and hardware updates has also produced models that simplify 3D photography capture by using phone cameras and lowering other baseline requirements. This trend aligns with Facebook’s efforts in recent years to build AR tools for smartphone cameras.
In his keynote speech at the 2017 Facebook F8 developer conference, company CEO Mark Zuckerberg identified smartphone cameras as the first mainstream AR vehicle and launched the Camera Effects Platform, which has since evolved into the huge mobile AR platform Spark AR. With 600 million monthly users across Facebook and Instagram, creators have published a total of 1.2 million AR effects on Spark. The match makes sense, as Facebook noted at the time, “people are already using the cameras on their phones to write text on images, add digital objects and modify existing things with face filters and style transfers.” Why not 3D and AR/VR?
Beyond the potential new lifelike and increasingly immersive experiences that users can expect in VR, this research can also improve understanding of 3D scenes across a range of other applications, such as helping robots navigate and interact with the physical world.
Facebook’s ongoing commitment to AR/VR has resulted in a long-anticipated merging of its AR/VR teams, rebranded in August as “Facebook Reality Labs (FRL).” From its humble beginnings, the 3D Photo feature has powered Facebook’s ambitions to use advanced technology to inform the next generation of social sharing. The social networking giant eyes expanding the research and development of AR/VR tools ever further, “to help people feel more present with each other, even when we’re apart.”
Reporter: Fangyu Cai | Editor: Michael Sarazen