A broken computer, appliance or car used to mean a visit to a technician or mechanic, but the recent proliferation of DIY videos has more people performing such repair jobs themselves. Now, a pair of IBM researchers has taken instructional video to the next level with a new fine-grained visual recognition approach and augmented reality (AR) system that can look at the actual piece of hardware being worked on and deliver real-time, step-by-step tech support and guidance.
The researchers say the proposed method can increase the rate of first-time fixes and reduce hardware disruption recovery time by automatically detecting the state of an object and presenting the right set of information in the right context.
AR basically overlays media and graphics on what we see in the real world. Major technological advances and the increased availability of AR software development kits (SDKs) such as ARKit and ARCore over the last decade have lowered the entry barrier for AR developers. In recent years, machine learning has informed the emergence of intelligent systems that further enhance the AR experience.
Despite all the progress, the IBM researchers say most AR experiences remain primitive and lack intelligence and automation, which results in unintuitive user interactions. As users seek increased automation and new and more natural interactions in AR, new visual recognition techniques will have to be developed.
“Our research addresses this gap and provides enriched AR user experiences by enabling a more fine-grained visual recognition feature in AR, which is desirable in a wide range of application scenarios including technical support,” the researchers say.
The proposed solution leverages AR-specific data such as real-time generated 3D feature points and camera pose to complement images captured by the camera for fine-grained visual recognition.
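To illustrate how camera pose can complement raw images, the sketch below projects a known 3D anchor point (say, the centre of a connector on the device) into the current camera frame using a standard pinhole model. This is a generic illustration under assumed intrinsics and poses, not the paper's actual pipeline; all names and numbers are hypothetical.

```python
# Hypothetical sketch: using AR camera pose to locate a 3D point of interest
# in the current image, which could then guide RoI cropping.

def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point into 2D pixel coordinates.

    rotation: 3x3 world-to-camera rotation matrix (list of rows),
    translation: camera translation vector,
    (fx, fy, cx, cy): pinhole camera intrinsics.
    Returns None if the point lies behind the camera.
    """
    # World -> camera coordinates: X_cam = R @ X_world + t
    x, y, z = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:  # behind the camera plane, not visible
        return None
    # Pinhole projection into pixel coordinates
    return (fx * x / z + cx, fy * y / z + cy)

# Identity pose: camera at the origin looking down +Z
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 0.0]
u, v = project_point([0.1, 0.0, 0.5], R, t, fx=800, fy=800, cx=320, cy=240)
```

In an ARKit session the rotation and translation would come from the frame's camera transform rather than being hard-coded as here.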
The researchers first used a set of training videos to learn Regions of Interest (RoIs) where appearance changes can distinguish different states. Because these videos included frames that were motion-blurred or shot from unhelpful viewing angles, the researchers actively tracked device movement and rotation speed to build a filtered RoI image set, minimizing occlusions and other noise in the images used to train the visual recognition model.
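The motion-based filtering step can be sketched as follows. Given per-frame camera poses from the AR session, frames captured while the device was translating or rotating too fast (and thus likely motion-blurred) are dropped. The thresholds and the exact filtering rule are assumptions for illustration; the paper does not necessarily use these values.

```python
# Hypothetical sketch: dropping likely-blurred frames based on device motion.
import math

def motion_ok(prev_pos, pos, prev_yaw, yaw, dt,
              max_speed=0.25, max_rot_speed=0.5):
    """True if translational speed (m/s) and rotational speed (rad/s)
    between two consecutive frames stay below the given thresholds."""
    speed = math.dist(prev_pos, pos) / dt
    rot_speed = abs(yaw - prev_yaw) / dt
    return speed <= max_speed and rot_speed <= max_rot_speed

def filter_frames(frames, dt=1 / 30):
    """Keep frames whose motion relative to the previous frame is slow enough.

    frames: list of (position, yaw) tuples, one per video frame,
    dt: assumed inter-frame interval (30 fps here).
    """
    kept = [frames[0]]  # always keep the first frame as a reference
    for prev, cur in zip(frames, frames[1:]):
        if motion_ok(prev[0], cur[0], prev[1], cur[1], dt):
            kept.append(cur)
    return kept
```

A real implementation would read positions and orientations from the AR framework's tracked camera transform each frame.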
To improve recognition robustness, the researchers developed a Discrete-CNN-LSTM (DCL) model comprising a discrete multi-stream convolutional neural network with bi-directional long short-term memory. The model can extract both spatial and temporal features to predict state changes.
The researchers compared their DCL model with LW (a naive lightweight CNN model) and the representative object recognition model VGG16. In RoI capturing, the DCL model achieved the highest accuracy, at 99.87 percent.
The researchers say their fine-grained visual recognition method will enable AR systems to provide a more immersive and intuitive user experience. Its ability to detect very subtle changes to tiny connectors etc. and guide users reliably through repair processes could open the door to a wide range of automatic and immersive AR-based self-assist user experiences.
The researchers have built an iOS application using ARKit and TensorFlow to demonstrate the effectiveness of their solution and provided comprehensive evaluations in a hardware maintenance application scenario.
The paper Fine-Grained Visual Recognition in Mobile Augmented Reality for Technical Support is on IEEE Xplore.
Reporter: Yuan Yuan | Editor: Michael Sarazen