With its fluent voice interactions and ever-widening library of skills, Amazon’s virtual assistant Alexa offers a world of convenience. But Alexa’s capabilities lie beyond the reach of many users who have hearing and speaking disabilities.
Digital accessibility issues are being addressed across many consumer product lines, and Amazon recently introduced an Alexa Captioning feature. But as with any challenge in AI, individual developers can also make innovative contributions. People like Abhishek Singh.
Last week, Singh posted a YouTube video, Making Amazon Alexa respond to Sign Language using AI, in which he smoothly communicates with Alexa using sign language. Alexa, which is installed on a laptop with a built-in camera, interprets Singh’s signed queries in real time, converting them into text and delivering appropriate responses.
The video has won wide acclaim for its creativity and potential societal impact. By July 19 it had received over 2,200 retweets and almost 4,600 likes.
A self-described “creative technologist/artist/engineer,” Singh is a New York University grad who previously worked as a UX engineer for self-driving startup May Mobility and as creative lead at retail AI startup AiFi. His inspiration for the video came from a concern that people with hearing and speaking impairments could be shut out from voice-interaction technologies, particularly when tech companies are pushing these as the interface of the future.
Singh describes the video as a thought experiment, and told Synced in an email interview: “If these devices are to become a central way in which [we] interact with our homes or perform tasks then some thought needs to be given towards those who cannot hear or speak. Seamless design needs to be inclusive in nature.”
Singh explained his development process in detail. “I used [TensorFlow.js] to create a neural network and train it on a labeled sign language dataset (basically me performing the signs repeatedly and telling it which sign it is since sign language datasets are hard to find).
“Once it is trained on multiple examples of the various signs, I can feed it a new example input (image from the webcam of me performing a sign) and it will give its prediction (the text label of that sign). I used the predicted text to transcribe the signs to text and then I use Google text-to-speech to generate the voice which is heard by Alexa. Her response is then transcribed using Google speech-to-text and displayed to the user.”
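The flow Singh describes (record labeled examples, predict a label for each new frame, speak the resulting text, transcribe the reply) can be sketched in a few lines. What follows is only an illustration under stated assumptions, not Singh's code: it uses plain Python rather than TensorFlow.js, swaps his neural network for a toy nearest-neighbor classifier, and treats each webcam frame as a short list of numbers. The names `SignClassifier`, `signs_to_query`, `ask_assistant`, `speak` and `listen` are hypothetical, and the text-to-speech and speech-to-text services are stubbed out as ordinary functions.

```python
from collections import Counter
from math import dist
from typing import Callable, List, Sequence, Tuple

# Assumption: each webcam frame has already been reduced to a feature vector.
Example = Tuple[List[float], str]  # (feature vector, sign label)


class SignClassifier:
    """Toy k-nearest-neighbor classifier standing in for the trained network."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.examples: List[Example] = []

    def add_example(self, features: Sequence[float], label: str) -> None:
        # "Training": perform a sign and tell the model which sign it is.
        self.examples.append((list(features), label))

    def predict(self, features: Sequence[float]) -> str:
        if not self.examples:
            raise ValueError("no training examples recorded yet")
        # Take the k closest stored examples and let them vote on the label.
        nearest = sorted(self.examples, key=lambda ex: dist(ex[0], features))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def signs_to_query(classifier: SignClassifier, frames: Sequence[Sequence[float]]) -> str:
    """Transcribe a sequence of signed frames into a text query."""
    return " ".join(classifier.predict(frame) for frame in frames)


def ask_assistant(query: str, speak: Callable[[str], None], listen: Callable[[], str]) -> str:
    """Speak the query aloud, then return the transcribed spoken reply.

    `speak` and `listen` are hypothetical stand-ins for text-to-speech and
    speech-to-text services.
    """
    speak(query)
    return listen()


# Usage with made-up two-dimensional features and stubbed speech services.
clf = SignClassifier(k=3)
for features, label in [
    ([0.10, 0.90], "alexa"), ([0.20, 0.80], "alexa"), ([0.15, 0.85], "alexa"),
    ([0.90, 0.10], "weather"), ([0.80, 0.20], "weather"), ([0.85, 0.15], "weather"),
]:
    clf.add_example(features, label)

spoken: List[str] = []
query = signs_to_query(clf, [[0.12, 0.88], [0.88, 0.12]])
reply = ask_assistant(query, speak=spoken.append, listen=lambda: "It is sunny.")
print(query, "->", reply)
```

Passing the speech services in as functions keeps the sketch runnable without a microphone or a cloud account; a real build would replace them with calls to actual text-to-speech and speech-to-text services, and would replace the nearest-neighbor vote with a model trained on webcam images.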
A Google Brain developer commented on Singh’s video: “We’d love to feature your project so when you do open-source it, please send us a PR to update our (TensorFlow.js) gallery.”
Singh emphasizes that this is still a very early proof of concept which he built with the intention of starting a conversation, not solving the entire complicated “sign language to text” problem.
While Amazon has thus far made no official comment on the video, a company spokesperson told Synced that the Seattle e-commerce and cloud computing giant is making advancements of its own for people with disabilities, such as Alexa Captioning, which displays Alexa’s responses in text on the screen.
Amazon’s Seattle neighbor Microsoft meanwhile committed US$25 million this May to its AI for Accessibility program. One of its products is Seeing AI, a mobile application designed to support people with visual impairments by narrating the environment around them. Google announced a similar project, Lookout, at its annual I/O developer conference this year.
Although the tech giants will likely drive R&D in this area, it’s always refreshing to see a standout garage project. Kudos to Singh, who says his back-end code will be open-sourced soon on GitHub along with further technical details on his training process: “Hopefully people can take this and build on it further or just be inspired to explore this problem space.”
Singh is currently working on his own robotics startup.
Journalist: Tony Peng | Editor: Michael Sarazen