Human-machine interaction is rapidly evolving. Today, 12% of Chinese users opt for voice input over typing. iFlytek, the country’s most popular voice input method, can translate English voice input directly into Mandarin or vice versa. It also translates Mandarin into Korean and Japanese and tailors voice input for over 22 different Chinese dialects.
On November 7, the Smartisan Nut Pro 2 smartphone was released with the iFlytek voice input method built in.
For years iFlytek strived to improve its speech recognition accuracy rate to the current 98%. Using natural language interaction, the Chinese voice recognition unicorn solved challenging problems such as homonyms and wrong words. But there’s always a new target to aim for. “In three years, voice recognition for personalized voice users can reach 99% accuracy,” predicts iFlytek Input Product Manager Jibo Zhai.
Commercialization of voice recognition technology began in the late 1990s with IBM’s trailblazer product Via Voice. But it was hardly a game-changer, and appraisals came mainly from industry insiders. Some claimed that it didn’t work at all. Ten years later Motorola made a similar attempt with its A1200 phone, to no avail.
When touchscreens appeared on the market, companies like Google began redirecting R&D to voice input, giving birth to a milestone in Google Voice Search, which transcribed and then searched from voice inputs. The massive amount of incoming data helped to optimize the product, creating a feedback loop with fast iterations. IBM’s Via Voice had lacked this connection to billions of network users.
“Big data, cloud computing, and machine learning frameworks satisfy the needs of input methods,” explains Zhai.
iFlytek’s voice input team occupies the entire 7th floor of the company headquarters in Hefei, a low-key city in central-eastern China. As a steady stream of keyboard clicks flows through the room, youngsters huddle around a table brainstorming. Happy photos from company field trips fill the bulletin board behind them.
The relative normalcy contrasts with the media bombardment visited on the offices after iFlytek’s 2016 debut in the Smartisan T3.
“The IME project began from point zero when iOS and Android systems first launched,” says Kun Cheng, one of the original trio of engineers who worked on the project.
The project began with Cheng and his colleagues multi-tasking without a clear division of labour, writing lines of code during the day and testing at night. The iFlytek input method officially launched two months later using the Hidden Markov Model. But the accuracy rate was only 60%, meaning 40 of every 100 words were wrong. Even if accuracy were to increase to 80%, the program would be barely usable.
By 2014, iFlytek’s voice recognition accuracy had reached 97%, and the product had garnered 200 million users. Today it has 500 million users in China. People are no longer surprised by the technology: voice input has become the norm.
iFlytek Input accommodates keyboard, handwriting, and voice input. If you want to write, start writing; if you want to type, tap the keyboard; if you want to speak, tap the speaker icon. “We have a patent for this,” Zhang Yuan, operation manager of iFlytek, tells us while performing a demonstration.
“How would you rate your product?” I ask.
“75 out of 100,” reckons Zhai. “We are good with generic words, but still fall short on customization.”
“Two major changes have come to IME,” Zhai explains. “The first was connecting to the internet, and the second was its augmentation with AI.” Convolutional neural networks have helped to raise the handwriting accuracy rate by 30% and shorten writing lag time to 0.15 seconds. “Colleagues from our AI research institute and input department meet regularly to exchange insights. They brief us on possible applications, and we give them user experience feedback.”
The iFlytek IME team has grown to 100 members, and the entire company now has 8,000 employees. The company’s China Sound Valley building has become a landmark, and the view from the 7th-floor window shows two new commercial buildings on the rise. “When we first moved here, it was quite barren,” says Yuan.
Clearly, the voice input business is also on the rise.
The QWERTY keyboard has been with us since 1873, and because people are creatures of habit it likely won’t disappear overnight. But in the eyes of future generations accustomed to voice input, the keyboard may very well be a relic, as quaint and confusing as the video cassette recorder is today.
Journalist: Wei Zhou, Meghan Han | Editor: Michael Sarazen