Over the past year, Chinese startups and tech giants alike have built their own voice technology ecosystems, comprising product manufacturers, tech solution providers, content makers, and platforms designed to help hardware and skill applications "speak and be smarter." In this article, we look at the current state of open voice platforms in China.
Baidu's DuerOS smart ecosystem is built on two basic protocols: "dialogue service" and "skills framework." It offers traditional hardware vendors and developers a multi-stack solution that combines hardware and software, with personal, lightweight, standard, and reference designs. Its low cost and flexibility make it suitable for a wide range of vendors and developers. At the core of DuerOS is a dialogue system driven by Baidu's data, technology, and content resources.
The AliGenie open voice platform, launched by Alibaba A.I. Labs, provides voice interaction technology, natural language processing, cloud services, and development kits to hardware brands and solution providers building smart speakers, TVs, toys, cars, and other devices. For individual and industry application developers, AliGenie offers a suite of five core voice capabilities: voice wake-up, speech recognition, voiceprint recognition, semantic understanding, and speech synthesis. Content providers can use its content access suites to create voice skills quickly, while application developers can use custom skill components to build a variety of skills. The AliGenie Open Platform also offers vertical industry solutions for scenarios such as offline retail, airlines, and hotels.
Tencent Cloud Xiaowei was launched in 2017 with three application areas: the Skill Open Platform, the Hardware Open Platform, and the Xiaowei customer service robot. Through these, Tencent Cloud provides intelligent voice dialogue for a variety of devices: the Hardware Open Platform delivers voice interaction capabilities to third-party hardware vendors, while the Xiaowei customer service robot helps users improve efficiency and reduce labor costs. As a social networking giant, Tencent has the advantage of vast video and music content resources as well as hundreds of millions of users.
iFLYTEK is China's largest publicly listed intelligent voice company. Its open platform offers a one-stop smart human-computer interaction solution, providing developers with services such as speech synthesis, speech recognition, voice wake-up, semantic understanding, face recognition, personalized ringback tones, and "Cloud + End" mobile application analysis.
The AISpeech DUI open platform is a one-stop interaction customization platform built around dialogue, covering multiple application scenarios and third-party content resources. The platform includes a built-in voice skills library for the Internet of Things, the mobile Internet, and the Internet. As an end-to-end intelligent dialogue open platform, DUI provides not only dialogue functions based on ThinkBin intelligent speech language technology, but also comprehensive services, such as GUI customization and version management, that help developers customize their dialogue systems. Private cloud deployment is also supported, allowing developers to customize their conversational interaction systems.
Unisound's open platform provides templates for intelligent voice interaction applications tailored to specific scenarios, giving developers a complete and convenient environment for building intelligent speech interaction systems. An experienced voice interaction technology provider, Unisound operates in the mobile Internet, smart home, wearable device, in-vehicle navigation, medical care, education, and call center sectors.
The Mobvoi AI home platform is free to developers and hardware vendors, and an integrated SDK can be downloaded immediately from the platform's website. The platform integrates full-stack voice interaction technology, and its tools are easy to use and adaptable to multiple scenarios. Integration vendors can tailor products independently to their own requirements, developing individualized products for specific functions.
Rokid's open voice platform service includes both Rokid Skills development tools and Rokid Voice Access. The Skills tools help developers build new skills for any device equipped with Rokid Open Services, meeting users' varied voice interaction needs. Rokid Voice Access brings the intelligent, scalable voice capabilities of Rokid Open Services to networked hardware devices equipped with microphones and speakers. Rokid has pledged to open-source 100 percent of the platform's hardware technology and share 70 percent of its code.
Open Voice Platforms Market Insight
Speech recognition accuracy is the most intuitive indicator of user experience. iFLYTEK, AISpeech, and Unisound specialize in speech recognition, are strong in niche voice technologies, and share a long-term interest in pursuing mainstream voice market opportunities.
The BAT giants (Baidu, Alibaba, and Tencent), meanwhile, entered the race later, developing their own artificial intelligence technologies and leveraging big data from their Internet operations to stake their place in the intelligent voice market.
It is hard to say which companies have an overall competitive edge at this time or who will take the lead in voice platform development. But tech startups with their first-mover advantage and giants with abundant capital and resources are all highly motivated in this high-reward tech arena.
Source: Synced China
Localization: Meiling Wu | Editor: Michael Sarazen