Open source datasets are the fuel driving AI engines. Datasets such as ImageNet have proved essential for image recognition. Chinese search giant Baidu now wants to bring the same power to conversational AI, the tech behind our rapidly expanding voice-based human-machine interfaces.
At Silicon Valley-based GSV Labs this Thursday, Baidu DuerOS launched Project Prometheus, which aims to advance state-of-the-art R&D in conversational AI. The project will release one of the world’s largest open datasets for Mandarin conversational AI as early as next January, to boost the capabilities of AI products like voice assistants and smart speakers.
Baidu is “All in on AI” with a heavy investment in AI-driven fields, especially conversational AI. China has huge market potential: IDC forecasts that by 2020, 27% of Chinese households will have smart home systems, 51% will have smart cars, and 68% will have smartphones and wearables. “Voice is increasingly becoming how we interact with our devices today,” says Kaihua Zhu, CTO of Baidu’s DuerOS.
DuerOS is Baidu’s answer to Alexa or Siri: a platform for conversational AI. Launched earlier this year, DuerOS supports home appliances such as TVs and smart speakers as well as mobile devices such as phones and watches. Developers can use its open-source SDKs and APIs to build third-party conversational voice services, and the DuerOS bot platform provides a wide range of skills.
DuerOS Project Prometheus’ large-scale dataset is the last piece of the puzzle. Three large-scale sub-datasets will be released: far-field wake word detection, far-field speech recognition, and multi-turn dialogue.
The wake word dataset will collect training data for five to ten popular Chinese wake words, including “Xiaodu, Xiaodu”, which activates DuerOS-enabled devices. There will be 100,000 different audio clips for each wake word. The evaluation data will consist of real recordings of human voices with ambient sounds from different environments, and simulated recordings from TV, popular music, etc.
The speech recognition dataset will include 4,000 hours of Mandarin far-field speech recognition data, which enables hands-free devices to hear and process voice commands from afar or in noisy rooms. The dataset comes from multiple domains, including search, chatting, online comments, etc.
The multi-turn dialogue dataset will contain 10,000 dialogues covering 10 different domains, to promote the development of multi-turn conversation technology.
DuerOS also encourages third parties such as universities, companies and domain communities to organize challenges using these datasets.
Many AI researchers are excited by the project. Mei-Yuh Hwang, a leading speech recognition expert and CTO of Chinese AI unicorn Mobvoi, flew from Seattle to Silicon Valley to evaluate the dataset for Mobvoi’s R&D. “Data for conversational AI, especially for multi-turn dialogue, is rare,” she said.
However, acquiring such data is known to be time-consuming and expensive, so it remains uncertain whether Baidu DuerOS can deliver the dataset as promised. Chenchen Guo, Baidu’s Principal Architect for DuerOS, told Synced that Baidu DuerOS is looking for accountable data companies to handle quality control, and will also collect data directly, for example by sending devices to vendors.
In addition to the datasets, DuerOS Project Prometheus features two other programs: a talent program, which will invest in conversational AI projects and foster talent; and a university program, which will collaborate with universities and research organizations on joint training, course design, and workshops.
The project’s advisors include leading researchers in speech recognition and natural language processing from US universities and companies, such as Dr. Björn Hoffmeister, Sr. Manager of Amazon Machine Learning; Dr. Sanjeev Khudanpur, Director of the Human Language Technology Center of Excellence at Johns Hopkins; and Dr. Antoine Raux, CTO and Co-founder of a stealth conversational AI startup.
Prometheus is the Greek mythological figure who stole fire from Mount Olympus and gave it to mankind. Baidu believes DuerOS Project Prometheus can deliver a similar revolution in conversational AI.
Journalist: Tony Peng | Editor: Michael Sarazen