Speech technologies such as automatic speech recognition (ASR) and speech synthesis, or text-to-speech (TTS), are playing an increasingly important role in many real-world applications. However, contemporary speech technology systems support only about one hundred languages at best, a tiny fraction of the more than 7,000 languages spoken worldwide.
A Meta AI research team addresses this deficiency in the new paper Scaling Speech Technology to 1,000+ Languages, launching the tech giant’s Massively Multilingual Speech (MMS) project, which aims to expand speech technology capabilities and improve device-based information access for more than 1,000 global languages.
While the Internet is brimming with English-language content that can be used for model training, the same is not true for many lesser-spoken languages. The researchers' first challenge was therefore to collect data for such languages. They curated MMS-lab, a labelled dataset of speech audio paired with corresponding text, built from readings of publicly available religious texts such as the New Testament, which have been translated into more than 1,000 languages. They also employed MMS-unlab, an audio-only dataset of unlabelled speech in 3,809 languages.
With these datasets in hand, the researchers built pretrained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model and speech synthesis models, both covering 1,107 languages, and a language identification model covering 4,017 languages.
In their empirical study, the team compared MMS with strong baseline models such as OpenAI's Whisper, ASRL and XLS-R. The MMS models outperformed the baselines on word error rate (WER) while covering ten times as many languages. The team attributes these encouraging results in large part to recent advances in self-supervised speech representation learning, which enable more sample-efficient learning from labelled data.
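Word error rate, the metric behind this comparison, is the word-level edit distance between a system's transcript and a reference transcript (substitutions S, deletions D and insertions I), divided by the number of words N in the reference: WER = (S + D + I) / N, where lower is better. The minimal Python sketch below illustrates the standard metric. It is a generic illustration, not the MMS team's evaluation code, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER sketch: word-level Levenshtein distance / reference length.

    Hypothetical helper for illustration; not taken from the MMS codebase.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because WER is normalised by the reference length, it can exceed 1.0 when a system inserts many spurious words, which is why it is usually reported as a percentage averaged over a test set rather than as a bounded score.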
The MMS project represents a significant step forward in expanding multilingual speech technology, and the team hopes it will contribute to the preservation of lesser-spoken languages and global language diversity.
The models and code are available on the project’s GitHub. The paper Scaling Speech Technology to 1,000+ Languages is on research.facebook.com.
Author: Hecate He | Editor: Michael Sarazen