Meta AI’s Massively Multilingual Speech Project Scales Speech Technology to 1000+ Languages

In the new paper Scaling Speech Technology to 1,000+ Languages, a Meta AI research team launches the company’s Massively Multilingual Speech (MMS) project, which aims to dramatically expand access to speech technologies.

Speech technologies such as automatic speech recognition (ASR) and speech synthesis, or text-to-speech (TTS), are playing an increasingly important role in many real-world applications. Contemporary speech technology systems, however, support only about one hundred languages at best, a tiny fraction of the more than 7,000 languages spoken worldwide.

A Meta AI research team addresses this deficiency in the new paper Scaling Speech Technology to 1,000+ Languages, launching the tech giant’s Massively Multilingual Speech (MMS) project, which aims to expand speech technology capabilities and improve device-based information access for more than 1,000 global languages.

While the Internet is brimming with English-language content that can be used for model training, that is not the case for many less widely spoken languages. The first challenge facing the researchers was therefore to collect data for such languages. They curated MMS-lab, a labelled dataset of speech audio paired with corresponding text, based on readings of publicly available religious texts such as the New Testament that have been translated into over 1,000 languages. They also employed MMS-unlab, an audio-only dataset comprising unlabelled speech in 3,809 languages.

With these datasets in hand, the researchers built pretrained wav2vec 2.0 models covering 1,406 languages, a single multilingual automatic speech recognition model and speech synthesis models for 1,107 languages, and a language identification model for 4,017 languages.

In their empirical study, the team compared MMS with strong baseline models such as OpenAI’s Whisper, ASRL and XLS-R. The proposed MMS models outperformed the baselines on word error rate while covering ten times as many languages. The team attributes the encouraging results in large part to recent improvements in self-supervised speech representation learning, which enabled more sample-efficient learning from labelled data.
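For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a system transcript and a reference transcript, divided by the number of reference words. The following is a minimal Python sketch of that standard definition. It is not taken from the MMS codebase, the function name `word_error_rate` is our own, and splitting on whitespace is a simplifying assumption (published evaluations typically also normalize casing and punctuation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); tokenizes by whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

As a usage example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word against six reference words, giving 1/6, or roughly 0.167. Lower is better, and the value can exceed 1.0 when the hypothesis contains many inserted words.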

The MMS project takes a significant step forward in the expansion of multilingual speech technology, which the team hopes will contribute to the preservation of lesser-spoken languages and global language diversity.

The models and code are available on the project’s GitHub. The paper Scaling Speech Technology to 1,000+ Languages is on research.facebook.com.


Author: Hecate He | Editor: Michael Sarazen

