
CMU’s ASR2K Pipeline Recognizes Speech in 1909 Languages Without Audio

In the new paper ASR2K: Speech Recognition for Around 2000 Languages Without Audio, a Carnegie Mellon University research team introduces a speech recognition pipeline that can recognize almost 2000 languages without requiring any audio data in the target language.

AI-powered speech recognition systems have made great progress in recent years, and speech-to-text output for major languages is now accurate enough that errors are the exception rather than the rule. Most contemporary models, however, require massive amounts of labelled training data, which is easy enough to source for English, Chinese, and other widely spoken languages but hard to obtain for the low-resource languages that make up the majority of the world’s roughly 8,000 languages.

To address this issue, the Carnegie Mellon University team developed a speech recognition pipeline that can recognize 1909 languages without any audio for the target language. On the CMU Wilderness dataset, their ASR2K pipeline achieves a 45 percent character error rate (CER) and a 69 percent word error rate (WER) when using 10,000 raw text utterances.

The proposed pipeline comprises three separate components: an acoustic model, a pronunciation model, and a language model. The acoustic model recognizes the phonemes of the target language, including languages unseen during training. The pronunciation model is a grapheme-to-phoneme (G2P) model that predicts a word’s phoneme pronunciation from its grapheme sequence. Both the acoustic and pronunciation models are first trained on supervised datasets from high-resource languages and then apply their learned linguistic knowledge to low-resource languages without supervision.
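
To make the pronunciation model’s role concrete, here is a minimal Python sketch of what a G2P step produces: it turns a word’s spelling into a phoneme sequence, and applying it to every word in a text corpus yields a pronunciation lexicon. The rule table `G2P_RULES` and the functions `g2p` and `build_lexicon` are hypothetical; ASR2K’s actual pronunciation model is a learned multilingual G2P model, not a hand-written table.

```python
# Hypothetical grapheme-to-phoneme (G2P) table for an imaginary language.
# ASR2K uses a learned multilingual G2P model; this hand-written table only
# illustrates the input/output contract (spelling in, phonemes out).
G2P_RULES = {
    "sh": "ʃ", "ch": "tʃ",
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "u",
    "k": "k", "m": "m", "n": "n", "t": "t", "s": "s",
}
MAX_GRAPHEME_LEN = max(len(g) for g in G2P_RULES)


def g2p(word: str) -> list[str]:
    """Greedy longest-match conversion of a grapheme string to phonemes."""
    phonemes = []
    i = 0
    while i < len(word):
        for size in range(MAX_GRAPHEME_LEN, 0, -1):
            chunk = word[i:i + size]
            if chunk in G2P_RULES:
                phonemes.append(G2P_RULES[chunk])
                i += len(chunk)
                break
        else:
            # Grapheme not in the table: skip it (a learned model would
            # still predict some pronunciation).
            i += 1
    return phonemes


def build_lexicon(words):
    """Map each distinct word in a corpus to its predicted phoneme sequence."""
    return {w: g2p(w) for w in set(words)}


print(g2p("shina"))  # ['ʃ', 'i', 'n', 'a']
print(build_lexicon("mochi shina mochi".split()))
```

Greedy longest matching lets digraphs such as "sh" take priority over single letters, a common simplification in rule-based G2P.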

The team builds the ASR2K language model from raw text datasets or n-gram statistics. The pronunciation model approximates each word’s pronunciation, and these pronunciations are encoded into a lexicon graph. A text dataset also lets the team estimate a classical n-gram language model by counting n-gram statistics. This language model is then combined with the lexicon graph from the pronunciation model to build a weighted finite-state transducer (WFST) decoder.
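
The language-model side can be sketched in the same hedged spirit. The Python below counts bigram statistics from raw text, then uses dynamic programming to find the word sequence whose lexicon pronunciations exactly spell out a given phoneme string while scoring highest under the bigram model. It is a simplified stand-in for a WFST decoder, not the paper’s implementation: it assumes a single best phoneme string from the acoustic model rather than scored phoneme hypotheses, uses add-one smoothing rather than whatever smoothing the authors chose, and the names `train_bigram_lm` and `decode`, the toy corpus, and the toy lexicon are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Estimate an add-one-smoothed bigram LM from raw text; return a log-prob function."""
    history_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        history_counts.update(tokens[:-1])  # how often each token precedes another
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len(vocab) + 1  # predictable tokens: every word plus "</s>"

    def logprob(prev, word):
        # Laplace smoothing keeps unseen bigrams from getting zero probability.
        return math.log((bigram_counts[(prev, word)] + 1) /
                        (history_counts[prev] + vocab_size))

    return logprob


def decode(phonemes, lexicon, logprob):
    """Return the highest-scoring word sequence whose pronunciations spell `phonemes`."""
    n = len(phonemes)
    # best[i] maps the last word of a partial hypothesis covering the first
    # i phonemes to (log score, word list); a bigram LM only needs that word.
    best = [{} for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(n):
        for prev, (score, words) in best[i].items():
            for word, pron in lexicon.items():
                j = i + len(pron)
                if pron and phonemes[i:j] == pron:
                    candidate = (score + logprob(prev, word), words + [word])
                    if word not in best[j] or candidate[0] > best[j][word][0]:
                        best[j][word] = candidate
    # Close each complete hypothesis with the end-of-sentence probability.
    finished = [(score + logprob(prev, "</s>"), words)
                for prev, (score, words) in best[n].items()]
    return max(finished)[1] if finished else None


corpus = ["kana", "kana", "ka na"]  # hypothetical raw text
lexicon = {"ka": ["k", "a"], "na": ["n", "a"],
           "kana": ["k", "a", "n", "a"]}  # hypothetical G2P output
lm = train_bigram_lm(corpus)
print(decode(["k", "a", "n", "a"], lexicon, lm))  # ['kana']
```

Here the phonemes k a n a could be read as "kana" or as "ka na"; the lexicon alone cannot decide, and the bigram counts from the text tip the decision toward the single word. A real WFST decoder performs the same kind of search by composing the lexicon and grammar graphs and running it over the acoustic model’s scores.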

In their empirical study, the team applied the proposed method to 1909 languages using Crúbadán, a large n-gram database of endangered languages, and evaluated it on 34 languages from the Common Voice dataset and 95 languages from the CMU Wilderness Multilingual Speech dataset.

In the evaluations, the ASR2K pipeline achieved a 50 percent CER (character error rate) and a 74 percent WER (word error rate) using Crúbadán’s n-gram statistics alone, improving to 45 percent CER and 69 percent WER when 10,000 raw text utterances were available.
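
CER and WER are both edit-distance metrics: the minimum number of substitutions, deletions, and insertions needed to turn the system’s hypothesis into the reference transcript, divided by the reference length, counted over characters for CER and over words for WER. Because insertions count, either rate can exceed 100 percent. A minimal Python sketch, assuming per-utterance scoring and that spaces count as characters for CER (scoring conventions vary, and the paper’s exact setup may differ):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                cur[j - 1] + 1,          # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


print(wer("the cat sat", "the cat sit"))  # 1 word edit / 3 reference words
print(cer("the cat sat", "the cat sit"))  # 1 character edit / 11 reference characters
```

Corpus-level figures such as those reported above are commonly computed by summing edits over all utterances and dividing by the total reference length, rather than by averaging per-utterance rates.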

The researchers believe theirs is the first attempt to build a speech recognition pipeline for thousands of languages without audio. The paper will be presented at the 23rd INTERSPEECH Conference, which runs from September 18 to 22 in Incheon, South Korea.

The code will be available on the project’s GitHub. The paper ASR2K: Speech Recognition for Around 2000 Languages Without Audio is on arXiv.


Author: Hecate He | Editor: Michael Sarazen

