Speech Processing

by Synced 2024-02-11

Introducing NVIDIA’s Audio Flamingo, the Next Frontier in Audio Language Models

An NVIDIA research team introduces Audio Flamingo, an audio language model that incorporates in-context learning (ICL), retrieval-augmented generation (RAG), and multi-turn dialogue capabilities, achieving state-of-the-art (SOTA) performance across various audio understanding tasks.

by Synced 2023-10-09

AI Machine Learning & Data Science Research

Mind-to-Speech: The New Frontier in Neuro Communication Through Perception From Non-Invasive Brain Signals

In the new paper Decoding speech perception from non-invasive brain recordings, a research team from Meta AI, Inria Saclay and PSL University demonstrates that perceived speech can be decoded from brain signals recorded non-invasively with magnetoencephalography (MEG) or electroencephalography (EEG).

by Synced 2023-03-09

AI Machine Learning & Data Science Nature Language Tech Research

Google’s Universal Speech Model Scales Automatic Speech Recognition to 100+ Languages

In the new paper Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages, Google introduces the Universal Speech Model (USM), a scalable self-supervised training framework that extends automatic speech recognition to more than 100 languages.

by Synced 2023-01-26

AI Machine Learning & Data Science Research

Stanford U’s Brain-Computer Interface Enables Stroke and ALS Patients to ‘Speak’ 62 Words per Minute

A Stanford University research team presents a brain-computer interface for translating speech-related neural activity into text (speech BCI) in the new paper A High-performance Speech Neuroprosthesis. Theirs is the first speech BCI to record spiking activity from intracortical microelectrode arrays, and it could benefit people who can no longer speak intelligibly due to conditions such as stroke and ALS.

by Synced 2022-06-16

AI Machine Learning & Data Science Research

Wav2Vec 2.0 Learns Brain-Like Representations From Just 600 Hours of Unlabeled Speech Data in New Study

In the new paper Toward a Realistic Model of Speech Processing in the Brain with Self-supervised Learning, researchers show that self-supervised architectures such as Wav2Vec 2.0 can learn brain-like representations from as little as 600 hours of unlabeled speech, and can also learn sound-generic, speech-specific and language-specific representations similar to those of the prefrontal and temporal cortices.

by Synced 2020-10-02

Computer Vision & Graphics Machine Learning & Data Science Nature Language Tech

Google AI ‘Looking to Listen’ Tech Enables Speech Enhancement on YouTube Stories in Seconds

Google AI has announced a new audiovisual speech enhancement feature in YouTube Stories (iOS) that enables creators to make better selfie videos by automatically enhancing their voices and reducing noise.