Unlocking the ability to decode speech directly from brain activity has long been a cherished goal in healthcare and neuroscience. This breakthrough promises to restore communication for individuals grappling with traumatic brain injuries, strokes, and neurodegenerative disorders.
Recent strides in the field, driven by invasive devices and the power of deep learning algorithms, have enabled the decoding of fundamental linguistic elements such as letters, words, and audio spectrograms. However, translating this approach to natural speech and non-invasive brain recordings has remained a formidable challenge.
In a new paper, *Decoding speech perception from non-invasive brain recordings*, a research team from Meta AI, Inria Saclay, and PSL University demonstrates the ability to decode speech from brain signals recorded non-invasively through magneto-encephalography (MEG) or electro-encephalography (EEG).
The team summarizes their main contributions for the development of non-invasive Brain Computer Interface (BCI) as follows:
- Efficiency in Speech Decoding: The study demonstrates how pretrained speech models can facilitate the decoding of speech in the brain without subjecting volunteers to the laborious repetition of each individual word targeted by the decoder.
- Optimized Design Choices: The research showcases how specific design choices, including contrastive learning and a multi-subject architecture, enable efficient processing of continuous EEG and MEG recordings. This offers valuable data-driven insights for the future development of BCIs.
- High-Level Representations: The analyses conducted suggest that the resulting decoder predominantly relies on high-level representations of words and phrases.
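The multi-subject idea behind these design choices can be illustrated with a toy sketch: each participant gets their own learnable channel-remapping matrix, applied before a shared backbone. All shapes, names, and the plain matrix multiply here are hypothetical simplifications, not the paper's actual layer.

```python
import numpy as np

rng = np.random.default_rng(0)

n_subjects, n_channels, n_times = 4, 16, 90  # hypothetical sizes

# One learnable channel-mixing matrix per subject (a stand-in for a
# subject-specific layer; in practice these would be trained jointly
# with a backbone shared across all subjects).
subject_weights = rng.normal(size=(n_subjects, n_channels, n_channels))

def subject_layer(meg, subject_id):
    """Remap sensor channels using the matrix of the given subject,
    so recordings from different head geometries land in a shared space."""
    return subject_weights[subject_id] @ meg

# A fake recording: (channels, time samples) for one 3-second window.
meg = rng.normal(size=(n_channels, n_times))
out = subject_layer(meg, subject_id=2)
```

A per-subject remapping like this lets one model pool data from many participants while still absorbing individual differences in sensor placement and anatomy.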
Deciphering language from non-invasive brain recordings like EEG and MEG poses significant challenges: these devices generate inherently noisy signals that vary substantially across sessions and individuals. To tackle these issues, the research team proposes a two-pronged approach:
- Unified Architecture and Deep Speech Representations: They advocate for using a single architecture trained across a large cohort of participants and deep representations of speech acquired through self-supervised learning on an extensive corpus of speech data.
- Model Structure: Their proposed model consists of a convolutional neural network stacked onto a “Subject Layer,” trained with a contrastive objective. It aims to decode speech from the brain activity of healthy participants as they listen to stories and sentences.
In detail, the model extracts deep contextual representations from 3-second speech signals, utilizing a pretrained ‘speech module’ (wav2vec 2.0). It then learns the corresponding representations of brain activity within the same 3-second window, aligning them optimally with the speech representations using a contrastive loss function.
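The contrastive objective described above can be sketched with an InfoNCE-style loss over a batch of paired windows: each brain embedding should be most similar to its own speech embedding, with the other windows in the batch serving as negatives. The dimensions, temperature, and exact loss form below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 8, 32  # hypothetical batch of 3-second windows

# Stand-ins for the two encoders' outputs on matched windows:
# z_speech from the pretrained speech module, z_brain from the brain module.
z_speech = rng.normal(size=(batch, dim))
z_brain = rng.normal(size=(batch, dim))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_loss(z_brain, z_speech, temperature=0.1):
    """InfoNCE: each brain window must pick out its own speech window
    from the batch; off-diagonal pairs act as negatives."""
    a, b = l2_normalize(z_brain), l2_normalize(z_speech)
    logits = a @ b.T / temperature  # (batch, batch) cosine similarities
    # Log-softmax over each row; the correct pair sits on the diagonal.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

loss = contrastive_loss(z_brain, z_speech)
```

When the brain embeddings align perfectly with their speech counterparts, the diagonal dominates each row and the loss approaches zero; random embeddings yield a loss near `log(batch)`.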
To validate their approach, the researchers curate and integrate four publicly available M/EEG datasets, encompassing brain activity recordings from 175 participants passively listening to sentences or short stories. Given just a 3-second snippet of M/EEG signals, the proposed model identifies the matching audio segment with a top-10 accuracy of up to 72.5% for MEG and up to 19.1% for EEG.
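The top-10 metric above can be sketched as a retrieval task: for each brain-derived query embedding, rank all candidate speech-segment embeddings by similarity and check whether the true segment lands in the top 10. The synthetic embeddings, pool size, and cosine similarity below are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
n_segments, dim = 100, 32  # hypothetical candidate pool

# Fake speech-segment embeddings, and brain-derived queries simulated
# as the true embedding corrupted with noise (a stand-in for a decoder).
speech_bank = rng.normal(size=(n_segments, dim))
brain_queries = 0.8 * speech_bank + 0.6 * rng.normal(size=(n_segments, dim))

def top_k_accuracy(queries, bank, k=10):
    """Fraction of queries whose true segment ranks among the k most
    cosine-similar candidates (query i matches bank entry i)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = q @ b.T                            # (n_queries, n_candidates)
    top_k = np.argsort(-sims, axis=1)[:, :k]  # indices of k best matches
    hits = (top_k == np.arange(len(queries))[:, None]).any(axis=1)
    return hits.mean()

acc = top_k_accuracy(brain_queries, speech_bank, k=10)
```

Note that chance level shrinks as the candidate pool grows, so a top-10 score is only meaningful relative to the number of segments being ranked.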
Overall, the contribution of this work marks a significant milestone in the journey toward harnessing non-invasive brain recordings for speech decoding, offering newfound hope for patients with communication impairments stemming from neurological conditions.
The paper *Decoding speech perception from non-invasive brain recordings* is available on Nature.
Author: Hecate He | Editor: Chain Zhang