Brain-computer interfaces (BCIs) have made remarkable progress in decoding language from intracranial recordings, in which electrodes placed inside the skull capture the brain's electrical activity. This has helped patients with brain or spinal cord injuries regain basic communication skills. However, scaling this approach to non-invasive brain recordings and natural speech remains a major challenge.
In the new paper Decoding Speech From Non-Invasive Brain Recordings, a research team from Meta AI and the Inria Saclay Centre presents a single end-to-end architecture for decoding natural speech from non-invasive magnetoencephalography (MEG) or electroencephalography (EEG) recordings. Both techniques can detect macroscopic brain signals in real time using a safe and potentially wearable setup.
The study targets speech decoding from the brain activity of healthy participants, recorded with MEG or EEG while they listen to stories or sentences. The proposed model extracts deep contextual representations of speech signals and leverages contrastive learning to predict representations of the audio waveform from a module pretrained on 56k hours of speech from 53 languages.
The researchers leverage the wav2vec 2.0 framework (Baevski et al., 2020) to obtain rich speech representations, and introduce a convolutional neural network trained with contrastive learning to predict these self-supervised representations of natural speech from brain activity.
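The contrastive objective described above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `contrastive_loss`, the L2 normalisation of both sides, and the `temperature` parameter are assumptions made for illustration. The idea is that each brain-derived latent should score higher against its own speech representation than against the other speech segments in the batch.

```python
import numpy as np

def contrastive_loss(brain, speech, temperature=1.0):
    """Batch-wise contrastive loss (illustrative sketch, not the paper's code).

    brain, speech: arrays of shape (batch, dim); row i of each is a matched pair.
    """
    # Assumption: compare with cosine similarity, so L2-normalise both sides.
    brain = brain / np.linalg.norm(brain, axis=1, keepdims=True)
    speech = speech / np.linalg.norm(speech, axis=1, keepdims=True)

    # logits[i, j]: similarity of brain segment i to speech segment j.
    logits = brain @ speech.T / temperature

    # Numerically stable log-softmax over the candidate speech segments.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching segment (the diagonal) as the target.
    return float(-np.mean(np.diag(log_probs)))
```

Minimising this quantity pushes each brain latent toward its paired speech representation and away from the other segments in the batch, which serve as negatives.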
The system’s brain module comprises a spatial attention layer over the M/EEG sensors, followed by a subject-specific 1×1 convolution designed to leverage inter-subject variability. The result is fed into a stack of convolutional blocks that output the latent brain representation.
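The three stages of the brain module can be sketched as a simplified NumPy forward pass. Everything here is a hypothetical stand-in rather than the published implementation: the attention weights are plain learned scores instead of being derived from sensor positions, the convolutional blocks use a residual connection with ReLU, and the names `init_params`, `brain_module`, and `conv1d_same` are invented for this example.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv1d_same(x, kernel):
    """x: (c_in, time); kernel: (c_out, c_in, width) with odd width."""
    c_out, _, width = kernel.shape
    pad = width // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad the time axis
    out = np.zeros((c_out, x.shape[1]))
    for k in range(width):
        out += kernel[:, :, k] @ xp[:, k:k + x.shape[1]]
    return out

def init_params(n_sensors, n_channels, n_subjects, n_blocks, width, seed=0):
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(n_channels * width)
    return {
        "attn_scores": rng.normal(size=(n_channels, n_sensors)),
        # One channel-mixing matrix per subject (the 1x1 convolution).
        "subject_mix": rng.normal(size=(n_subjects, n_channels, n_channels))
        / np.sqrt(n_channels),
        "conv_kernels": [
            rng.normal(size=(n_channels, n_channels, width)) * scale
            for _ in range(n_blocks)
        ],
    }

def brain_module(recording, subject, params):
    """recording: (sensors, time) -> latent of shape (channels, time)."""
    # 1. Spatial attention: each channel is a softmax-weighted mix of sensors.
    attn = softmax(params["attn_scores"], axis=1)
    x = attn @ recording
    # 2. Subject-specific 1x1 convolution: the same matrix at every time step.
    x = params["subject_mix"][subject] @ x
    # 3. Stack of convolutional blocks (assumed: residual connection + ReLU).
    for kernel in params["conv_kernels"]:
        x = x + np.maximum(conv1d_same(x, kernel), 0.0)
    return x
```

In this sketch only the second stage depends on the subject, so the attention layer and the convolutional stack are shared across all participants.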
In their empirical study, the team applied the proposed model to the task of decoding audio segments from the brain activity of 169 human participants passively listening to speech, where it achieved zero-shot decoding of 3-second speech sounds with up to 73 percent top-10 accuracy for MEG and up to 19.1 percent top-10 accuracy for EEG.
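To make the top-10 metric concrete, here is a small sketch of how such a retrieval score could be computed. The function name `top_k_accuracy` and the use of a plain dot product as the similarity are assumptions, not details taken from the paper. Each brain latent is compared against every candidate speech segment, and a trial counts as a hit if the true segment ranks among the k best matches.

```python
import numpy as np

def top_k_accuracy(brain_latents, candidate_latents, true_index, k=10):
    """Fraction of queries whose true segment is among the k best-scoring candidates.

    brain_latents:     (n_queries, dim) predicted representations.
    candidate_latents: (n_candidates, dim) speech representations to choose from.
    true_index:        (n_queries,) index of the correct candidate for each query.
    """
    # Assumption: similarity is a plain dot product between latents.
    scores = brain_latents @ candidate_latents.T
    # Indices of the k highest-scoring candidates for each query.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == np.asarray(true_index)[:, None]).any(axis=1)
    return float(hits.mean())
```

For reference, a decoder that guesses uniformly at random among N candidates would reach a top-10 accuracy of about 10/N, so the reported figures should be read against the size of the candidate set.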
Overall, this work advances BCI research by showing that natural speech can be decoded from non-invasive brain recordings, with impressive zero-shot performance in identifying the speech segments participants heard.
The paper Decoding Speech From Non-Invasive Brain Recordings is on arXiv.
Author: Hecate He | Editor: Michael Sarazen