
Brain2Music: Unveiling the intricacies of Human Interactions with Music

In a new paper Brain2Music: Reconstructing Music from Human Brain Activity, a research team from Google, Osaka University, NICT and Araya Inc. introduces Brain2Music, an approach for reconstructing music from brain activity via MusicLM, aiming to provide insight into the relationship between brain activity and human cognitive and emotional experience.

Music is a universal language, transcending cultural boundaries worldwide. With the swift advancement of Large Language Models (LLMs), neuroscientists have shown a keen interest in investigating the representation of music in our brains.

In alignment with this interest, a research team from Google, Osaka University, NICT, and Araya Inc. presents Brain2Music in the new paper Brain2Music: Reconstructing Music from Human Brain Activity. The approach leverages MusicLM to reconstruct music from brain activity, generating compositions that resemble the original musical stimuli. This novel method offers valuable insights into the relationship between brain activity and human cognitive and emotional experience.

The team summarizes their main contributions as follows:

  1. We reconstruct music from fMRI scans by predicting high-level, semantically-structured music embeddings and using a deep neural network to generate music from those features.
  2. We find that different components of our music generation model are predictive of activity in the human auditory cortex.
  3. We offer novel insights suggesting that within the auditory cortex there is significant overlap in the voxels that are predictable from (a) purely textual descriptions of music and (b) the music itself.

The team first pre-processes the music genre neuroimaging dataset, which contains music stimuli from 10 genres: blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, and rock. They augment the dataset with English text captions that describe the music in terms of genre, instrumentation, rhythm, and mood.

The Brain2Music pipeline begins by condensing high-dimensional fMRI responses into a semantic, 128-dimensional MuLan music embedding through linear regression. Next, the researchers apply the music generation model MusicLM (Agostinelli et al., 2023) to generate a music reconstruction that resembles the original stimulus.
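The decoding step above is, at its core, a regularized linear map from voxel space to MuLan embedding space. The sketch below illustrates it with scikit-learn's `Ridge` on random stand-in data; the array shapes and the choice of ridge regularization are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical shapes; the real dataset has many more voxels and samples.
n_train, n_test, n_voxels, emb_dim = 400, 60, 6000, 128

# Stand-ins for preprocessed fMRI responses and the 128-d MuLan
# embeddings of the corresponding music clips.
X_train = rng.standard_normal((n_train, n_voxels))
Y_train = rng.standard_normal((n_train, emb_dim))
X_test = rng.standard_normal((n_test, n_voxels))

# One regularized linear map from voxel activity to embedding space.
decoder = Ridge(alpha=1.0)
decoder.fit(X_train, Y_train)

# Predicted embeddings; these would then condition MusicLM.
Y_pred = decoder.predict(X_test)
print(Y_pred.shape)  # (60, 128)
```

In practice the regularization strength would be tuned per subject via cross-validation rather than fixed at 1.0.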

MusicLM generates music in two stages. In the first stage, it maps a MuLan embedding to a low-level representation of w2v-BERT tokens that carries temporal information; in the second stage, it converts the generated tokens into audio using a SoundStream decoder.
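The two-stage flow can be sketched schematically as below. Both functions are hypothetical stand-ins for MusicLM's internal models (an autoregressive Transformer and the SoundStream neural codec), not real APIs; only the data flow, embedding → discrete tokens → waveform, is taken from the description above.

```python
import numpy as np

def map_to_w2v_bert_tokens(mulan_embedding, n_frames=250):
    """Stage 1 (stand-in): map a 128-d MuLan embedding to a sequence of
    discrete semantic tokens with temporal structure. Placeholder for
    MusicLM's autoregressive token model."""
    seed = int(abs(mulan_embedding.sum()) * 1e3) % 2**32
    rng = np.random.default_rng(seed)
    return rng.integers(0, 1024, size=n_frames)  # token ids

def soundstream_decode(tokens, frame_len=320):
    """Stage 2 (stand-in): convert tokens into an audio waveform.
    Placeholder for the SoundStream decoder."""
    return np.zeros(len(tokens) * frame_len, dtype=np.float32)

embedding = np.random.default_rng(0).standard_normal(128)
tokens = map_to_w2v_bert_tokens(embedding)
audio = soundstream_decode(tokens)
print(tokens.shape, audio.shape)
```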

In their empirical study, the team evaluated the similarity between the reconstructed music and the original stimuli in terms of identification accuracy and AudioSet top-n class agreement. The results show that the proposed approach can extract musical information from fMRI scans and produces reconstructions faithful to the original stimuli. Moreover, Brain2Music generalizes well to unseen music genres.
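One common formulation of identification accuracy, sketched below on random stand-in embeddings, scores the fraction of pairwise comparisons in which a predicted embedding is closer to its own stimulus's embedding than to another stimulus's; this is an illustrative definition, not necessarily the paper's exact protocol.

```python
import numpy as np

def identification_accuracy(pred, true):
    """Fraction of pairwise comparisons in which the predicted embedding
    is more similar (cosine) to its matching stimulus embedding than to
    a non-matching one. Chance level is 0.5."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    true = true / np.linalg.norm(true, axis=1, keepdims=True)
    sim = pred @ true.T                # sim[i, j]: prediction i vs. stimulus j
    correct = np.diag(sim)[:, None]    # similarity to the matching stimulus
    n = sim.shape[0]
    # Count, per row, how many non-matching stimuli the match beats.
    wins = (correct > sim).sum(axis=1)  # the diagonal never beats itself
    return wins.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
true = rng.standard_normal((50, 128))
pred = true + 0.5 * rng.standard_normal((50, 128))  # noisy predictions
print(round(identification_accuracy(pred, true), 3))
```

Perfect predictions score 1.0, random ones about 0.5, so the metric directly measures how much stimulus-specific information the decoded embeddings retain.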

Overall, this work is the first to provide a quantitative, biologically grounded interpretation of the connection between music and human brain activity. The authors encourage future work to improve the temporal alignment between reconstruction and stimulus.

The paper Brain2Music: Reconstructing Music from Human Brain Activity is available on arXiv.

Author: Hecate He | Editor: Chain Zhang

