
Contrastive Learning Advances Sleep Science: Superior Multi-Modal Model Enhances Disorder Detection

Sleep is a complex physiological process evaluated through various methods that record electrical brain activity, cardiac activity, and respiratory signals. Recent advancements in supervised deep learning have shown promise in automating sleep staging and diagnosing sleep disorders. However, many existing methods fail to fully utilize the extensive unlabeled physiological data available from diverse polysomnography (PSG) sensors.

In a new paper SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals, a research team from Stanford University and Technical University of Denmark introduces SleepFM, the first attempt at developing a multi-modal contrastive learning (CL) approach for PSG analysis, outperforming end-to-end trained convolutional neural networks (CNNs) in tasks like demographic attribute prediction and sleep stage classification.

SleepFM stands out in two significant ways. First, it employs self-supervised representation learning on a large sleep dataset, unlike most prior works that rely on supervised learning. Second, it is the first contrastive model to utilize a wide array of sleep modalities, including Brain Activity Signals (BAS), electrocardiogram (ECG) waveforms, and respiratory signals, encompassing 19 data channels across the brain, heart, and lungs.

The researchers curated a substantial polysomnography dataset from over 14,000 participants, totaling more than 100,000 hours of multi-modal sleep recordings collected at the Stanford Sleep Clinic between 1999 and 2020. They used contrastive learning (CL) as the foundational algorithm for representation learning during the pre-training stage.

Three 1D CNNs, one per modality, generate separate embeddings from the BAS, ECG, and respiratory signals. The architecture of these embedding models is based on the EfficientNet design, beginning with atrous (dilated) convolutions followed by multi-channel 1D convolutions. While the layer count matches the original EfficientNet design, the number of channels is significantly reduced to improve runtime efficiency and limit model complexity. After the initial atrous layers, the model uses convolutional layers with an inverted residual structure, in which narrow input and output bottleneck layers sandwich a wider intermediate expansion layer.
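The paper does not publish exact layer widths or kernel sizes, so the following PyTorch sketch only illustrates the described pattern: a dilated (atrous) stem, a stack of inverted residual blocks with reduced channel counts, and temporal pooling to a fixed-size embedding. All dimensions here (`width`, `embed_dim`, kernel sizes, dilation rates) are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class InvertedResidual1d(nn.Module):
    """EfficientNet-style inverted residual block for 1D signals:
    a narrow bottleneck is expanded, depthwise-convolved, then projected back."""
    def __init__(self, channels, expansion=4, kernel_size=5):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv1d(channels, hidden, 1, bias=False),            # expand
            nn.BatchNorm1d(hidden), nn.SiLU(),
            nn.Conv1d(hidden, hidden, kernel_size,
                      padding=kernel_size // 2, groups=hidden,
                      bias=False),                                 # depthwise
            nn.BatchNorm1d(hidden), nn.SiLU(),
            nn.Conv1d(hidden, channels, 1, bias=False),            # project
            nn.BatchNorm1d(channels),
        )

    def forward(self, x):
        # Residual connection between the narrow bottleneck layers.
        return x + self.block(x)

class ModalityEncoder(nn.Module):
    """Per-modality 1D CNN: dilated (atrous) stem to widen the receptive
    field, inverted-residual body, global pooling to one embedding."""
    def __init__(self, in_channels, embed_dim=128, width=32, depth=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(in_channels, width, 7, dilation=2, padding=6, bias=False),
            nn.BatchNorm1d(width), nn.SiLU(),
            nn.Conv1d(width, width, 7, dilation=4, padding=12, bias=False),
            nn.BatchNorm1d(width), nn.SiLU(),
        )
        self.body = nn.Sequential(*[InvertedResidual1d(width) for _ in range(depth)])
        self.head = nn.Linear(width, embed_dim)

    def forward(self, x):                  # x: (batch, channels, time)
        h = self.body(self.stem(x))
        return self.head(h.mean(dim=-1))   # average-pool over time
```

One such encoder would be instantiated per modality, e.g. `ModalityEncoder(in_channels=10)` for BAS and `ModalityEncoder(in_channels=1)` for ECG, so that each signal type gets its own embedding space.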

The team further explored two CL frameworks for learning joint representations across modalities: pairwise CL and leave-one-out CL. In pairwise CL, contrastive prediction tasks are constructed between all pairs of modalities, using a contrastive loss to promote agreement between positive pairs and discourage agreement between negative pairs. In leave-one-out CL, an embedding from one modality is used to identify the corresponding embeddings from the remaining modalities.
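The two objectives can be sketched with a standard InfoNCE loss. This is an illustrative reconstruction, not the authors' code: the temperature value is an assumption, and the leave-one-out variant here represents "the remaining modalities" by averaging their embeddings, which is one common way to realize that formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE: matching rows of a and b (same clip) are
    positives; every other row in the batch is a negative."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def pairwise_cl_loss(embeddings, temperature=0.1):
    """Pairwise CL: a contrastive prediction task between every
    pair of modalities, averaged over all pairs."""
    mods = list(embeddings)
    total, n = 0.0, 0
    for i in range(len(mods)):
        for j in range(i + 1, len(mods)):
            total = total + info_nce(embeddings[mods[i]],
                                     embeddings[mods[j]], temperature)
            n += 1
    return total / n

def leave_one_out_cl_loss(embeddings, temperature=0.1):
    """Leave-one-out CL: each modality's embedding must identify the
    corresponding clip among the remaining modalities (averaged here)."""
    mods = list(embeddings)
    total = 0.0
    for m in mods:
        rest = torch.stack([embeddings[k] for k in mods if k != m]).mean(0)
        total = total + info_nce(embeddings[m], rest, temperature)
    return total / len(mods)
```

In both cases the positive pair is the set of embeddings computed from the same 30-second recording clip; negatives come from other clips in the batch.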

Results showed that the novel leave-one-out approach for contrastive learning significantly improved downstream task performance compared to representations from standard pairwise contrastive learning. A logistic regression model trained on SleepFM’s learned embeddings outperformed an end-to-end trained CNN in sleep stage classification and sleep-disordered breathing detection. Notably, the learned embeddings achieved 48% top-1 average accuracy in retrieving the corresponding recording clips of other modalities from 90,000 candidates.
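The downstream evaluation is a linear probe: the pre-trained encoders are frozen and a logistic regression classifier is fit on their embeddings. A minimal scikit-learn sketch of that protocol, using random arrays as stand-ins for the real embeddings and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-ins for frozen SleepFM embeddings and sleep-stage labels.
X = rng.normal(size=(1000, 128))        # one 128-d embedding per clip
y = rng.integers(0, 5, size=1000)       # five sleep stages (W, N1, N2, N3, REM)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000)   # the linear probe
probe.fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)               # held-out accuracy
```

With real embeddings the same probe is what the paper compares against an end-to-end trained CNN; on the random data above it naturally scores near chance.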

This research represents the first attempt to build and evaluate a multi-modal foundation model for sleep analysis, highlighting the value of holistic multi-modal sleep modeling to fully capture the complexity of sleep recordings.

SleepFM is open source and available on the project's GitHub. The paper SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals is on arXiv.


Author: Hecate He | Editor: Chain Zhang
