Yann LeCun Team’s Neural Manifold Clustering and Embedding Method Surpasses High-Dimensional Clustering Algorithm Benchmarks

A team from UC Berkeley and Facebook AI Research proposes a Neural Manifold Clustering and Embedding (NMCE) method for general-purpose manifold clustering that significantly outperforms autoencoder-based deep subspace clustering approaches.

In unsupervised representation learning, the goal is to learn structures or features from unlabelled data. While various methods can be employed depending on how the data is presented, a particularly challenging scenario is when the data points come from a union of several non-linear low-dimensional manifolds. In this case, researchers have turned to non-linear subspace clustering or manifold clustering approaches, which aim to cluster data points based on manifold structures while also learning to parameterize each manifold as a linear subspace in a feature space.
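One way to write this problem down is shown below. The notation is illustrative and not taken from the paper. Data points lie on a union of k low-dimensional manifolds. The goal is a cluster labeling that recovers the manifolds, up to a relabeling of clusters, together with a feature map that sends each manifold into its own linear subspace.

```latex
X = \{x_i\}_{i=1}^{n} \subset \bigcup_{j=1}^{k} \mathcal{M}_j \subset \mathbb{R}^{D},
\qquad \dim \mathcal{M}_j \ll D

\text{find } c : X \to \{1,\dots,k\} \text{ and } f : \mathbb{R}^{D} \to \mathbb{R}^{d}
\text{ such that } c(x_i) = j \iff x_i \in \mathcal{M}_j
\text{ and } f(\mathcal{M}_j) \subseteq S_j,

\text{where } S_1,\dots,S_k \subset \mathbb{R}^{d} \text{ are different linear subspaces.}
```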

Given their large capacity and flexibility, deep neural networks (DNNs) could also address this clustering problem. Applying them here, however, requires two ingredients: a domain-specific constraint that makes the manifolds identifiable, and a learning algorithm that embeds each manifold into a linear subspace of the feature space.

In the new paper Neural Manifold Clustering and Embedding, a team from UC Berkeley and Facebook AI Research proposes a method for general-purpose manifold clustering that implements constraints via data augmentation and uses the Maximum Coding Rate Reduction (MCR2) objective (Yu et al. 2020) for subspace feature learning. The team’s resulting Neural Manifold Clustering and Embedding (NMCE) method significantly outperforms both autoencoder-based deep subspace clustering approaches and algorithms specifically designed for clustering.
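For reference, the MCR2 objective of Yu et al. (2020) is defined for features Z = [z_1, ..., z_n] in R^(d×n), a distortion ε, and k diagonal n×n membership matrices Π_j. The entries of each Π_j give each sample's (possibly soft) membership in cluster j. The expansion term R rewards features that spread out across the whole space. Subtracting the per-cluster term R_c rewards each cluster for occupying a compact subspace. Any extra terms or weightings that NMCE adds on top are not reproduced here.

```latex
R(Z,\epsilon) = \frac{1}{2}\log\det\!\Big(I + \frac{d}{n\epsilon^{2}}\, Z Z^{\top}\Big)

R_c(Z,\epsilon \mid \Pi) = \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2n}\,
  \log\det\!\Big(I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top}\Big)

\Delta R(Z,\Pi,\epsilon) = R(Z,\epsilon) - R_c(Z,\epsilon \mid \Pi)
```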

The team summarizes their main contributions as:

  1. We combine data augmentation with MCR2 to yield a novel algorithm for general-purpose manifold clustering and embedding (NMCE). We also discuss connections between the algorithm and self-supervised contrastive learning.
  2. We demonstrate that NMCE achieves strong performance on standard subspace clustering benchmarks, and can outperform the best clustering algorithms on more challenging high dimensional image datasets like CIFAR-10 and CIFAR-20. Further, empirical evaluations suggest that our algorithm also learns a meaningful feature space.

The NMCE design follows three principles: 1) The clustering and representation should respect a domain-specific constraint, e.g. local neighbourhoods, local linear interpolation or data augmentation invariances, 2) The embedding of a particular manifold shall not collapse, and 3) The embedding of identified manifolds shall be linearized and separated, i.e. they occupy different linear subspaces. The team leverages data augmentation to satisfy the first principle, and applies the MCR2 objective to satisfy principles 2 and 3.
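The data-augmentation constraint in the first principle can be made concrete with a small sketch. It assumes the constraint is expressed as a penalty on how far the features of two augmented views of the same input drift apart. The squared-distance form, the weight `lam`, and the names `augmentation_distance` and `total_loss` are hypothetical, and the paper's exact loss may differ. The rate-reduction value `delta_r` is taken as a given scalar here, since MCR2 is what handles the second and third principles.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Project each row of z onto the unit sphere (eps guards against division by zero)."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def augmentation_distance(z1, z2):
    """Mean squared distance between unit-normalized features of two augmented views.

    0 means the features are perfectly invariant to the augmentation; 4 is the maximum
    (antipodal views). This form is an illustrative assumption, not the paper's loss.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    return np.mean(np.sum((z1 - z2) ** 2, axis=1))

def total_loss(delta_r, z1, z2, lam=1.0):
    """Hypothetical combined objective: maximize rate reduction, penalize variation across views."""
    return -delta_r + lam * augmentation_distance(z1, z2)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))           # stand-in features for 8 inputs
print(augmentation_distance(z, z))     # identical views give 0.0
print(augmentation_distance(z, -z))    # antipodal views give approximately 4.0
```

Minimizing such a term pulls the views of each input together, which is how augmentation invariance acts as the domain-specific constraint.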

NMCE aims to assign each data point to its corresponding manifold (clustering) and to learn coordinates for each manifold (manifold learning). To achieve this, the team uses a neural network to map each data point to a feature embedding and a cluster assignment, and trains it with MCR2, which supports joint clustering and subspace learning.
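The two-output network described above can be sketched in NumPy as follows. A shared backbone feeds a feature head, whose outputs are normalized to the unit sphere, and a cluster head, whose softmax gives a soft assignment over clusters. The single-hidden-layer ReLU architecture, the layer sizes, and the number of clusters are hypothetical choices for illustration only. They do not describe the paper's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input dim, hidden width, feature dim, number of clusters.
D_IN, HIDDEN, D_FEAT, K = 32, 64, 16, 4

def init(shape):
    """Random weights with He-style scaling; purely illustrative."""
    return rng.normal(0.0, np.sqrt(2.0 / shape[0]), size=shape)

params = {
    "W_backbone": init((D_IN, HIDDEN)),
    "W_feat": init((HIDDEN, D_FEAT)),
    "W_clust": init((HIDDEN, K)),
}

def forward(x, p):
    """Map a batch x of shape (n, D_IN) to features z (n, D_FEAT) and soft assignments pi (n, K)."""
    h = np.maximum(x @ p["W_backbone"], 0.0)                   # shared ReLU backbone
    z = h @ p["W_feat"]
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)  # features on the unit sphere
    logits = h @ p["W_clust"]
    logits = logits - logits.max(axis=1, keepdims=True)         # stabilize the softmax
    pi = np.exp(logits)
    pi = pi / pi.sum(axis=1, keepdims=True)                     # each row sums to 1 over K clusters
    return z, pi

x = rng.normal(size=(8, D_IN))
z, pi = forward(x, params)
print(z.shape, pi.shape)  # (8, 16) (8, 4)
```

Soft assignments keep the cluster head differentiable, so the membership matrices that enter the MCR2 objective can be trained by gradient descent along with the features.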

To make the clusters learnable by a neural network, the team introduces explicit constraints for manifold clustering. Given the clustering, MCR2 then provides a principled objective for learning a representation in which each cluster occupies its own linear subspace.
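The sketch below computes the rate reduction for a fixed clustering in NumPy, following the formulas of Yu et al. (2020). It shows that a partition aligned with the subspaces scores higher than a mixed one. The toy features, the choice ε² = 0.5, and the function names are illustrative assumptions, not NMCE's code.

```python
import numpy as np

def coding_rate(Z, eps2):
    """R(Z, eps) = 1/2 logdet(I + d/(n eps^2) Z Z^T) for features Z of shape (d, n)."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps2)) * (Z @ Z.T))
    return 0.5 * logdet

def coding_rate_given_clusters(Z, Pi, eps2):
    """R_c(Z, eps | Pi) for memberships Pi of shape (k, n); rows may be soft."""
    d, n = Z.shape
    total = 0.0
    for pi_j in Pi:
        tr_j = pi_j.sum()
        if tr_j <= 0:
            continue                      # an empty cluster contributes nothing
        cov_j = (Z * pi_j) @ Z.T          # Z diag(pi_j) Z^T, scaling each column by its membership
        _, logdet = np.linalg.slogdet(np.eye(d) + (d / (tr_j * eps2)) * cov_j)
        total += tr_j / (2 * n) * logdet
    return total

def rate_reduction(Z, Pi, eps2=0.5):
    """Delta R = R(Z) - R_c(Z | Pi); larger means more expanded overall, more compact per cluster."""
    return coding_rate(Z, eps2) - coding_rate_given_clusters(Z, Pi, eps2)

# Toy features: four copies of e1 and four copies of e2 in R^4 (columns of Z).
e1, e2 = np.eye(4)[0], np.eye(4)[1]
Z = np.stack([e1] * 4 + [e2] * 4, axis=1)                            # shape (4, 8)
true_Pi = np.array([[1] * 4 + [0] * 4, [0] * 4 + [1] * 4], dtype=float)
mixed_Pi = np.array([[1, 1, 0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0, 1, 1]], dtype=float)

print(rate_reduction(Z, true_Pi))   # approximately 0.511, i.e. log(5/3)
print(rate_reduction(Z, mixed_Pi))  # approximately 0.0
```

Under the correct partition, each cluster's features span a single line, so R_c is small. The mixed partition gives every cluster both directions, and the rate reduction drops to zero.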

The team first evaluated NMCE on a synthetic mixture of manifold-structured data, generated by passing random samples through two randomly initialized multi-layer perceptrons, with additive Gaussian noise serving as the data augmentation that enforces the locality constraint. They then compared NMCE with leading subspace clustering methods on the COIL20 (Nene et al., 1996a) and COIL100 (Nene et al., 1996b) datasets.
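A synthetic setup of this kind can be sketched as follows. Gaussian latent samples pass through two randomly initialized MLPs, producing two curved manifolds in a higher-dimensional space. Small Gaussian noise then generates augmented views that stay in each point's local neighbourhood. The latent and ambient dimensions, the tanh activation, the sample counts, and the noise scale are all hypothetical. The paper's exact configuration is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM, HIDDEN, AMBIENT_DIM = 2, 64, 20   # hypothetical sizes
N_PER_MANIFOLD = 500

def random_mlp(rng):
    """Return a randomly initialized two-layer tanh MLP from R^LATENT_DIM to R^AMBIENT_DIM.

    Its image of a 2-D Gaussian cloud is a curved, roughly 2-D surface in R^AMBIENT_DIM.
    """
    W1 = rng.normal(0.0, 1.0 / np.sqrt(LATENT_DIM), (LATENT_DIM, HIDDEN))
    b1 = rng.normal(0.0, 1.0, HIDDEN)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(HIDDEN), (HIDDEN, AMBIENT_DIM))
    return lambda u: np.tanh(u @ W1 + b1) @ W2

# One random MLP per manifold; stack the two point clouds.
mlps = [random_mlp(rng) for _ in range(2)]
X = np.concatenate([f(rng.normal(size=(N_PER_MANIFOLD, LATENT_DIM))) for f in mlps])
labels = np.repeat([0, 1], N_PER_MANIFOLD)    # ground truth, used only for evaluation

def gaussian_noise_augment(x, sigma, rng):
    """Augmentation: small isotropic noise keeps each view near the original point."""
    return x + sigma * rng.normal(size=x.shape)

view1 = gaussian_noise_augment(X, 0.05, rng)
view2 = gaussian_noise_augment(X, 0.05, rng)
print(X.shape, view1.shape)  # (1000, 20) (1000, 20)
```

Because the noise is small relative to the manifolds' curvature, requiring features to be invariant to it acts as a locality constraint. Nearby points are encouraged to land in the same cluster.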

NMCE achieved clustering error rates of 0.0% on COIL20 and 11.53% on COIL100, well below the previous state-of-the-art results of 1.79% and 20.67%, respectively. The results indicate that NMCE can exploit the non-linear processing capability of DNNs for better manifold clustering and embedding. The researchers hope the work can also contribute to a deeper understanding of unsupervised representation learning.
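The article does not define the error rate. The sketch below assumes the definition customary in subspace clustering work: the fraction of points that are misclustered under the best one-to-one matching between predicted clusters and true classes. It searches all k! matchings by brute force, which is feasible only for small k. For 20 or 100 classes, one would instead solve the matching with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`. The function name and toy labels are hypothetical.

```python
from itertools import permutations

def clustering_error(true_labels, pred_labels, k):
    """Fraction of points misclustered under the best one-to-one cluster-to-class matching.

    Brute force over all k! matchings; only practical for small k.
    """
    n = len(true_labels)
    best_correct = 0
    for perm in permutations(range(k)):   # perm[c] is the class assigned to predicted cluster c
        correct = sum(perm[p] == t for t, p in zip(true_labels, pred_labels))
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / n

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [2, 2, 2, 0, 0, 1, 1, 1, 1]      # cluster ids are arbitrary; one point is misplaced
print(f"{100 * clustering_error(y_true, y_pred, 3):.2f}%")  # 11.11%
```

The matching step matters because an unsupervised method's cluster indices carry no meaning by themselves. Only the partition is compared with the ground truth.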

The paper Neural Manifold Clustering and Embedding is on arXiv.

Author: Hecate He | Editor: Michael Sarazen