Deep neural networks (DNNs) have advanced the state of the art on tasks ranging from image classification to language processing and gameplay. But as models have become deeper and more complex, understanding their behaviours has become more challenging. A case in point is an intriguing empirical phenomenon called Neural Collapse, first identified by Papyan et al. in 2020.
In the new paper Neural Collapse: A Review on Modelling Principles and Generalization, researchers from New York University analyze the principles of Neural Collapse (NC) and present a thought model designed to explain the effect of variance collapse, with the aim of better understanding the generalization capabilities of DNNs.
The team summarizes their main contributions as:
- We analyze NC modelling techniques by unifying them under a common set of principles, which we believe is missing in the literature.
- We analyze the role of NC on our understanding of generalization by reviewing the test collapse-based metrics under a common lens.
- We present a thought model which attempts to explain the effect of variance collapse on transfer learning from the viewpoint of particle interactions according to the inverse-square law.
The modern training paradigm for DNNs involves training well beyond the zero error threshold and toward zero loss. This post-zero-error phase is called the Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero while the training loss is pushed to zero.
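As a minimal sketch of that definition, the start of TPT can be located from a log of per-epoch training error. The function name `tpt_start_epoch`, the `tol` parameter, and the error values are hypothetical and used only for illustration; they do not come from the paper.

```python
from typing import Optional, Sequence


def tpt_start_epoch(train_errors: Sequence[float], tol: float = 0.0) -> Optional[int]:
    """Return the index of the first epoch whose training error is <= tol.

    Returns None if the training error never reaches the tolerance.
    """
    for epoch, err in enumerate(train_errors):
        if err <= tol:
            return epoch
    return None


# Made-up per-epoch training error rates, for illustration only.
errors = [0.31, 0.12, 0.04, 0.0, 0.0, 0.0]
start = tpt_start_epoch(errors)
print(f"TPT starts at epoch index {start}")
```

In practice a small positive `tol` may be more practical than exact zero, since training error on large datasets can hover just above it.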
TPT, however, exhibits a pervasive inductive bias, NC, which comprises four deeply interconnected phenomena:
- NC1 – Collapse of variability: For data samples belonging to the same class, their final hidden layer (i.e. the penultimate layer) features concentrate around their class mean.
- NC2 – Preference towards a simplex equiangular tight frame: The class means of the penultimate-layer features tend to form a simplex equiangular tight frame (simplex ETF).
- NC3 – Self-dual alignment: The vectors/columns of the last layer linear classifier matrix also form a simplex ETF in their dual vector space and converge to the simplex ETF (up to rescaling) of the penultimate layer features.
- NC4 – Choose the nearest class mean: When a test point is to be classified, the last layer classifier now essentially acts as a nearest (train)-class mean decision rule.
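The geometry behind these four properties can be illustrated with a short NumPy sketch. It builds a standard simplex ETF for K classes, whose columns have unit norm and pairwise cosine -1/(K-1), and adds two simplified helpers: a within-class variability proxy in the spirit of NC1 and a nearest-class-mean rule in the spirit of NC4. All function names here are hypothetical, and the variability proxy is a simplification for illustration rather than the metric used in the paper.

```python
import numpy as np


def simplex_etf(num_classes: int) -> np.ndarray:
    """Return a K x K matrix whose columns form a simplex ETF.

    Columns have unit norm and every pair has inner product -1/(K-1).
    """
    k = num_classes
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)


def within_class_variability(features: np.ndarray, labels: np.ndarray) -> float:
    """Average squared distance of each feature to its class mean.

    A simplified NC1-style proxy: it tends to zero as features collapse.
    """
    total = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        total += np.sum((class_feats - class_feats.mean(axis=0)) ** 2)
    return float(total / len(features))


def nearest_class_mean(x: np.ndarray, class_means: np.ndarray) -> int:
    """NC4-style decision rule: index of the class mean closest to x."""
    return int(np.argmin(np.linalg.norm(class_means - x, axis=1)))


# Check the equiangular structure for K = 4 classes.
etf = simplex_etf(4)
gram = etf.T @ etf
print(np.round(gram, 3))
```

For K = 4 the Gram matrix has ones on the diagonal and -1/3 everywhere else, which is the maximally separated, equal-angle arrangement that NC2 and NC3 describe.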
The team uses a principled approach to review the NC phenomena, first confirming that the final layer classifiers in DNNs tend to fall into a simple symmetric structure that helps the models obtain their high performance and state-of-the-art results. In an effort to capture the essence of the NC phenomena, the researchers then analyze such models from the ground up and unify them under a common set of principles.
Overall, the paper provides a solid overview of current efforts to explain NC. It also probes the implications of NC for generalization and transfer learning through a thought model, based on the inverse-square law, that explains the effects of variance collapse on transfer learning and offers additional insights into the generalization capabilities of DNNs.
The team hopes their work’s analytical results will be of interest to the deep learning community and encourage future research in this area.
The paper Neural Collapse: A Review on Modelling Principles and Generalization is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang