
Predicting Downstream Model Performance at Early Training Stages: A New Perspective on Neural Network Selection via Edge Dynamics

A research team from Rensselaer Polytechnic Institute, Thomas J. Watson Research Center and the University of California, Los Angeles proposes a novel framework for selecting pretrained neural network models for downstream tasks, forecasting a model's predictive ability from the information it accumulates during the early phase of training.

Fine-tuning pretrained large-scale deep neural networks (NNs) for downstream tasks has become the status quo in the deep learning community. A challenge facing researchers is how to efficiently select the most appropriate pretrained model for a given downstream task, as this selection typically entails expensive model training to estimate downstream performance.

In the new paper Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics, a research team from Rensselaer Polytechnic Institute, Thomas J. Watson Research Center and the University of California, Los Angeles proposes a novel framework for effective NN selection for downstream tasks. The method forecasts a model's predictive ability from the information it accumulates during training, and saves resources by requiring only the early phase of that training.

The team summarizes their contributions as:

  1. Views NN training as a dynamical system over synaptic connections and, for the first time, investigates the interactions among synaptic connections from a microscopic perspective.
  2. Proposes the neural capacitance metric βeff for neural network model selection.
  3. Presents empirical results for 17 pretrained models on five benchmark datasets showing that the βeff-based approach outperforms current learning curve prediction approaches.
  4. For predicting the performance ranking of pretrained models, the approach improves over the best baseline by 9.1/38.3/12.4/65.3/40.1 percent on CIFAR10/CIFAR100/SVHN/Fashion MNIST/Birds, using observed learning curves of only five epochs.

The proposed framework builds on the idea that backpropagation during NN training is equivalent to the dynamical evolution of synaptic connections (edges), and that a converged neural network corresponds to an equilibrium state of the networked system these edges form. It also draws on previous studies showing that complex real-world systems, such as plant-pollinator interactions and the spread of COVID-19, can be modelled as dynamical systems on networks.

The researchers treat NN training as a dynamical system over synaptic connections and, for the first time, examine the interactions among these connections at a microscopic level. They propose βeff as a universal neural capacitance metric for characterizing both biological and artificial NNs. By building a line graph over the trainable weights and reformulating the training dynamics in the same form as general networked dynamics, they enable βeff to predict a network's final accuracy from only a few observations in the early phase of training.
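The workflow this implies is straightforward to sketch, even though the article does not reproduce the βeff computation itself. The snippet below is a minimal illustration, assuming a hypothetical estimate_beta_eff helper standing in for the paper's line-graph-based metric; it is not the authors' implementation.

```python
# A minimal sketch of the implied selection workflow. `estimate_beta_eff` is a
# hypothetical placeholder for the paper's neural capacitance computation and
# is NOT the authors' implementation.

def estimate_beta_eff(model, history):
    """Hypothetical stand-in for the paper's beta_eff metric.

    The real metric is derived from a line graph built over the model's
    trainable weights and the reformulated training dynamics; that
    computation is not reproduced here.
    """
    raise NotImplementedError("See the paper for the actual beta_eff computation.")

def rank_candidates(candidates, train_ds, val_ds, warmup_epochs=5):
    """Fine-tune each candidate briefly and score it with beta_eff.

    `candidates` maps model names to Keras models ready for compilation.
    """
    scores = {}
    for name, model in candidates.items():
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        # Only a short learning-curve prefix (e.g. 5 epochs) is observed,
        # mirroring the early-training setting described above.
        history = model.fit(train_ds, validation_data=val_ds,
                            epochs=warmup_epochs, verbose=0)
        scores[name] = estimate_beta_eff(model, history)
    # Ascending order assumes a lower beta_eff indicates a stronger model;
    # consult the paper for the sign of the correlation with final accuracy.
    return sorted(scores, key=scores.get)
```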

The team evaluated their framework on 17 pretrained ImageNet models including AlexNet, VGGs (VGG16/19), ResNets (ResNet50/50V2/101/101V2/152/152V2), DenseNets (DenseNet121/169/201), MobileNets (MobileNet and MobileNetV2), Inceptions (InceptionV3, InceptionResNetV2) and Xception. They also compared the βeff-based approach with various other model ranking baselines.
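As a point of reference, most of the backbones listed above ship as ImageNet-pretrained models in tf.keras.applications (AlexNet is a notable exception), so a candidate pool for such an experiment could be assembled roughly as follows; the exact preprocessing and classification heads used in the paper may differ.

```python
# Illustrative only: build a small pool of ImageNet-pretrained candidates
# from tf.keras.applications, each topped with a fresh classification head.
from tensorflow.keras import applications, layers, models

BACKBONES = {
    "VGG16": applications.VGG16,
    "ResNet50": applications.ResNet50,
    "DenseNet121": applications.DenseNet121,
    "MobileNetV2": applications.MobileNetV2,
    "InceptionV3": applications.InceptionV3,
    "Xception": applications.Xception,
}

def build_candidates(num_classes, input_shape=(224, 224, 3)):
    """Attach a new classification head to each ImageNet-pretrained backbone."""
    candidates = {}
    for name, constructor in BACKBONES.items():
        base = constructor(weights="imagenet", include_top=False,
                           input_shape=input_shape, pooling="avg")
        head = layers.Dense(num_classes, activation="softmax")(base.output)
        candidates[name] = models.Model(base.input, head)
    return candidates
```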

In the experiments, the neural capacitance βeff-based approach outperformed current learning curve prediction approaches and achieved significant relative improvements over the best baselines on the CIFAR10, CIFAR100, SVHN, Fashion MNIST and Birds datasets.
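How well a method predicts such a ranking is typically scored with a rank-correlation statistic between the predicted and observed orderings. The sketch below uses Spearman's rho via scipy.stats as one common choice; the paper's exact ranking metric and the improvement figures reported above are its own.

```python
# One common way to score rank prediction: correlate the predicted ordering of
# the candidate models with their observed fine-tuned accuracies. Spearman's
# rho is used here as an example; the paper's own ranking metric may differ.
from scipy import stats

def rank_quality(predicted_scores, true_accuracies):
    """Spearman rank correlation between predicted scores and observed accuracies."""
    rho, _ = stats.spearmanr(predicted_scores, true_accuracies)
    return rho

# A perfectly order-preserving predictor yields rho = 1.0.
print(rank_quality([0.10, 0.40, 0.70], [0.62, 0.78, 0.91]))
```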

The results validate βeff as an effective metric for predicting the ranking of a set of pretrained models based on early training results. The team plans to explore additional related research directions in the future, such as simplifying the adjacency matrix P to capture the dependency and mutual interaction between synaptic connections, extending the framework to NAS benchmarks to select the best subnetwork, and designing an efficient algorithm to directly optimize NN architectures based on βeff.

The paper Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


