One way to understand the principles governing how deep learning models process data is to investigate input data points that have different amounts or types of example difficulty. Various definitions of example difficulty have been presented in previous studies: from a statistical point of view, example difficulty means the probability of predicting the ground truth label for an example; while in model learning, example difficulty refers to the difficulty involved in learning an example.
These two notions, however, share two fundamental limitations: they do not capture how data is processed inside a converged model, and they cannot distinguish between examples that are difficult for different reasons.
In the paper Deep Learning Through the Lens of Example Difficulty, a Google Research team tackles these issues, proposing “prediction depth” determined from the hidden embeddings as a new measure of example difficulty. Their study reveals the surprising fact that the prediction depth of a given input has strong connections to a model’s uncertainty, confidence, accuracy and speed of learning for that data point.
The researchers use hidden layer probes to determine example difficulty. They first introduce a computational view of example difficulty parametrized by the prediction depth, then, based on this definition, show that prediction depth is both a meaningful and robust notion of example difficulty. They also provide detailed descriptions of how prediction depth can be used to better understand three important aspects of deep learning: the accuracy and consistency of a prediction; the order in which data is learned; and the simplicity of the learned function (as measured by the margin) in the vicinity of a data point.
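The probe idea can be sketched roughly as follows: fit a simple classifier (the paper uses k-NN probes) on each layer's activations for the training set, then define a point's prediction depth as the earliest layer after which every probe already agrees with the final prediction. This is a minimal toy sketch under that reading; the function names and the way activations are passed in are illustrative assumptions, not the authors' code, which also differs in probe details.

```python
import numpy as np

def knn_probe_predict(train_feats, train_labels, query_feats, k=3):
    """Predict query labels with a k-NN probe fit on one layer's activations."""
    # pairwise squared distances, shape (n_query, n_train)
    d2 = ((query_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest training points
    votes = train_labels[knn]             # their labels, shape (n_query, k)
    return np.array([np.bincount(v).argmax() for v in votes])

def prediction_depth(query_feats_per_layer, train_feats_per_layer,
                     train_labels, k=3):
    """For each query point, return the earliest layer index such that the
    probes at that layer and all deeper layers agree with the final probe."""
    n_layers = len(query_feats_per_layer)
    preds = np.stack([
        knn_probe_predict(train_feats_per_layer[l], train_labels,
                          query_feats_per_layer[l], k)
        for l in range(n_layers)
    ])                                    # shape (n_layers, n_query)
    final = preds[-1]
    depths = np.zeros(final.shape[0], dtype=int)
    for q in range(final.shape[0]):
        depth = n_layers - 1
        # walk backwards from the last layer while probes still agree
        for l in range(n_layers - 2, -1, -1):
            if preds[l, q] != final[q]:
                break
            depth = l
        depths[q] = depth
    return depths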
The team conducted an empirical analysis to ensure the results' robustness to different architecture and dataset choices. The architectures used include ResNet18 (He et al., 2016), VGG16 (Simonyan and Zisserman, 2015) and MLPs, trained on the CIFAR10 and CIFAR100 (Krizhevsky et al., 2009), Fashion MNIST (FMNIST) (Xiao et al., 2017) and SVHN (Netzer et al., 2011) datasets. In the CIFAR10 ResNet18 experiment, the proposed method increased accuracy from 25 percent to 98 percent for inputs that were the most “ambiguous without their label.”
The Google researchers summarize the study’s contributions as:
- Introduce a measure of computational example difficulty: the prediction depth (PD).
- Show that the prediction depth is larger for examples that visually appear to be more difficult, and that prediction depth is consistent between architectures and random seeds.
- Show empirically that prediction depth appears to establish a linear lower bound on the consistency of a prediction, and that predictions are on average more accurate for validation points with small prediction depths.
- Demonstrate that final predictions for data points that converge earlier during training are typically determined in earlier layers, which establishes a correspondence between the training history of the network and the processing of data in the hidden layers.
- Show that both the adversarial input margin and the output margin are larger for examples with smaller prediction depths. Design an intervention to reduce the output margin of a network and show that this leads to predictions being made only in the latest hidden layers.
- Identify three extreme forms of example difficulty by considering the prediction depth in the training and validation splits independently and demonstrate how a simple algorithm that uses the hidden embeddings in one middle layer to make predictions can lead to dramatic improvements in accuracy for inputs that strongly exhibit a specific form of example difficulty.
- Use the results to present a coherent picture of deep learning that unifies four seemingly unrelated deep learning phenomena: early layers generalize while later layers memorize, networks converge from input layer towards output layer, easy examples are learned first, and networks present simpler functions earlier in training.
Overall, the proposed prediction depth notion of example difficulty reveals what paper co-author Behnam Neyshabur calls “surprising relationships with different deep learning phenomena.” The Google team notes that their results stem from a deep model’s representation, which is hierarchical by construction, and that similar results will therefore likely appear in larger models, larger datasets, and tasks other than image classification — although further testing in these and other areas remains to be done.
The researchers say they hope their study can help in the development of models that capture heteroscedastic uncertainty, increase understanding of how deep networks respond to distributional shift, and advance curriculum learning approaches and machine learning fairness.
The paper Deep Learning Through the Lens of Example Difficulty is on arXiv.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang