Neural network training is driven by a loss function, which can be any differentiable function that maps a model's predictions and the ground-truth values to a scalar penalty. When training a classification model, researchers typically turn to either cross-entropy or focal loss. But is there a way to design more flexible loss functions that can be tailored to different tasks and datasets?
“Yes,” says a team from Waymo and Google. In their new paper PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions, the researchers introduce PolyLoss, a novel and simple framework that redesigns loss functions as a linear combination of polynomial functions that can be easily tailored to different target tasks and datasets.
The team summarizes their main contributions as:
- Insights on common losses: We propose a unified framework, named PolyLoss, to rethink and redesign loss functions. This framework explains cross-entropy loss and focal loss as two special cases of the PolyLoss family (obtained by horizontally shifting polynomial coefficients), a connection that had not been recognized before. This new finding motivates us to investigate new loss functions that vertically adjust polynomial coefficients.
- New loss formulation: We evaluate different ways of vertically manipulating polynomial coefficients to simplify the hyperparameter search space. We propose a simple and effective Poly-1 loss formulation that only introduces one hyperparameter and one line of code.
- New findings: We identify that focal loss, though effective for many detection tasks, is suboptimal for the imbalanced ImageNet-21K. We find that the leading polynomial contributes a large portion of the gradient during training, and that its coefficient correlates with the prediction confidence Pt. In addition, we provide an intuitive explanation of how to leverage this correlation to design good PolyLoss functions tailored to imbalanced datasets.
- Extensive experiments: We evaluate our PolyLoss on different tasks, models, and datasets. Results show PolyLoss consistently improves performance on all fronts, including the state-of-the-art EfficientNetV2 classifiers and RSN detectors.
Neural networks are typically trained using either a cross-entropy loss function, which measures the difference between two probability distributions for a given random variable or set of events to adjust model weights during training; or a focal loss function, which applies a modulating term to the cross-entropy loss to address class imbalance in detection tasks.
PolyLoss provides a framework for analyzing, understanding and improving cross-entropy and focal loss. Inspired by the Taylor expansion method for approximating functions via polynomials, PolyLoss views a loss function as a linear combination of polynomial functions. The team explains that the polynomial terms in the gradient expansion capture different sensitivities to the model's prediction probability of the target ground-truth class, while the constant gradient term causes the model to emphasize the majority class. They show that cross-entropy loss and focal loss correspond to different polynomial coefficients, where focal loss horizontally shifts the polynomial coefficients of cross-entropy loss.
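The expansion view above can be verified numerically: cross-entropy −log(Pt) equals the infinite sum of (1 − Pt)^j / j, and focal loss with focusing parameter γ shifts every power by γ. A minimal sketch (function names are illustrative, not from the paper's code):

```python
import math

def ce_poly_approx(pt: float, n_terms: int) -> float:
    """Truncated polynomial expansion of cross-entropy:
    -log(pt) = sum_{j=1..inf} (1 - pt)^j / j."""
    return sum((1.0 - pt) ** j / j for j in range(1, n_terms + 1))

def focal_poly_approx(pt: float, gamma: float, n_terms: int) -> float:
    """Focal loss as the same series with each power shifted by gamma:
    -(1 - pt)^gamma * log(pt) = sum_{j=1..inf} (1 - pt)^(j + gamma) / j."""
    return sum((1.0 - pt) ** (j + gamma) / j for j in range(1, n_terms + 1))

pt = 0.7  # prediction probability of the ground-truth class
print(ce_poly_approx(pt, 200), -math.log(pt))            # the two values agree
print(focal_poly_approx(pt, 2, 200), -(1 - pt) ** 2 * math.log(pt))
```

Truncating or re-weighting the leading terms of this series is exactly the design space PolyLoss opens up.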
The researchers propose a simplified loss formulation, Poly-1, and explore how vertically adjusting polynomial coefficients affects training. They compare their approach against the common cross-entropy and focal loss functions on various tasks and datasets, where Poly-1 consistently outperforms the conventional losses at the cost of only a simple grid search, delivering maximal gains with minimal code changes and hyperparameter tuning.
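The Poly-1 formulation amounts to adding a single extra term, ε·(1 − Pt), on top of standard cross-entropy, which is the "one line of code" the paper refers to. A minimal NumPy sketch (the function name and softmax details are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def poly1_cross_entropy(logits, labels, epsilon=1.0):
    """Poly-1 loss: cross-entropy plus epsilon * (1 - Pt),
    where Pt is the softmax probability of the ground-truth class."""
    # numerically stable softmax over the class dimension
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    pt = probs[np.arange(len(labels)), labels]
    ce = -np.log(pt)
    return ce + epsilon * (1.0 - pt)  # the one-line Poly-1 modification

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
loss = poly1_cross_entropy(logits, labels, epsilon=1.0)
```

Setting epsilon to 0 recovers plain cross-entropy, so ε is the single hyperparameter to grid-search.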
The proposed PolyLoss framework offers a novel and flexible approach for changing loss function shape by adjusting the polynomial coefficients and demonstrates that simply adjusting the leading polynomial coefficient can produce impressive improvements across a variety of models on multiple tasks and datasets. The researchers hope their findings can motivate additional exploration on loss function design beyond cross-entropy and focal loss.
The paper PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions is on OpenReview.
Author: Hecate He | Editor: Michael Sarazen