In recent years, Machine Learning (ML) research has devoted significant attention to cost-effective optimization routines. Zeroth-order optimization minimizes the information available to the learner, relying solely on loss function values. However, an underexplored question remains: which loss functions can be efficiently optimized with boosting, and what information is essential for boosting to meet the original blueprint's requirements?
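For readers unfamiliar with the setting, here is a minimal illustration (not the paper's method) of what "relying solely on loss function values" means: a simple accept-if-better random search that never evaluates a gradient. The function name and step-size scheme are hypothetical choices for the sketch.

```python
import random

def zeroth_order_minimize(loss, x0, step=0.5, iters=200, seed=0):
    """Minimize `loss` using only function evaluations (no gradients):
    propose a random perturbation, keep it only if the loss decreases."""
    rng = random.Random(seed)
    x, fx = x0, loss(x0)
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)
        fc = loss(candidate)
        if fc < fx:  # accept only improving moves
            x, fx = candidate, fc
    return x, fx

# Example: minimize (x - 3)^2 using only loss values.
x_star, f_star = zeroth_order_minimize(lambda x: (x - 3.0) ** 2, x0=0.0)
```

The learner here observes nothing but loss values at queried points, which is exactly the information regime the zeroth-order literature studies.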
In a new paper How to Boost Any Loss Function, a Google research team provides a constructive, formal answer, demonstrating that any loss function can be optimized with boosting.
Boosting, a highly effective optimization method born in ML, builds arbitrarily accurate models in a computationally efficient way by combining weak learners: classifiers that need only perform slightly better than random guessing. Boosting quickly evolved into an optimization setting that relies on first-order information about the loss being optimized, much like the widely used gradient descent.
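To make the weak-learner idea concrete, here is a compact AdaBoost-style sketch (classical boosting, not the paper's SecBoost) that combines decision stumps on a toy 1D dataset; each stump only has to beat random guessing on the reweighted data.

```python
import math

def stump_predict(threshold, polarity, x):
    # A decision stump: the simplest possible weak learner.
    return polarity if x > threshold else -polarity

def train_stump(xs, ys, weights):
    """Pick the threshold/polarity with the lowest weighted error; the
    weak learning assumption only asks for error slightly below 0.5."""
    best = None
    for t in xs:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(t, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(xs, ys, rounds=5):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, pol = train_stump(xs, ys, weights)
        err = max(err, 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # weak learner's vote weight
        ensemble.append((alpha, t, pol))
        # Reweight: misclassified points get more attention next round.
        weights = [w * math.exp(-alpha * y * stump_predict(t, pol, x))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    def predict(x):
        s = sum(a * stump_predict(t, p, x) for a, t, p in ensemble)
        return 1 if s > 0 else -1
    return predict

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1, -1, -1, 1, 1, 1]
clf = adaboost(xs, ys)
```

Note that the reweighting step above is where first-order information about the (exponential) loss implicitly enters classical boosting, which is precisely the dependence the new work removes.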
However, this new research shows that boosting can be applied to essentially any loss function without any first-order information. From this perspective, boosting compares favorably with recent developments in zeroth-order optimization, since it achieves convergence without requiring the loss to satisfy the assumptions those analyses rely on.
Efficient boosting requires specific design choices to keep the weak learning assumption satisfiable. Within this setting, the team introduces SecBoost, which identifies two additional areas where design choices help maintain the assumption over more iterations: escaping local minima and handling losses that are constant over parts of their domain.
SecBoost is effective for any loss function whose set of discontinuities has zero Lebesgue measure, a remarkably mild condition.
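To see why this class of losses defeats first-order methods, consider an illustrative staircase loss (my example, not one from the paper): it is discontinuous only at integers, a set of Lebesgue measure zero, yet its derivative is zero almost everywhere, so any gradient-based update stalls while value-only comparisons still make progress.

```python
import math

def staircase_loss(x):
    """Piecewise-constant loss: discontinuous only at the integers,
    a set of Lebesgue measure zero."""
    return float(math.floor(abs(x)))

# A finite-difference gradient sees a flat region almost everywhere...
eps = 1e-6
x = 2.5
grad = (staircase_loss(x + eps) - staircase_loss(x - eps)) / (2 * eps)

# ...while comparing raw loss values still distinguishes better points.
better = staircase_loss(1.5) < staircase_loss(2.5)
```

This is only an intuition pump for the condition SecBoost requires; the algorithm itself operates on the boosting blueprint, not on finite differences.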
Overall, this work provides a constructive, formal answer: any loss function can be optimized with boosting, a feat not yet achieved in the classical zeroth-order setting.
The paper How to Boost Any Loss Function is on arXiv.
Author: Hecate He | Editor: Chain Zhang

