Gradient boosting decision trees (GBDT) is a powerful machine learning algorithm widely used in real-life applications such as online advertising, search ranking, time-series prediction, etc. A GBDT drawback is that its training requires arithmetic operations of high-precision FP (floating point) numbers and the support of large datasets, which limits scalability in distributed systems.
In the new paper Quantized Training of Gradient Boosting Decision Trees, a team from Microsoft Research, DP Technology, and Tsinghua University shows that GBDTs can exploit low-precision training, proposing a quantized training framework with low-bitwidth integer arithmetics for efficient low-precision GBDT training.
GBDT is an ensemble learning algorithm used for solving both classification and regression prediction problems. It improves the learning process by simplifying the objectives and reducing the number of iterations required to obtain a sufficiently optimal solution. Typically, 32-bit floating point numbers are required during GBDT training for calculating the gradients and Hessians of the loss function, and 32-bit or 64-bit FP numbers are used for the accumulations in the histograms. The main computation cost in GBDT training is thus the arithmetic operations performed over gradients.
The researchers note several potential benefits of more efficient GBDT training: 1) speeding up the computation of gradient statistics in histograms; 2) compressing the communication cost of high-precision statistical information during distributed training; and (3) the inspiration of utilization and development of hardware architectures which well support low-precision computation for GBDT training.
The team proposes reducing GBDTs’ high-precision requirements via a quantized training algorithm over gradients. They quantize gradients into very low-precision (2-bit or 3-bit) integers such that the FP arithmetic operations can be replaced by integer arithmetic operations, reducing the overall computation burden.
The researchers also introduce two crucial approaches for preserving the accuracy of quantized training — stochastic gradient discretization and leaf value refitting — and demonstrate their effectiveness both empirically and theoretically, validating that the proposed method enables GBDTs to utilize low-precision computation resources to achieve training speedups without performance drops.
The team implemented their proposed system on both CPUs and GPUs in their empirical evaluations. The results show the method can speedup GBDT training under settings that include single process on CPUs, single process on a GPU, and distributed training over CPUs, indicating its flexibility across different types of computational resources. With quantized training, the team achieved up to 2x speedups compared with state-of-the-art GBDT tools on both CPUs and GPUs.
The team believes this is the first low-precision training algorithm proposed for GBDT, and their work shows that 2- or 3-bit gradients are sufficient for training with comparable accuracy. They hope their findings can lead to a better understanding of and additional improvements to the standard GBDT algorithm.
Author: Hecate He | Editor: Michael Sarazen, Chain Zhang
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.