GcForest, a decision tree ensemble approach that is much easier to train than deep neural networks, has received a lot of attention from researchers since it was introduced by Prof. Zhihua Zhou and his student Ji Feng last year. Based on their previous work, Zhou, Feng and Nanjing University colleague Yang Yu have now proposed Multi-layered Gradient Boosting Decision Trees (mGBDTs).
This “Deep Forest” framework is the first attempt at building a multi-layered model using tree ensembles. The authors explain: “Concretely, by introducing fine-grained scanning and cascading operations, the model is able to construct a multi-layered structure with adaptive model complexity and achieve competitive performance across a board range of tasks.”
The gcForest model leverages all strategies for diversity enhancement in ensemble learning. The approach however can only be adopted in a supervised learning setting, and fails to solve the problem of how to construct a multi-layered Forest model that examines its representation learning ability. However, as many researchers believe multi-layered hierarchical representations could be an essential ingredient for the improvement of deep neural networks, it’s likely more effort on representation learning will be done.
Zhou and his colleagues aim to make the best use of both the excellent performance of tree ensembles and the expressive power of hierarchical distributed representations. To solve gcForest’s remaining issues, they have introduced a multi-layered GBDT forest (mGBDTs), “with an explicit emphasis on exploring the ability to learn hierarchical representations by stacking several layers of regression GBDTs as its building block.”
In addition, mGBDTs are able to learn good representations from raw data in both supervised or unsupervised settings by constructing a hierarchical or “deep” structures, which is believed to be the key to their success.
The team’s research proves that trees can be used to obtain hierarchical representations, though they had been generally considered appropriate only for neural networks or differentiable systems. The effectiveness of this approach has been proven by theoretical justifications and experimental results.
Author: Jessie Geng | Editor: Michael Sarazen