Gradient boosting

Gradient boosting iteratively combines weak learners (usually decision trees) to create a stronger model. It achieves state-of-the-art results on tabular data with heterogeneous features.

Area 5. Gradient boosting.svg



  • Which Tricks Are Important for Learning to Rank?

    RankingGradient boosting
    Ivan Lyzhin
    Aleksei Ustimenko
    Andrey Gulin
    Liudmila Prokhorenkova
    ICML, 2023

    Nowadays, state-of-the-art learning-to-rank methods are based on gradient-boosted decision trees (GBDT). The most well-known algorithm is LambdaMART which was proposed more than a decade ago. Recently, several other GBDT-based ranking algorithms were proposed. In this paper, we thoroughly analyze these methods in a unified setup. In particular, we address the following questions. Is direct optimization of a smoothed ranking loss preferable over optimizing a convex surrogate? How to properly construct and smooth surrogate ranking losses? To address these questions, we compare LambdaMART with YetiRank and StochasticRank methods and their modifications. We also propose a simple improvement of the YetiRank approach that allows for optimizing specific ranking loss functions. As a result, we gain insights into learning-to-rank techniques and obtain a new state-of-the-art algorithm.

  • Gradient Boosting Performs Gaussian Process Inference

    Machine learning theoryUncertainty estimation Gradient boosting
    Aleksei Ustimenko
    Artem Beliakov
    Liudmila Prokhorenkova
    ICLR, 2023

    This paper shows that gradient boosting based on symmetric decision trees can be equivalently reformulated as a kernel method that converges to the solution of a certain Kernel Ridge Regression problem. Thus, we obtain the convergence to a Gaussian Process' posterior mean, which, in turn, allows us to easily transform gradient boosting into a sampler from the posterior to provide better knowledge uncertainty estimates through Monte-Carlo estimation of the posterior variance. We show that the proposed sampler allows for better knowledge uncertainty estimates leading to improved out-of-domain detection.

  • SGLB: Stochastic Gradient Langevin Boosting

    Machine learning theoryGradient boosting
    Aleksei Ustimenko
    Liudmila Prokhorenkova
    ICML, 2021

    This paper introduces Stochastic Gradient Langevin Boosting (SGLB) - a powerful and efficient machine learning framework that may deal with a wide range of loss functions and has provable generalization guarantees. The method is based on a special form of the Langevin diffusion equation specifically designed for gradient boosting. This allows us to theoretically guarantee the global convergence even for multimodal loss functions, while standard gradient boosting algorithms can guarantee only local optimum. We also empirically show that SGLB outperforms classic gradient boosting when applied to classification tasks with 0-1 loss function, which is known to be multimodal.