Gradient boosting

Gradient boosting iteratively combines weak learners (usually decision trees) to create a stronger model. It achieves state-of-the-art results on tabular data with heterogeneous features.
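To make the iterative combination concrete, here is a minimal sketch of gradient boosting for squared-error regression. It assumes a single numeric feature and uses one-split regression stumps as the weak learners. The function names (`fit_stump`, `fit_boosting`, `predict_boosting`) and the toy data are illustrative assumptions, not part of any library mentioned on this page.

```python
import numpy as np

def fit_stump(x, target):
    """Least-squares one-split regression stump on a 1-D feature."""
    best = None
    for threshold in np.unique(x)[:-1]:  # assumes at least two distinct x values
        left = x <= threshold
        left_value, right_value = target[left].mean(), target[~left].mean()
        error = np.sum((target - np.where(left, left_value, right_value)) ** 2)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_boosting(x, y, n_rounds=100, learning_rate=0.1):
    """Each round fits a stump to the residuals, i.e. the negative gradient
    of the loss 0.5 * (y - F(x))**2 with respect to the current prediction F(x)."""
    base = y.mean()
    prediction = np.full(len(y), base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

# Toy usage: the ensemble should fit the data better than a constant prediction.
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
model = fit_boosting(x, y)
train_mse = np.mean((y - predict_boosting(model, x)) ** 2)
baseline_mse = np.mean((y - y.mean()) ** 2)
```

Each stump on its own is a poor model. The small learning rate means every round corrects only part of the remaining error, which is the usual shrinkage trade-off between the number of rounds and overfitting.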


  • Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting

    Optimization, Machine learning theory, Gradient boosting
    Aleksei Ustimenko
    Aleksandr Beznosikov
    ICLR, 2024

    In this work, we consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some Stochastic Differential Equation. The chain we study is a unified framework for theoretical analysis. It comes with almost arbitrary isotropic and state-dependent noise instead of normal and state-independent noise as in most related papers. Moreover, in our chain the drift and diffusion coefficient can be inexact in order to cover a wide range of applications such as Stochastic Gradient Langevin Dynamics, sampling, Stochastic Gradient Descent or Stochastic Gradient Boosting. We prove the bound in W2-distance between the laws of our Ito chain and the corresponding differential equation. These results improve or cover most of the known estimates. And for some particular cases, our analysis is the first.

  • Which Tricks Are Important for Learning to Rank?

    Ranking, Gradient boosting
    Ivan Lyzhin
    Aleksei Ustimenko
    Andrey Gulin
    Liudmila Prokhorenkova
    ICML, 2023

    Nowadays, state-of-the-art learning-to-rank methods are based on gradient-boosted decision trees (GBDT). The most well-known algorithm is LambdaMART, which was proposed more than a decade ago. Recently, several other GBDT-based ranking algorithms were proposed. In this paper, we thoroughly analyze these methods in a unified setup. In particular, we address the following questions. Is direct optimization of a smoothed ranking loss preferable over optimizing a convex surrogate? How to properly construct and smooth surrogate ranking losses? To address these questions, we compare LambdaMART with YetiRank and StochasticRank methods and their modifications. We also propose a simple improvement of the YetiRank approach that allows for optimizing specific ranking loss functions. As a result, we gain insights into learning-to-rank techniques and obtain a new state-of-the-art algorithm.

  • Gradient Boosting Performs Gaussian Process Inference

    Machine learning theory, Uncertainty estimation, Gradient boosting
    Aleksei Ustimenko
    Artem Beliakov
    Liudmila Prokhorenkova
    ICLR, 2023

    This paper shows that gradient boosting based on symmetric decision trees can be equivalently reformulated as a kernel method that converges to the solution of a certain Kernel Ridge Regression problem. Thus, we obtain the convergence to a Gaussian Process' posterior mean, which, in turn, allows us to easily transform gradient boosting into a sampler from the posterior to provide better knowledge uncertainty estimates through Monte-Carlo estimation of the posterior variance. We show that the proposed sampler allows for better knowledge uncertainty estimates leading to improved out-of-domain detection.
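The last abstract above links Kernel Ridge Regression, the Gaussian Process posterior mean, and Monte-Carlo estimation of the posterior variance. The sketch below illustrates only that general link and is not the paper's sampler. It assumes a squared-exponential kernel as a stand-in for the tree-induced kernel the paper works with, and the helper names (`rbf_kernel`, `gp_posterior`), the noise level, and the toy data are all hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D point sets (stand-in kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and covariance of a zero-mean Gaussian Process.

    The mean coincides with the kernel ridge regression prediction whose
    ridge parameter equals noise**2.
    """
    k_train = rbf_kernel(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train)
    mean = k_cross @ np.linalg.solve(k_train, y_train)
    cov = rbf_kernel(x_test, x_test) - k_cross @ np.linalg.solve(k_train, k_cross.T)
    return mean, cov

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train)
x_test = np.array([0.5, 3.0])  # one point inside the training range, one far outside

mean, cov = gp_posterior(x_train, y_train, x_test)

# Monte-Carlo estimate of the posterior variance from posterior samples.
samples = rng.multivariate_normal(mean, cov, size=2000)
mc_variance = samples.var(axis=0)
```

In this toy setup the estimated variance is much larger at the point far from the training data than at the point inside it. A variance gap of this kind is the signal that out-of-domain detection relies on.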