In this work, we consider a rather general and broad class of Markov chains, Ito chains, that look like the Euler-Maruyama discretization of some Stochastic Differential Equation. The chain we study provides a unified framework for theoretical analysis. It comes with almost arbitrary isotropic and state-dependent noise instead of the normal and state-independent noise assumed in most related papers. Moreover, in our chain the drift and diffusion coefficients can be inexact, in order to cover a wide range of applications such as Stochastic Gradient Langevin Dynamics, sampling, Stochastic Gradient Descent, or Stochastic Gradient Boosting. We prove a bound in W2-distance between the laws of our Ito chain and the corresponding differential equation. These results improve or cover most of the known estimates, and for some particular cases, our analysis is the first.
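As a rough illustration of what a chain of this Euler-Maruyama type can look like, the sketch below simulates the generic update x_{k+1} = x_k + step * drift(x_k) + sqrt(step) * diffusion(x_k) * noise with non-Gaussian (Rademacher) noise. It is not the paper's construction: the names `ito_chain` and `rademacher`, the Ornstein-Uhlenbeck test equation, and all parameter values are assumptions chosen for illustration, and the inexact drift and diffusion terms mentioned in the abstract are omitted.

```python
import numpy as np

def ito_chain(drift, diffusion, x0, step, n_steps, noise, rng):
    """Iterate x <- x + step*drift(x) + sqrt(step)*diffusion(x)*noise.

    `noise(rng, shape)` must return zero-mean, unit-variance samples;
    they do not have to be Gaussian.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        xi = noise(rng, x.shape)
        x = x + step * drift(x) + np.sqrt(step) * diffusion(x) * xi
    return x

def rademacher(rng, shape):
    # Hypothetical noise choice: +1 or -1 with equal probability.
    return rng.choice([-1.0, 1.0], size=shape)

rng = np.random.default_rng(0)

# Illustrative target: dX = -X dt + sqrt(2) dW, whose stationary law is N(0, 1).
# 20000 independent one-dimensional chains are run in parallel.
samples = ito_chain(
    drift=lambda x: -x,
    diffusion=lambda x: np.sqrt(2.0) * np.ones_like(x),
    x0=np.zeros(20000),
    step=0.01,
    n_steps=1000,
    noise=rademacher,
    rng=rng,
)

print("empirical mean:", samples.mean())
print("empirical variance:", samples.var())
```

Even though each increment takes only two values, the empirical mean and variance should land close to those of the diffusion's stationary law (0 and 1) for a small step size, which is the kind of closeness between chain and diffusion that the abstract quantifies in W2-distance.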

Aleksei Ustimenko

Aleksandr Beznosikov

Most machine learning algorithms build an optimization model and learn its parameters from the given data. Thus, developing effective and efficient optimization methods is of the essence.

Optimization

We study various aspects related to theoretical understanding of ML models and algorithms.

Machine learning theory

Gradient boosting iteratively combines weak learners (usually decision trees) to create a stronger model. It achieves state-of-the-art results on tabular data with heterogeneous features.

Ito Diffusion Approximation of Universal Ito Chains for Sampling, Optimization and Boosting