Papers accepted to ICLR 2023

Four papers by the Yandex Research team and our collaborators have been accepted for publication at the International Conference on Learning Representations (ICLR 2023).
A critical look at the evaluation of GNNs under heterophily: Are we really making progress? by Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova
It is often believed that standard Graph Neural Networks (GNNs) work well for node classification only on homophilous graphs, where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized methods are required to achieve strong performance on such graphs. In this work, we challenge this assumption. First, we show that the standard datasets used for evaluating heterophily-specific models have serious drawbacks, making the results obtained on them unreliable. Then, we propose a set of heterophilous graphs with varying properties that can serve as a better benchmark for evaluating the performance of GNNs under heterophily. We show that standard GNNs achieve strong results on these heterophilous graphs, almost always outperforming specialized models.
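As a reference point for the notion used above, one common way to quantify homophily in the GNN literature is the edge homophily ratio: the fraction of edges whose endpoints share a class. Below is a minimal numpy sketch on a hypothetical toy edge list (illustrative only; the paper discusses homophily measures and their pitfalls in much more detail):

```python
import numpy as np

def edge_homophily(edge_index: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of edges that connect two nodes of the same class.

    edge_index: array of shape (2, num_edges) with (source, target) node pairs.
    labels: array of shape (num_nodes,) with integer class labels.
    """
    src, dst = edge_index
    return float(np.mean(labels[src] == labels[dst]))

# Toy example: a 5-node cycle where most edges cross class boundaries,
# i.e. a (very small) heterophilous graph.
edge_index = np.array([[0, 1, 2, 3, 4],
                       [1, 2, 3, 4, 0]])
labels = np.array([0, 1, 0, 1, 1])
print(edge_homophily(edge_index, labels))  # 0.2 -> low homophily
```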
Understanding DDPM Latent Codes Through Optimal Transport by Valentin Khrulkov, Gleb Ryzhakov, Andrei Chertkov, Ivan Oseledets
Diffusion models naturally come with an encoder map induced by the probability flow ODE or, equivalently, the DDIM sampler. However, the nature of the resulting latent codes is not yet fully understood. In our work, we focus on analyzing this encoder map. We show that it is practically indistinguishable from the optimal transport map between the data distribution and the standard normal distribution (the limiting distribution of the diffusion model). We validate this observation with high numerical precision on synthetic datasets and theoretically show that it holds for arbitrary multivariate normal distributions. Finally, we demonstrate that this behavior can also be observed for practical DDPMs trained on image datasets.
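For intuition, the claim is easy to check in one dimension: for data x ~ N(0, s^2) under a VP diffusion, the probability flow ODE is linear in x, so the encoder is a scalar map x -> c*x, and c should approximately equal 1/s, the slope of the optimal transport map to the standard normal. A rough numpy sketch under these assumptions (linear beta schedule, simple Euler integration; illustrative only, not the authors' experimental setup):

```python
import numpy as np

# VP diffusion with a linear beta schedule, as in standard DDPMs.
beta_min, beta_max, T = 0.1, 20.0, 1.0
beta = lambda t: beta_min + (beta_max - beta_min) * t / T

s = 2.0            # std of the toy 1D data distribution N(0, s^2)
n_steps = 20_000
dt = T / n_steps

# For Gaussian data the marginal p_t stays Gaussian, N(0, v_t), with
# v_t = a_t^2 * s^2 + (1 - a_t^2) and a_t = exp(-0.5 * int_0^t beta(u) du).
# The probability flow ODE dx/dt = -0.5 * beta(t) * (x - x / v_t) is linear in x,
# so it suffices to track the slope c_t of the encoder map x_0 -> x_t = c_t * x_0.
c, log_a = 1.0, 0.0
for k in range(n_steps):
    t = k * dt
    a2 = np.exp(2.0 * log_a)
    v = a2 * s**2 + (1.0 - a2)
    c += dt * (-0.5 * beta(t) * c * (1.0 - 1.0 / v))
    log_a += dt * (-0.5 * beta(t))

print("encoder slope :", c)        # the data-to-latent map is x -> c * x
print("OT map slope  :", 1.0 / s)  # Monge map from N(0, s^2) to N(0, 1) is x -> x / s
```

The two printed slopes agree to several digits, which is the one-dimensional Gaussian instance of the statement above.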
Gradient Boosting Performs Gaussian Process Inference by Aleksei Ustimenko, Artem Beliakov, Liudmila Prokhorenkova
This paper shows that gradient boosting based on oblivious (symmetric) decision trees can be equivalently reformulated as a kernel method that converges to the solution of a certain Kernel Ridge Regression problem. Thus, we obtain convergence to a Gaussian process posterior mean, which, in turn, allows us to easily transform gradient boosting into a sampler from this posterior. This leads to better knowledge uncertainty estimates and improved out-of-domain detection.
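As background for the Gaussian process connection, the identity "kernel ridge regression solution = GP posterior mean" can be checked for any fixed kernel, since both predictions reduce to K_*(K + sigma^2 I)^{-1} y. Below is a small illustrative scikit-learn sketch with a generic RBF kernel; the kernel here is arbitrary, whereas the paper's kernel is the one induced by boosting with oblivious trees:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)
X_test = rng.normal(size=(20, 3))

noise = 0.1  # observation noise variance; also serves as the ridge penalty

# Kernel ridge regression with an RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2).
krr = KernelRidge(alpha=noise, kernel="rbf", gamma=0.5).fit(X, y)

# GP regression with the same kernel (RBF with length_scale=1 <=> gamma=0.5),
# the same noise on the diagonal, and hyperparameter optimization disabled.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=noise,
                               optimizer=None).fit(X, y)

# The two predictions coincide: the KRR solution is the GP posterior mean.
print(np.allclose(krr.predict(X_test), gpr.predict(X_test)))  # True
```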
Mini-batch Stochastic Gradient Descent (SGD) with momentum is a fundamental algorithm for learning large predictive models. In this paper, we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta, and batch sizes. Our key idea is to consider the dynamics of the second moments of model parameters for a special family of “Spectrally Expressible” approximations. This allows us to obtain an explicit expression for the generating function of the sequence of loss values. By analyzing this generating function, we find, in particular, that the SGD dynamics exhibit several convergent and divergent regimes depending on the spectral distributions of the problem. Moreover, the convergent regimes admit explicit stability conditions and explicit loss asymptotics in the case of power-law spectral distributions. Finally, the optimal convergence rate can be achieved at negative momenta. We verify our theoretical predictions with extensive experiments on MNIST, CIFAR-10, and synthetic problems and find good quantitative agreement.
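To make the setting concrete, the sketch below runs mini-batch SGD with heavy-ball momentum on a synthetic linear least-squares problem, the kind of model the analysis targets, and compares negative, zero, and positive momentum. All hyperparameters (learning rate, batch size, momenta, problem dimensions) are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 100
X = rng.normal(size=(n, d)) / np.sqrt(d)   # synthetic linear regression problem
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def sgd_momentum(lr=0.5, momentum=0.0, batch_size=16, steps=2000):
    """Mini-batch SGD with heavy-ball momentum on the 0.5 * mean-squared-error loss."""
    w = np.zeros(d)
    v = np.zeros(d)
    losses = []
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch_size)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        v = momentum * v - lr * grad
        w = w + v
        losses.append(0.5 * np.mean((X @ w - y) ** 2))
    return losses

# Compare loss levels reached with negative, zero, and positive momentum.
for m in (-0.3, 0.0, 0.9):
    final = np.mean(sgd_momentum(momentum=m)[-100:])
    print(f"momentum {m:+.1f}: loss averaged over last 100 steps = {final:.5f}")
```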