Valentin Khrulkov

Research interests: diffusion models, low-rank tensor decompositions, unsupervised and semi-supervised learning, and theoretical analysis of neural networks. I am especially interested in applying geometrical (differential, algebraic, topological) ideas and techniques to neural networks to further our understanding of deep learning.

Publications

Understanding DDPM Latent Codes Through Optimal Transport
ICLR, 2023

Diffusion models have recently outperformed alternative approaches to modeling the distribution of natural images. Such diffusion models allow for deterministic sampling via the probability flow ODE, giving rise to a latent space and an encoder map. While this map has important practical applications, such as likelihood estimation, its theoretical properties are not yet fully understood. In the present work, we partially address this question for the popular case of the VP-SDE (DDPM) approach. We show that, perhaps surprisingly, the DDPM encoder map coincides with the optimal transport map for common distributions; we support this claim with extensive numerical experiments using an advanced tensor-train solver for the multidimensional Fokker-Planck equation. We provide additional theoretical evidence for the case of multivariate normal distributions.
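For background, the deterministic encoder referenced above comes from the score-SDE framework of Song et al. (2021); the formulas below are standard and only sketch the object the paper studies, with the Gaussian case hinting at why the encoder can match an optimal transport map.

```latex
% VP-SDE (DDPM) forward process:
\mathrm{d}x_t = -\tfrac{1}{2}\,\beta(t)\,x_t\,\mathrm{d}t
              + \sqrt{\beta(t)}\,\mathrm{d}w_t ,
% and its probability flow ODE, which shares the marginals p_t
% but evolves deterministically:
\frac{\mathrm{d}x_t}{\mathrm{d}t}
  = -\tfrac{1}{2}\,\beta(t)\,\bigl( x_t + \nabla_x \log p_t(x_t) \bigr).
% The encoder map sends a data point x_0 to the latent x_T obtained by
% integrating this ODE forward in time.
% Example (centered Gaussian data): if p_0 = \mathcal{N}(0, s^2 I), then
% \nabla_x \log p_t(x) = -x / \sigma_t^2, the ODE becomes linear in x_t,
% and the encoder reduces to a positive scaling x \mapsto c\,x, which is
% precisely the optimal transport map between the two centered Gaussians.
```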
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning
CVPR, 2022

Metric learning aims to learn a highly discriminative model, encouraging the embeddings of similar classes to be close in the chosen metric and pushing apart the embeddings of dissimilar ones. The common recipe is to use an encoder to extract embeddings and a distance-based loss function to match the representations; usually, the Euclidean distance is used. An emerging interest in learning hyperbolic data embeddings suggests that hyperbolic geometry can be beneficial for natural data. Following this line of work, we propose a new hyperbolic-based model for metric learning. At the core of our method is a vision transformer whose output embeddings are mapped to hyperbolic space. These embeddings are directly optimized using a modified pairwise cross-entropy loss. We evaluate the proposed model with six different formulations on four datasets, achieving new state-of-the-art performance. The source code is available at https://github.com/htdt/hyp_metric
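To make the geometry concrete, below is a minimal sketch (assuming PyTorch) of the standard Poincaré-ball operations such a model builds on; it is illustrative rather than the authors' implementation, which lives in the repository linked above.

```python
import torch

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin: Euclidean vector -> Poincare ball."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x: torch.Tensor, y: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Mobius addition, the ball's analogue of vector addition."""
    x2 = (x * x).sum(dim=-1, keepdim=True)
    y2 = (y * y).sum(dim=-1, keepdim=True)
    xy = (x * y).sum(dim=-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = (1 + 2 * c * xy + c ** 2 * x2 * y2).clamp_min(1e-6)
    return num / den

def poincare_dist(x: torch.Tensor, y: torch.Tensor, c: float = 1.0,
                  eps: float = 1e-6) -> torch.Tensor:
    """Geodesic distance on the ball; a drop-in replacement for the
    Euclidean distance inside a pairwise cross-entropy loss."""
    sqrt_c = c ** 0.5
    norm = mobius_add(-x, y, c).norm(dim=-1).clamp(max=(1 - eps) / sqrt_c)
    return (2.0 / sqrt_c) * torch.atanh(sqrt_c * norm)
```

In this scheme, encoder outputs are first mapped into the ball with expmap0, and poincare_dist replaces the Euclidean distance when forming the pairwise logits.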
Label-Efficient Semantic Segmentation with Diffusion Models
ICLR, 2022

Denoising diffusion probabilistic models have recently received much research attention since they outperform alternative approaches, such as GANs, and currently provide state-of-the-art generative performance. The superior performance of diffusion models has made them an appealing tool in several applications, including inpainting, super-resolution, and semantic editing. In this paper, we demonstrate that diffusion models can also serve as an instrument for semantic segmentation, especially in the setup where labeled data is scarce. In particular, for several pretrained diffusion models, we investigate the intermediate activations of the networks that perform the Markov step of the reverse diffusion process. We show that these activations effectively capture the semantic information of an input image and appear to be excellent pixel-level representations for the segmentation problem. Based on these observations, we describe a simple segmentation method that works even if only a few training images are provided. Our approach significantly outperforms existing alternatives on several datasets for the same amount of human supervision.
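The extraction step described above can be sketched in a few lines. The following is a hypothetical illustration (assuming PyTorch, a pretrained DDPM noise-prediction network unet(x, t), its alphas_cumprod schedule, and a list of decoder layers to hook); the exact timesteps and blocks used in the paper may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pixel_features(unet, alphas_cumprod, x0, t, layers):
    """Per-pixel features from intermediate activations of a denoising UNet.

    Hypothetical interfaces: `unet(x, t)` is a DDPM noise predictor,
    `alphas_cumprod` its cumulative noise schedule (1-D tensor), `t` a batch
    of timestep indices, and `layers` the UNet sub-modules to hook.
    """
    # Forward-diffuse the clean image to timestep t:
    #   x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * torch.randn_like(x0)

    # Capture the activations of the chosen layers during one network pass.
    acts, handles = [], []
    for layer in layers:
        handles.append(layer.register_forward_hook(
            lambda _mod, _inp, out: acts.append(out)))
    unet(x_t, t)
    for h in handles:
        h.remove()

    # Upsample every activation map to the input resolution and concatenate
    # along channels: each pixel gets one long feature vector, on which a
    # small per-pixel classifier is then trained with the few labeled images.
    return torch.cat(
        [F.interpolate(a, size=x0.shape[-2:], mode="bilinear",
                       align_corners=False) for a in acts],
        dim=1)
```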
Posts
- July 15, 2021 · Research
Functional Space Analysis of Local GAN Convergence