Publications

Explore our scientific papers on fundamental problems in machine learning
5 of 241 publications
  • YaART: Yet Another ART Rendering Technology

    Computer vision, Generative models
    Sergey Kastryulin
    Artem Konev
    Alexander Shishenya
    Eugene Lyapustin
    Artem Khurshudov
    Alexander Tselousov
    Nikita Vinokurov
    Denis Kuznedelev
    Alexander Markovich
    Grigoriy Livshits
    Alexey Kirillov
    Anastasiia Tabisheva
    Liubov Chubarova
    Marina Kaminskaia
    Alexander Ustyuzhanin
    Artemii Shvetsov
    Daniil Shlenskii
    Valerii Startsev
    Dmitrii Kornilov
    Mikhail Romanov
    Dmitry Baranchuk
    Artem Babenko
    Sergei Ovcharenko
    Valentin Khrulkov
    KDD, 2025

    In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we focus especially on the choices of model and training dataset sizes, aspects that had not previously been systematically investigated for text-to-image cascaded diffusion models. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario for training diffusion models. From the quality perspective, YaART is consistently preferred by users over many existing state-of-the-art models.

  • Measuring Diversity: Axioms and Challenges

    Machine learning theory
    Mikhail Mironov
    Liudmila Prokhorenkova
    ICML, 2025

    This paper addresses the problem of quantifying diversity for a set of objects. First, we conduct a systematic review of existing diversity measures and explore their undesirable behavior in certain cases. Based on this review, we formulate three desirable properties (axioms) of a reliable diversity measure: monotonicity, uniqueness, and continuity. We show that none of the existing measures has all three properties, and thus these measures are not suitable for quantifying diversity. Then, we construct two examples of measures that have all the desirable properties, thus proving that the list of axioms is not self-contradictory. Unfortunately, the constructed examples are too computationally expensive (NP-hard) for practical use. We therefore pose an open problem: construct a diversity measure that has all the listed properties and can be computed in practice, or prove that every such measure is NP-hard to compute.
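
    For intuition only, here is one possible way such axioms are often formalized for a measure D(S, d) of a finite collection S of objects with pairwise distances d; this is a hedged illustration, and the paper's exact statements may differ.

        % Illustrative formalization only; the precise axioms are given in the paper.
        \textbf{Monotonicity:}\quad d(x,y) \le d'(x,y)\ \ \forall x,y \in S \;\Longrightarrow\; D(S, d) \le D(S, d')
        \textbf{Uniqueness:}\quad D(S \cup \{x\},\, d) = D(S, d)\ \ \text{whenever } x \text{ duplicates an element of } S
        \textbf{Continuity:}\quad (d(x,y))_{x,y \in S} \mapsto D(S, d)\ \text{is continuous}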

  • Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models

    Natural language processing, Large-scale machine learning
    Alina Shutova
    Vladimir Malinovskii
    Vage Egiazarian
    Denis Kuznedelev
    Denis Mazur
    Nikita Surkov
    Ivan Ermakov
    Dan Alistarh
    ICML, 2025

    Efficient real-world deployments of large language models (LLMs) rely on Key-Value (KV) caching for processing and generating long outputs, reducing the need for repetitive computation. For large contexts, Key-Value caches can take up tens of gigabytes of device memory, as they store vector representations for each token and layer. Recent work has shown that the cached vectors can be compressed through quantization, pruning, or merging, but these techniques often compromise quality in pursuit of higher compression rates. In this work, we aim to improve Key & Value compression by exploiting two observations: 1) the inherent dependencies between keys and values across different layers, and 2) the existence of high-compression methods for internal network states (e.g. attention Keys & Values). We propose AQUA-KV, an adaptive quantization scheme for Key-Value caches that relies on compact adapters to exploit existing dependencies between Keys and Values, and aims to “optimally” compress the information that cannot be predicted. AQUA-KV significantly improves compression rates while maintaining high accuracy on state-of-the-art LLM families. On Llama 3.2 LLMs, we achieve near-lossless inference at 2-2.5 bits per value with under 1% relative error in perplexity and LongBench scores. AQUA-KV is one-shot, simple, and efficient: it can be calibrated on a single GPU within 1-6 hours, even for 70B models.
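
    The abstract's core idea, namely predicting what the keys already determine about the values and quantizing only the unpredictable residual, can be pictured with a deliberately simplified sketch. Everything below (the single linear adapter fitted by least squares, the toy uniform quantizer, the synthetic tensors) is an illustrative assumption, not the AQUA-KV implementation.

        # Illustrative sketch only -- NOT the AQUA-KV implementation.
        # Idea from the abstract: exploit dependencies between keys and values,
        # then quantize only the part of the values that cannot be predicted.
        import torch

        torch.manual_seed(0)
        d, n = 128, 4096                 # head dimension and cached tokens (hypothetical sizes)
        K = torch.randn(n, d)            # stand-in for cached attention keys
        V = K @ torch.randn(d, d) * 0.1 + 0.05 * torch.randn(n, d)   # values correlated with keys

        # 1) Calibrate a compact linear "adapter" that predicts values from keys.
        W = torch.linalg.lstsq(K, V).solution
        V_pred = K @ W

        # 2) Quantize only the residual that the adapter cannot predict.
        def quantize(x, bits=2):
            """Toy per-tensor uniform quantizer (stand-in for a real low-bit scheme)."""
            levels = 2 ** bits - 1
            lo, hi = x.min(), x.max()
            scale = (hi - lo) / levels
            return torch.round((x - lo) / scale).clamp(0, levels) * scale + lo

        V_restored = V_pred + quantize(V - V_pred, bits=2)

        print("relative error, direct 2-bit quantization:", ((quantize(V) - V).norm() / V.norm()).item())
        print("relative error, adapter + 2-bit residual: ", ((V_restored - V).norm() / V.norm()).item())

    In a real KV cache the adapters would be calibrated once on held-out activations and applied per layer; the toy example only shows why predicting values from keys leaves a much easier residual to quantize.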

  • FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

    Natural language processing, Machine learning theory, Optimization
    Philip Zmushko
    Aleksandr Beznosikov
    Martin Takáč
    Samuel Horváth
    ICML, 2025

    With the increase in the number of parameters in large language models, the training process increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA), low-rank gradient projection (GaLore), and blockwise optimization (BAdam) have been proposed. However, in all these algorithms, the effective rank of the weight updates remains low, which can lead to a substantial loss of information from the gradient. This loss can be critically important, especially during the pre-training stage. In this paper, we introduce FRUGAL (Full-Rank Updates with GrAdient spLitting), a new memory-efficient optimization framework. FRUGAL leverages gradient splitting to perform low-dimensional updates using advanced algorithms (such as Adam), while updates along the remaining directions are executed via state-free methods like SGD or signSGD. Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam. We provide theoretical convergence guarantees for our framework when using SGDM for low-dimensional updates and SGD for state-free updates. Additionally, our method consistently outperforms concurrent approaches, achieving state-of-the-art results in pre-training and fine-tuning tasks while balancing memory efficiency and performance.
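
    The splitting idea described above, a stateful optimizer in a low-dimensional subspace and a state-free one along the remaining directions, can be sketched as follows. The fixed orthonormal projection, the step sizes, and the plain Adam/signSGD combination are illustrative assumptions rather than the FRUGAL algorithm itself.

        # Illustrative sketch only -- NOT the FRUGAL implementation.
        # Idea: keep Adam state only for a low-dimensional projection of the gradient,
        # and update the remaining (full-rank) directions with state-free signSGD.
        import torch

        torch.manual_seed(0)
        m, n, r = 256, 256, 16                       # parameter shape and projection rank (hypothetical)
        W = torch.randn(m, n) * 0.02                 # toy weight matrix
        P = torch.linalg.qr(torch.randn(m, r)).Q     # orthonormal basis of an r-dimensional subspace

        # Adam state lives only in the r x n projected block: O(r*n) memory instead of O(m*n).
        exp_avg = torch.zeros(r, n)
        exp_avg_sq = torch.zeros(r, n)
        beta1, beta2, eps = 0.9, 0.999, 1e-8
        lr_adam, lr_sign = 1e-2, 1e-3

        def step(grad, t):
            global W, exp_avg, exp_avg_sq
            g_low = P.T @ grad                       # low-dimensional component (r x n)
            g_res = grad - P @ g_low                 # state-free component (m x n)

            # Stateful update (Adam) in the low-dimensional subspace.
            exp_avg = beta1 * exp_avg + (1 - beta1) * g_low
            exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * g_low ** 2
            m_hat = exp_avg / (1 - beta1 ** t)
            v_hat = exp_avg_sq / (1 - beta2 ** t)

            # State-free update (signSGD) along the remaining directions.
            W = W - (P @ (lr_adam * m_hat / (v_hat.sqrt() + eps)) + lr_sign * g_res.sign())

        # Toy usage: a few steps on 0.5 * ||W - target||^2.
        target = torch.randn(m, n)
        for t in range(1, 6):
            step(W - target, t)
            print(f"step {t}: loss = {0.5 * (W - target).pow(2).sum().item():.2f}")

    The point of the sketch is the memory accounting: only the r x n block carries optimizer state, while the applied update still spans all m x n directions.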

  • Inverse Bridge Matching Distillation

    Computer vision, Generative models
    Nikita Gushchin
    David Li
    Daniil Selikhanovych
    Evgeny Burnaev
    Dmitry Baranchuk
    Alexander Korotin
    ICML, 2025

    Learning diffusion bridge models is easy; making them fast and practical is an art. Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation. However, like many modern diffusion and flow models, DBMs suffer from slow inference. To address this, we propose a novel distillation technique based on the inverse bridge matching formulation and derive a tractable objective to solve it in practice. Unlike previously developed DBM distillation techniques, the proposed method can distill both conditional and unconditional DBMs, distill models into one-step generators, and use only corrupted images for training. We evaluate our approach for both conditional and unconditional bridge matching on a wide set of setups, including super-resolution, JPEG restoration, sketch-to-image, and other tasks, and show that our distillation technique accelerates the inference of DBMs by 4x to 100x and, depending on the particular setup, even provides better generation quality than the teacher model.
