
Dmitry Baranchuk

Publications

  • Alchemist: Turning Public Text-to-Image Data into Generative Gold

    Generative models, Computer vision
    Valerii Startsev
    Alexander Ustyuzhanin
    Alexey Kirillov
    Dmitry Baranchuk
    Sergey Kastryulin
    NeurIPS Datasets and Benchmarks, 2025

    Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness highly depends on the quality of the fine-tuning dataset. Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and the creation of high-quality, general-purpose SFT datasets remains a significant challenge. Current curation methods are often costly and struggle to identify truly impactful samples. This challenge is further complicated by the scarcity of public general-purpose datasets, as leading models often rely on large, proprietary, and poorly documented internal data, hindering broader research progress. This paper introduces a novel methodology for creating general-purpose SFT datasets by leveraging a pre-trained generative model as an estimator of high-impact training samples. We apply this methodology to construct and release Alchemist, a compact (3,350 samples) yet highly effective SFT dataset. Experiments demonstrate that Alchemist substantially improves the generative quality of five public T2I models while preserving diversity and style. Additionally, we release the fine-tuned models’ weights to the public.

  • Results of the Big ANN: NeurIPS’23 competition

    Nearest neighbor search, Computer vision
    Harsha Vardhan Simhadri
    Martin Aumüller
    Dmitry Baranchuk
    Matthijs Douze
    Edo Liberty
    Amir Ingber
    Frank Liu
    George Williams
    Ben Landrum
    Magdalen Dobson Manohar
    Mazin Karjikar
    Laxman Dhulipala
    Meng Chen
    Yue Chen
    Rui Ma
    Kai Zhang
    Yuzheng Cai
    Jiayang Shi
    Yizhuo Chen
    Weiguo Zheng
    Zihao Wang
    Jie Yin
    Ben Huang
    NeurIPS Datasets and Benchmarks, 2025

    The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search (Simhadri et al., NeurIPS 2021), this competition addressed sparse, filtered, out-of-distribution, and streaming variants of ANN search. Participants developed and submitted innovative solutions that were evaluated on new standard datasets with constrained computational resources. The results showcased significant improvements in search accuracy and efficiency, with notable contributions from both academic and industrial teams. This paper summarizes the competition tracks, datasets, evaluation metrics, and the innovative approaches of the top-performing submissions, providing insights into the current advancements and future directions in the field of approximate nearest neighbor search.
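
    Accuracy in such benchmarks is typically reported as recall@k against exact ground-truth neighbors under a fixed query budget. The snippet below is a rough illustration of that metric, not the competition's official evaluation harness:

    ```python
    import numpy as np

    def recall_at_k(retrieved_ids, groundtruth_ids, k=10):
        """Fraction of the true k nearest neighbors that appear among the
        k retrieved candidates, averaged over all queries."""
        hits = 0
        for ret, gt in zip(retrieved_ids, groundtruth_ids):
            hits += len(set(ret[:k]) & set(gt[:k]))
        return hits / (len(retrieved_ids) * k)

    # Toy example: 2 queries, k=3.
    retrieved = np.array([[5, 1, 9], [4, 2, 7]])
    groundtruth = np.array([[1, 5, 3], [2, 7, 8]])
    print(recall_at_k(retrieved, groundtruth, k=3))  # ~0.667
    ```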

  • YaART: Yet Another ART Rendering Technology

    Computer vision, Generative models
    Sergey Kastryulin
    Artem Konev
    Alexander Shishenya
    Eugene Lyapustin
    Artem Khurshudov
    Alexander Tselousov
    Nikita Vinokurov
    Denis Kuznedelev
    Alexander Markovich
    Grigoriy Livshits
    Alexey Kirillov
    Anastasiia Tabisheva
    Liubov Chubarova
    Marina Kaminskaia
    Alexander Ustyuzhanin
    Artemii Shvetsov
    Daniil Shlenskii
    Valerii Startsev
    Dmitrii Kornilov
    Mikhail Romanov
    Dmitry Baranchuk
    Artem Babenko
    Sergei Ovcharenko
    Valentin Khrulkov
    KDD, 2025

    In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we focus especially on the choice of model and training dataset sizes, aspects that had not been systematically investigated for text-to-image cascaded diffusion models before. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario for diffusion model training. From the quality perspective, YaART is consistently preferred by users over many existing state-of-the-art models.

Datasets

  • Text-to-Image dataset for billion-scale similarity search

    Computer vision, Natural language processing, Nearest neighbor search
    Dmitry Baranchuk
    Artem Babenko

    The Yandex Text-to-Image (T2I) dataset was collected to foster research in billion-scale nearest neighbor search (NNS) when the query distribution differs from the indexing one. In particular, this dataset addresses the cross-domain setting: a user specifies a textual query and requests the search engine to retrieve the most relevant images for that query. Notably, current large-scale indexing methods perform poorly in this setting. Therefore, novel highly performant indexing solutions robust to out-of-domain queries are in high demand.

    The dataset represents a snapshot of the Yandex visual search engine and contains 1 billion 200-dimensional image embeddings for indexing. The image embeddings are produced by an SE-ResNeXt-101 model. The embeddings for textual queries are extracted by a variant of the DSSM model.
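
    As a rough sketch of how such a snapshot can be consumed: billion-scale ANN benchmark embeddings are commonly distributed as .fbin files (a uint32 point count and dimension, followed by row-major float32 data). The snippet below assumes that layout and uses hypothetical file names; the related post linked below documents the actual format and download procedure.

    ```python
    import numpy as np

    def read_fbin(path, start=0, count=None):
        """Read float32 vectors from an .fbin file: a header of two uint32
        values (number of points, dimension) followed by float32 rows."""
        with open(path, "rb") as f:
            n, dim = (int(x) for x in np.fromfile(f, dtype=np.uint32, count=2))
            rows = n - start if count is None else min(count, n - start)
            f.seek(8 + start * dim * 4)
            return np.fromfile(f, dtype=np.float32, count=rows * dim).reshape(rows, dim)

    # Hypothetical file names; see the related post for the actual files.
    base = read_fbin("base.1B.fbin", count=100_000)           # slice of the 200-d image embeddings
    queries = read_fbin("query.public.100K.fbin", count=10)   # 200-d text query embeddings

    # Brute-force inner-product search over the loaded slice (sanity check only).
    scores = queries @ base.T
    top10 = np.argsort(-scores, axis=1)[:, :10]
    ```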

    Read more about the data format and how to download the dataset in the related post.