Publications

Explore our scientific papers on fundamental problems in machine learning
5 of 246 publications
  • Results of the Big ANN: NeurIPS’23 competition

Computer vision, Nearest neighbor search
    Harsha Vardhan Simhadri
    Martin Aumüller
    Dmitry Baranchuk
    Matthijs Douze
    Edo Liberty
    Amir Ingber
    Frank Liu
    George Williams
    Ben Landrum
    Magdalen Dobson Manohar
    Mazin Karjikar
    Laxman Dhulipala
    Meng Chen
    Yue Chen
    Rui Ma
    Kai Zhang
    Yuzheng Cai
    Jiayang Shi
    Yizhuo Chen
    Weiguo Zheng
    Zihao Wang
    Jie Yin
    Ben Huang
    NeurIPS Datasets and Benchmarks, 2025

The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search (Simhadri et al., NeurIPS 2021), this competition addressed sparse, filtered, out-of-distribution, and streaming variants of ANN search. Participants developed and submitted innovative solutions that were evaluated on new standard datasets with constrained computational resources. The results showcased significant improvements in search accuracy and efficiency, with notable contributions from both academic and industrial teams. This paper summarizes the competition tracks, datasets, evaluation metrics, and the innovative approaches of the top-performing submissions, providing insights into the current advancements and future directions in the field of approximate nearest neighbor search.
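    The standard quality metric in this competition family is recall against exact nearest neighbors under a compute budget. As a minimal illustration only (not the competition's official evaluation harness; array names and the toy data are hypothetical), ground truth and recall@k can be computed like this:

    ```python
    import numpy as np

    def exact_neighbors(base: np.ndarray, queries: np.ndarray, k: int) -> np.ndarray:
        """Brute-force ground truth: indices of the k nearest base vectors per query (L2)."""
        # Full pairwise squared distances; fine at toy scale, infeasible at billion scale.
        dists = ((queries[:, None, :] - base[None, :, :]) ** 2).sum(-1)
        return np.argsort(dists, axis=1)[:, :k]

    def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
        """Fraction of true k-nearest neighbors recovered by the approximate index."""
        hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
        return hits / exact_ids.size

    # Toy example with random vectors; a real ANN index would supply `approx_ids`.
    rng = np.random.default_rng(0)
    base, queries = rng.normal(size=(1000, 32)), rng.normal(size=(10, 32))
    gt = exact_neighbors(base, queries, k=10)
    print(recall_at_k(gt, gt))  # 1.0 by construction
    ```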

  • Alchemist: Turning Public Text-to-Image Data into Generative Gold

Computer vision, Generative models
    Valerii Startsev
    Alexander Ustyuzhanin
    Alexey Kirillov
    Dmitry Baranchuk
    Sergey Kastryulin
    NeurIPS Datasets and Benchmarks, 2025

    Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness highly depends on the quality of the fine-tuning dataset. Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and the creation of high-quality, general-purpose SFT datasets remains a significant challenge. Current curation methods are often costly and struggle to identify truly impactful samples. This challenge is further complicated by the scarcity of public general-purpose datasets, as leading models often rely on large, proprietary, and poorly documented internal data, hindering broader research progress. This paper introduces a novel methodology for creating general-purpose SFT datasets by leveraging a pre-trained generative model as an estimator of high-impact training samples. We apply this methodology to construct and release Alchemist, a compact (3,350 samples) yet highly effective SFT dataset. Experiments demonstrate that Alchemist substantially improves the generative quality of five public T2I models while preserving diversity and style. Additionally, we release the fine-tuned models’ weights to the public.
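    At a high level, the curation recipe scores candidate samples with a pre-trained generative model and keeps the highest-impact ones for fine-tuning. The following is only a hedged sketch of that score-and-select pattern; the scoring function is a placeholder heuristic, not the paper's actual estimator:

    ```python
    from typing import Callable, Sequence

    def select_top_samples(
        samples: Sequence[dict],
        score_fn: Callable[[dict], float],
        target_size: int,
    ) -> list[dict]:
        """Keep the `target_size` samples with the highest estimated training impact."""
        ranked = sorted(samples, key=score_fn, reverse=True)
        return ranked[:target_size]

    # Hypothetical usage: in the paper's setting, the score would come from a
    # pre-trained T2I model's estimate of a sample's impact; here it is a dummy.
    def impact_score(sample: dict) -> float:
        return float(len(sample["prompt"]))  # placeholder, not the real estimator

    pool = [{"prompt": "a cat"}, {"prompt": "an intricate watercolor of a lighthouse"}]
    dataset = select_top_samples(pool, impact_score, target_size=1)
    print(dataset)
    ```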

  • GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data

    Graph machine learning
    Gleb Bazhenov
    Oleg Platonov
    Liudmila Prokhorenkova
    NeurIPS Datasets and Benchmarks, 2025

    Although data that can be naturally represented as graphs is widespread in real-world applications across diverse industries, popular graph ML benchmarks for node property prediction only cover a surprisingly narrow set of data domains, and graph neural networks (GNNs) are often evaluated on just a few academic citation networks. This issue is particularly pressing in light of the recent growing interest in designing graph foundation models. These models are supposed to be able to transfer to diverse graph datasets from different domains, and yet the proposed graph foundation models are often evaluated on a very limited set of datasets from narrow applications. To alleviate this issue, we introduce GraphLand: a benchmark of 14 diverse graph datasets for node property prediction from a range of different industrial applications. GraphLand allows evaluating graph ML models on a wide range of graphs with diverse sizes, structural characteristics, and feature sets, all in a unified setting. Further, GraphLand allows investigating such previously underexplored research questions as how realistic temporal distributional shifts under transductive and inductive settings influence graph ML model performance. To mimic realistic industrial settings, we use GraphLand to compare GNNs with gradient-boosted decision tree (GBDT) models that are popular in industrial applications and show that GBDTs provided with additional graph-based input features can sometimes be very strong baselines. Further, we evaluate currently available general-purpose graph foundation models and find that they fail to produce competitive results on our proposed datasets.
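    The GBDT baselines mentioned in the abstract operate on tabular node features augmented with simple graph statistics. A minimal sketch of that idea, assuming hypothetical feature choices (degree and mean of neighbor features) rather than GraphLand's exact pipeline:

    ```python
    import numpy as np

    def graph_augmented_features(x: np.ndarray, adj: list[list[int]]) -> np.ndarray:
        """Append node degree and mean-of-neighbor features to the raw node features."""
        n = x.shape[0]
        degree = np.array([len(adj[i]) for i in range(n)], dtype=float)
        neigh_mean = np.vstack([
            x[adj[i]].mean(axis=0) if adj[i] else np.zeros(x.shape[1]) for i in range(n)
        ])
        return np.hstack([x, degree[:, None], neigh_mean])

    # Toy graph: 4 nodes on a path, 2 raw features per node.
    x = np.arange(8, dtype=float).reshape(4, 2)
    adj = [[1], [0, 2], [1, 3], [2]]
    features = graph_augmented_features(x, adj)
    # `features` could then be fed to any GBDT library (e.g. CatBoost, LightGBM, XGBoost).
    print(features.shape)  # (4, 5)
    ```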

  • AutoJudge: Judge Decoding Without Manual Annotation

Natural language processing, Large-scale machine learning, Speculative and parallel decoding
    Roman Garipov
    Fedor Velikonivtsev
    Ivan Ermakov
    Ruslan Svirschevski
    Vage Egiazarian
    Max Ryabinin
    NeurIPS, 2025

    We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify which of the generated tokens affect the downstream quality of the response, relaxing the distribution match guarantee so that the "unimportant" tokens can be generated faster. Our approach relies on a semi-greedy search algorithm to test which of the mismatches between target and draft models should be corrected to preserve quality and which ones may be skipped. We then train a lightweight classifier based on existing LLM embeddings to predict, at inference time, which mismatching tokens can be safely accepted without compromising the final answer quality. We evaluate the effectiveness of AutoJudge with multiple draft/target model pairs on mathematical reasoning and programming benchmarks, achieving significant speedups at the cost of a minor accuracy reduction. Notably, on GSM8k with the Llama 3.1 70B target model, our approach achieves up to ≈2× speedup over speculative decoding at the cost of ≤1% drop in accuracy. When applied to the LiveCodeBench benchmark, AutoJudge automatically detects programming-specific important tokens, accepting ≥25 tokens per speculation cycle at a 2% drop in Pass@1. Our approach requires no human annotation and is easy to integrate with modern LLM inference frameworks.
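    Conceptually, the accept/reject step replaces exact token matching with a learned judgment on each mismatch. A simplified, hypothetical sketch of that control flow (the classifier and the token interface are placeholders, not the released implementation):

    ```python
    from typing import Callable, Sequence

    def accept_draft_tokens(
        draft_tokens: Sequence[int],
        target_tokens: Sequence[int],
        is_safe_mismatch: Callable[[int, int, int], bool],
    ) -> list[int]:
        """Walk the speculated tokens: keep matches, keep mismatches the classifier
        judges "unimportant", and stop at the first quality-critical mismatch."""
        accepted = []
        for pos, (d, t) in enumerate(zip(draft_tokens, target_tokens)):
            if d == t or is_safe_mismatch(pos, d, t):
                accepted.append(d)
            else:
                accepted.append(t)  # fall back to the target model's token and stop
                break
        return accepted

    # Dummy classifier that treats every mismatch as important; this degenerates to
    # standard speculative decoding. AutoJudge learns the decision from LLM embeddings.
    print(accept_draft_tokens([5, 7, 9], [5, 8, 9], lambda pos, d, t: False))  # [5, 8]
    ```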

  • Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Natural language processing, Speculative and parallel decoding
    Gleb Rodionov
    Roman Garipov
    Alina Shutova
    George Yakushev
    Erik Schultheis
    Vage Egiazarian
    Anton Sinitsin
    Denis Kuznedelev
    Dan Alistarh
    NeurIPS, 2025

    Large Language Models (LLMs) have demonstrated the ability to tackle increasingly complex tasks through advanced reasoning, long-form content generation, and tool use. Solving these tasks often involves long inference-time computations. In human problem solving, a common strategy to expedite work is collaboration: dividing the problem into sub-tasks, exploring different strategies concurrently, and so on. Recent research has shown that LLMs can also operate in parallel by implementing explicit cooperation frameworks, such as voting mechanisms or the explicit creation of independent sub-tasks that can be executed in parallel. However, each of these frameworks may not be suitable for all types of tasks, which can hinder their applicability. In this work, we propose a different design approach: we run LLM “workers” in parallel, allowing them to synchronize via a concurrently-updated attention cache and prompt these workers to decide how best to collaborate. Our approach allows the LLM instances to come up with their own collaboration strategy for the problem at hand, all the while “seeing” each other’s memory in the concurrent KV cache. We implement this approach via Hogwild! Inference: a parallel LLM inference engine where multiple instances of the same LLM run in parallel with the same attention cache, with “instant” access to each other’s memory. Hogwild! Inference takes advantage of Rotary Position Embeddings (RoPE) to avoid recomputation while improving parallel hardware utilization. We find that modern reasoning-capable LLMs can perform inference with shared Key-Value cache out of the box, without additional fine-tuning.
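    The core mechanism is a shared, concurrently-updated cache that all parallel instances read from and write into as they generate. The snippet below is only a toy analogy of that access pattern, using plain Python threads and a locked list; it is not an actual KV-cache, attention, or RoPE-aware implementation:

    ```python
    import threading

    class SharedCache:
        """Toy stand-in for a concurrently-updated cache: workers append entries
        and can read everything written so far by any worker."""
        def __init__(self) -> None:
            self._entries: list[tuple[str, str]] = []
            self._lock = threading.Lock()

        def append(self, worker_name: str, chunk: str) -> None:
            with self._lock:
                self._entries.append((worker_name, chunk))

        def snapshot(self) -> list[tuple[str, str]]:
            with self._lock:
                return list(self._entries)

    def worker(name: str, cache: SharedCache, steps: int) -> None:
        for step in range(steps):
            visible = cache.snapshot()                     # "see" everything generated so far
            chunk = f"{name}-step{step} (saw {len(visible)} chunks)"
            cache.append(name, chunk)                      # publish this worker's new output

    cache = SharedCache()
    threads = [threading.Thread(target=worker, args=(n, cache, 2)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(cache.snapshot())
    ```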
