GraphPFN: A Prior-Data Fitted Graph Foundation Model
ICML, 2026
Graph foundation models face several fundamental challenges, including transferability across diverse domains and data scarcity, which call into question the very feasibility of creating such models. However, despite similar challenges, the tabular domain has recently witnessed the emergence of the first successful foundation models such as TabPFN. These models are based on the prior-data fitted networks (PFN) framework, in which models are pretrained on carefully designed synthetic datasets to make predictions in an in-context learning setting. Recently, G2T-FM, a framework that converts graph node-level tasks into tabular tasks, has taken the first step towards adopting PFNs for graphs, yet it is limited to hand-crafted features and was never pretrained on graph data. In this work, we take the next step by proposing GraphPFN, a PFN-based model designed and pretrained specifically for graph node-level tasks. Following the PFN framework, we first design a prior distribution of synthetic attributed graphs by using a novel combination of multi-level stochastic block models and a preferential attachment process for structure generation and graph-aware structured causal models for attribute generation. Then, we augment the tabular foundation model LimiX with attention-based graph neighborhood aggregation layers and train it on millions of synthetic graphs sampled from our prior. On diverse real-world graph datasets with node-level tasks, GraphPFN achieves state-of-the-art results in both in-context learning and finetuning regimes, outperforming G2T-FM, prior GFMs, and task-specific GNNs trained from scratch. More broadly, GraphPFN shows the potential of PFN-based models for building graph foundation models.
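To make the structure part of such a prior more concrete, here is a minimal sketch that combines a two-level stochastic block model with a preferential attachment pass. It is our own illustration under stated assumptions, not GraphPFN's generator: the function name `sample_graph_structure`, the two-level block layout, the edge probabilities, and the choice to layer preferential-attachment edges on top of the SBM edges are all hypothetical. Attribute generation with graph-aware structural causal models is not shown.

```python
import random


def sample_graph_structure(n_nodes, n_coarse, n_fine, p_same_fine, p_same_coarse,
                           p_between, m_pa, seed=0):
    """Toy two-level SBM followed by a preferential-attachment pass.

    All parameters and the way the two stages are combined are hypothetical.
    Returns (coarse_block, fine_block, edges) with edges as sorted (u, v), u < v.
    """
    if not 0 < m_pa < n_nodes:
        raise ValueError("m_pa must be in [1, n_nodes - 1]")
    rng = random.Random(seed)
    # Each node gets a coarse block and a fine sub-block inside that coarse block.
    coarse = [rng.randrange(n_coarse) for _ in range(n_nodes)]
    fine = [rng.randrange(n_fine) for _ in range(n_nodes)]
    edges = set()

    # Stage 1: multi-level SBM. Edge probability depends on the deepest shared level.
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            if coarse[u] == coarse[v] and fine[u] == fine[v]:
                p = p_same_fine
            elif coarse[u] == coarse[v]:
                p = p_same_coarse
            else:
                p = p_between
            if rng.random() < p:
                edges.add((u, v))

    # Stage 2: preferential attachment. Every node links to m_pa distinct other
    # nodes chosen with probability proportional to (current degree + 1).
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    for u in range(n_nodes):
        weights = [0 if v == u else degree[v] + 1 for v in range(n_nodes)]
        targets = set()
        while len(targets) < m_pa:
            targets.add(rng.choices(range(n_nodes), weights=weights)[0])
        for v in targets:
            edge = (min(u, v), max(u, v))
            if edge not in edges:  # an existing SBM edge is kept, not duplicated
                edges.add(edge)
                degree[u] += 1
                degree[v] += 1
    return coarse, fine, sorted(edges)


coarse, fine, edges = sample_graph_structure(60, 3, 2, 0.3, 0.05, 0.005, m_pa=2, seed=1)
```

In a real prior, the block counts, probabilities, and `m_pa` would themselves be sampled per graph so that the model sees a wide range of structures; the SBM stage controls community structure, while the attachment stage adds heavy-tailed degree distributions.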
TabPack: Efficient Hyperparameter Ensembles for Tabular Deep Learning
ICML, 2026
Deep learning models for supervised learning on tabular data are rapidly improving. Notably, ensembles (mixtures of multiple models) often play an important role in achieving top performance, which motivates designing ensemble-first systems rather than treating ensembling as an ad hoc trick. In this work, we present TabPack, a new ensembling approach that packs many base model–optimizer pairs with different hyperparameters into a single neural network and a single optimizer. The base model and optimizer hyperparameters are sampled randomly, after which all base models are trained in parallel, and the final ensemble is built on the fly during training. As a result, TabPack produces powerful ensembles in a single run, with substantial efficiency gains over traditional approaches. With its remarkable efficiency, strong performance on medium-to-large datasets, and reduced reliance on traditional hyperparameter tuning, TabPack is an appealing solution for practitioners and researchers: it makes tabular DL more accessible on consumer-grade hardware and suggests a new avenue for designing better tabular deep learning systems.
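The packing idea can be illustrated with a toy sketch in which K base models share one parameter tensor and are updated by one vectorized optimizer step. This is not TabPack's implementation: linear base models stand in for neural networks, plain gradient descent with a per-member learning rate and weight decay stands in for the packed optimizer, and tracking each member's best validation predictions followed by top-k averaging stands in for the on-the-fly ensemble construction. The function name `train_packed_ensemble`, the hyperparameter ranges, and `top_k` are hypothetical.

```python
import numpy as np


def train_packed_ensemble(X_tr, y_tr, X_val, y_val, n_members=8, n_steps=200,
                          top_k=3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X_tr.shape
    # Per-member hyperparameters, sampled at random (hypothetical ranges).
    lr = 10 ** rng.uniform(-3, -1, size=n_members)   # (K,)
    wd = 10 ** rng.uniform(-5, -2, size=n_members)   # (K,)
    # All K linear models live in one packed parameter matrix.
    W = rng.normal(scale=0.01, size=(n_members, d))  # (K, d)
    best_val = np.full(n_members, np.inf)
    best_pred = np.zeros((n_members, X_val.shape[0]))
    for _ in range(n_steps):
        # One forward pass computes the predictions of every member: (n, K).
        resid = X_tr @ W.T - y_tr[:, None]
        # Gradient of each member's MSE plus its own L2 penalty: (K, d).
        grad = (2.0 / n) * resid.T @ X_tr + 2.0 * wd[:, None] * W
        # A single vectorized update with a per-member learning rate.
        W -= lr[:, None] * grad
        # Track each member's best validation predictions during training.
        val_pred = X_val @ W.T                        # (n_val, K)
        val_mse = ((val_pred - y_val[:, None]) ** 2).mean(axis=0)
        improved = val_mse < best_val
        best_val[improved] = val_mse[improved]
        best_pred[improved] = val_pred[:, improved].T
    # Average the predictions of the top_k members by validation MSE.
    chosen = np.argsort(best_val)[:top_k]
    return best_pred[chosen].mean(axis=0), best_val, chosen
```

Because every member's forward pass and update are single matrix operations, adding members costs little beyond the extra parameters, which is the intuition behind training many hyperparameter configurations in one run.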
Unveiling the Role of Data Uncertainty in Tabular Deep Learning
ICML, 2026
Recent advancements in tabular deep learning have demonstrated exceptional practical performance, yet the field often lacks a clear understanding of why these techniques actually succeed. To address this gap, our paper highlights the importance of data (aleatoric) uncertainty for explaining the effectiveness of recent tabular DL methods. While data uncertainty leads to irreducible prediction errors on test samples, it also introduces stochasticity into the training signal that can impede effective learning. We demonstrate that tabular methods differ significantly in their ability to cope with this optimization challenge. Specifically, we reveal that the success of many beneficial design choices in tabular DL, such as numerical feature embeddings, advanced ensembling strategies, retrieval-augmented models, and tabular Prior-Fitted Networks, can be partially attributed to their respective implicit mechanisms for performing well under high data uncertainty. By dissecting these varied mechanisms, we provide a unifying understanding of recent performance improvements. Furthermore, leveraging insights from this perspective, we design a novel, more effective numerical feature embedding method as an immediate practical outcome of our analysis. Overall, our work paves the way toward a principled understanding of the benefits of modern tabular methods, yields concrete improvements to existing techniques, and outlines future research directions for tabular DL.
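The two effects of data uncertainty named above can be seen in a toy simulation, assuming a one-dimensional linear target y = 2x + ε with Gaussian noise ε of standard deviation σ. This is our own illustration, not an experiment from the paper, and the function name `simulate` is hypothetical. Even the true function incurs a test MSE of about σ², and the minibatch gradient of the MSE evaluated at the true parameter, whose expectation is zero, fluctuates with a standard deviation of 2σ/√B for batch size B (because x ~ N(0, 1) is independent of ε).

```python
import numpy as np


def simulate(noise_std, n_test=100_000, batch_size=32, n_batches=2_000, seed=0):
    rng = np.random.default_rng(seed)
    w_true = 2.0
    # Irreducible error: the true function still misses the noisy test labels.
    x_test = rng.normal(size=n_test)
    y_test = w_true * x_test + noise_std * rng.normal(size=n_test)
    bayes_mse = np.mean((w_true * x_test - y_test) ** 2)
    # Noisy training signal: minibatch gradient d/dw mean((w x - y)^2) at w = w_true.
    grads = []
    for _ in range(n_batches):
        x = rng.normal(size=batch_size)
        y = w_true * x + noise_std * rng.normal(size=batch_size)
        grads.append(np.mean(2 * (w_true * x - y) * x))
    return bayes_mse, np.std(grads)


for sigma in (0.0, 1.0):
    print(sigma, simulate(sigma))
```

With σ = 0 both quantities are exactly zero; with σ = 1 the test MSE is close to 1 and the gradient standard deviation is close to 2/√32 ≈ 0.35, so an optimizer sitting at the optimum still receives nonzero updates.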
Artem Babenko
Publications
Posts
- February 13, 2026 · Research
GraphPFN: a graph foundation model pretrained on diverse synthetic graphs
- April 26, 2021 · Research
Benchmarks for Billion-Scale Similarity Search
Datasets
TabReD
TabReD is a benchmark for evaluating tabular machine learning models under conditions representative of real-world deployments. It comprises eight datasets drawn from production ML systems at Yandex and from Kaggle competitions. TabReD addresses two gaps in existing benchmarks: (1) all datasets use time-based train/validation/test splits to evaluate models under temporal distribution drift, and (2) the datasets are feature-rich (a median of 261 features vs. 13–23 in prior benchmarks) with extensive feature engineering, reflecting real ML pipelines. Experiments on TabReD show that methods that succeed on standard benchmarks may underperform on TabReD, which makes it a critical testbed for assessing whether tabular ML approaches generalize to industrial settings.
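A time-based split orders records by timestamp and cuts them at fixed points, so the validation and test sets always lie in the future relative to the training data. The sketch below only illustrates the idea; TabReD ships its own splits, and the function name `time_based_split`, the `"ts"` key, and the cut-off dates are hypothetical.

```python
from datetime import date


def time_based_split(rows, timestamp_key, val_start, test_start):
    """Split rows chronologically: train < val_start <= val < test_start <= test."""
    if not val_start < test_start:
        raise ValueError("val_start must precede test_start")
    ordered = sorted(rows, key=lambda r: r[timestamp_key])
    train = [r for r in ordered if r[timestamp_key] < val_start]
    val = [r for r in ordered if val_start <= r[timestamp_key] < test_start]
    test = [r for r in ordered if r[timestamp_key] >= test_start]
    return train, val, test


# Hypothetical daily records for January 2024.
rows = [{"ts": date(2024, 1, d), "y": d % 2} for d in range(1, 31)]
train, val, test = time_based_split(rows, "ts", date(2024, 1, 21), date(2024, 1, 26))
```

Here `train` holds January 1 to 20, `val` January 21 to 25, and `test` January 26 to 30. Unlike a random split, no future record can leak into training, so any drift between periods shows up in the validation and test scores.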
Heterophilous graph datasets
A graph dataset is called heterophilous if nodes prefer to connect to other nodes that are not similar to them. For example, in financial transaction networks, fraudsters often perform transactions with non-fraudulent users, and in dating networks, most connections are between people of opposite genders. Learning under heterophily is an important subfield of graph ML, so diverse and reliable benchmarks for it are essential.
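One common way to quantify this, assuming that "similar" means "has the same class label", is edge homophily: the fraction of edges whose endpoints share a label. Values near 1 indicate homophily, and low values indicate heterophily. The function name `edge_homophily` is our own. Edge homophily is only one of several homophily measures, and it is sensitive to class imbalance, so it should be read with care.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints have the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)


# A path 0-1-2-3 where only the middle edge joins same-label nodes.
print(edge_homophily([(0, 1), (1, 2), (2, 3)], [0, 1, 1, 0]))
# A toy "dating network" where every edge joins the two different groups.
print(edge_homophily([(0, 1), (2, 3)], ["a", "b", "a", "b"]))
```

The first graph scores 1/3 and the second scores 0.0, the extreme heterophilous case.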
We propose a benchmark of five diverse heterophilous graphs that come from different domains and exhibit a variety of structural properties. Our benchmark includes a word dependency graph Roman-empire, a product co-purchasing network Amazon-ratings, a synthetic graph emulating the minesweeper game Minesweeper, a crowdsourcing platform worker network Tolokers, and a question-answering website interaction network Questions.
Text-to-Image dataset for billion-scale similarity search
The Yandex Text-to-Image (T2I) dataset was collected to foster research on billion-scale nearest neighbor search (NNS) when the query distribution differs from the indexed one. In particular, this dataset addresses the cross-domain setting: a user specifies a textual query and asks the search engine to retrieve the images most relevant to the query. Notably, current large-scale indexing methods perform poorly in this setting. Therefore, novel high-performance indexing solutions that are robust to out-of-domain queries are in high demand.
The dataset represents a snapshot of the Yandex visual search engine and contains 1 billion 200-dimensional image embeddings for indexing. The image embeddings are produced by the SE-ResNeXt-101 model. The embeddings for textual queries are extracted by a variant of the DSSM model.
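Approximate search methods on such data are usually judged by recall@k against exact neighbors. The sketch below computes exact top-k neighbors by brute force and the recall of some candidate result. It assumes inner-product similarity (check the dataset post for the metric actually used), uses random 200-dimensional vectors in place of the real embeddings, and does not cover loading the dataset files. The function names are hypothetical. Brute force is only feasible at toy scale; at a billion vectors, exact neighbors have to be precomputed offline, which is exactly why fast indexes are needed.

```python
import numpy as np


def exact_top_k(queries, base, k):
    """Exact k-NN by inner product; returns indices of shape (n_queries, k), best first."""
    scores = queries @ base.T                              # (n_q, n_base)
    top = np.argpartition(-scores, kth=k - 1, axis=1)[:, :k]
    # argpartition leaves the k candidates unordered; sort them by score.
    order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)


def recall_at_k(approx_ids, true_ids):
    """Average fraction of the true k neighbors that the candidate search returned."""
    hits = [len(set(a) & set(t)) for a, t in zip(approx_ids, true_ids)]
    return sum(hits) / true_ids.size


rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 200)).astype(np.float32)
queries = rng.normal(size=(5, 200)).astype(np.float32)
true_ids = exact_top_k(queries, base, 10)
# A deliberately crude "approximate" search that only scans the first half of the base.
approx_ids = exact_top_k(queries, base[:500], 10)
print(recall_at_k(approx_ids, true_ids))
```

In the cross-domain setting, `queries` would come from the text model and `base` from the image model, so the two sets of vectors follow different distributions, which is what makes indexing harder than in the usual in-domain case.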
Read more about the data format and how to download the dataset in the related post.