Liudmila Prokhorenkova

Publications

Challenges of Generating Structurally Diverse Graphs
Graph machine learning, Generative models
Fedor Velikonivtsev
Mikhail Mironov
Liudmila Prokhorenkova
NeurIPS, 2024
For many graph-related problems, it can be essential to have a set of structurally diverse graphs. For instance, such graphs can be used for testing graph algorithms or their neural approximations. However, to the best of our knowledge, the problem of generating structurally diverse graphs has not been explored in the literature. In this paper, we fill this gap. First, we discuss how to define diversity for a set of graphs, why this task is non-trivial, and how one can choose a proper diversity measure. Then, for a given diversity measure, we propose and compare several algorithms optimizing it: we consider approaches based on standard random graph models, local graph optimization, genetic algorithms, and neural generative models. We show that it is possible to significantly improve diversity over basic random graph generators. Additionally, our analysis of the generated graphs sheds light on the properties of graph distances: depending on which diversity measure is optimized, the resulting graphs may have very different structural properties, which in turn reveals what the underlying graph distance captures.
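As a rough illustration of what "a diversity measure built on a graph distance" can mean, the sketch below scores a set of graphs by their average pairwise distance. Both choices are assumptions made for this example and are not taken from the paper: the distance (`degree_distance`, an L1 distance between sorted degree sequences) and the aggregate (`average_pairwise_diversity`) are hypothetical stand-ins, and the abstract does not say which measures the authors use.

```python
from itertools import combinations


def sorted_degrees(n, edges):
    """Sorted degree sequence of an undirected graph on nodes 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg)


def degree_distance(n, g1, g2):
    """Illustrative graph distance (an assumption): L1 gap between degree sequences."""
    d1, d2 = sorted_degrees(n, g1), sorted_degrees(n, g2)
    return sum(abs(a - b) for a, b in zip(d1, d2))


def average_pairwise_diversity(n, graphs):
    """Illustrative diversity (an assumption): mean distance over all pairs of graphs."""
    pairs = list(combinations(graphs, 2))
    return sum(degree_distance(n, a, b) for a, b in pairs) / len(pairs)


# Three 4-node graphs given as edge lists.
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(average_pairwise_diversity(4, [path, star, cycle]))  # (2 + 2 + 4) / 3
print(average_pairwise_diversity(4, [path, path, path]))   # 0.0 for identical graphs
```

A distance this coarse cannot tell apart graphs with equal degree sequences, which hints at why the choice of distance shapes what "diverse" ends up meaning.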
Revisiting Graph Homophily Measures
Graph machine learning, Machine learning theory
Mikhail Mironov
Liudmila Prokhorenkova
LoG, 2024
Homophily is a graph property describing the tendency of edges to connect similar nodes. There are several measures used for assessing homophily, but all are known to have certain drawbacks: in particular, they cannot be reliably used for comparing datasets with varying numbers of classes and class size balance. To show this, previous works on graph homophily suggested several properties desirable for a good homophily measure, also noting that no existing homophily measure has all these properties. Our paper addresses this issue by introducing a new homophily measure, unbiased homophily, that has all the desirable properties and thus can be reliably used across datasets with different label distributions. The proposed measure is suitable for undirected (and possibly weighted) graphs. We show both theoretically and via empirical examples that the existing homophily measures have serious drawbacks, while unbiased homophily behaves as desired in the considered scenarios. Finally, when it comes to directed graphs, we prove that some desirable properties contradict each other and thus a measure satisfying all of them cannot exist.
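The class-balance problem mentioned above can be seen in a small sketch. It uses plain edge homophily (the fraction of edges joining same-label nodes), a common baseline measure, not the unbiased homophily proposed in the paper. The complete graph and the label assignments are assumptions chosen for illustration: every pair of nodes is connected, so the structure expresses no preference for similar nodes, yet the score changes with class sizes alone.

```python
from itertools import combinations


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)


def complete_graph(n):
    """All unordered node pairs on nodes 0..n-1."""
    return list(combinations(range(n), 2))


edges = complete_graph(10)
balanced = [0] * 5 + [1] * 5    # two classes of equal size
imbalanced = [0] * 9 + [1]      # one dominant class

print(edge_homophily(edges, balanced))    # 20/45, about 0.44
print(edge_homophily(edges, imbalanced))  # 36/45 = 0.8
```

The same connectivity pattern looks far more "homophilous" once one class dominates, which is why scores of this kind are hard to compare across datasets.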
Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond
Graph machine learning, Machine learning theory
Oleg Platonov
Denis Kuznedelev
Artem Babenko
Liudmila Prokhorenkova
NeurIPS, 2023
Homophily is a graph property describing the tendency of edges to connect similar nodes; the opposite is called heterophily. It is often believed that heterophilous graphs are challenging for standard message-passing graph neural networks (GNNs), and much effort has been put into developing efficient methods for this setting. However, there is no universally agreed-upon measure of homophily in the literature. In this work, we show that commonly used homophily measures have critical drawbacks preventing the comparison of homophily levels across different datasets. For this, we formalize desirable properties for a proper homophily measure and verify which measures satisfy which properties. In particular, we show that a measure that we call adjusted homophily satisfies more desirable properties than other popular homophily measures while being rarely used in graph machine learning literature. Then, we go beyond the homophily-heterophily dichotomy and propose a new characteristic that allows one to further distinguish different sorts of heterophily. The proposed label informativeness (LI) characterizes how much information a neighbor's label provides about a node's label. We prove that this measure satisfies important desirable properties. We also observe empirically that LI better agrees with GNN performance compared to homophily measures, which confirms that it is a useful characteristic of the graph structure.
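The two characteristics named above can be sketched for a small undirected graph. The formulas used here follow the commonly cited definitions and should be read as assumptions, since the abstract does not spell them out: adjusted homophily as edge homophily corrected by its expected value under degree-weighted class proportions, and LI as the mutual information between the labels at the two ends of a random edge, normalized by the label entropy. Function and variable names are illustrative.

```python
from collections import Counter
from math import log


def adjusted_homophily(edges, labels):
    """(h_edge - sum_k p_k^2) / (1 - sum_k p_k^2), with p_k = D_k / (2|E|).

    D_k is the total degree of class k. Undefined when all nodes share one class.
    """
    two_m = 2 * len(edges)
    class_degree = Counter()
    same = 0
    for u, v in edges:
        class_degree[labels[u]] += 1
        class_degree[labels[v]] += 1
        same += labels[u] == labels[v]
    h_edge = same / len(edges)
    expected = sum((d / two_m) ** 2 for d in class_degree.values())
    return (h_edge - expected) / (1 - expected)


def label_informativeness(edges, labels):
    """Mutual information of edge-endpoint labels divided by label entropy.

    Each undirected edge is counted in both directions. Undefined for one class.
    """
    two_m = 2 * len(edges)
    joint, marginal = Counter(), Counter()
    for u, v in edges:
        for a, b in ((labels[u], labels[v]), (labels[v], labels[u])):
            joint[(a, b)] += 1 / two_m
            marginal[a] += 1 / two_m
    entropy = -sum(p * log(p) for p in marginal.values())
    mutual_info = sum(
        p * log(p / (marginal[a] * marginal[b])) for (a, b), p in joint.items()
    )
    return mutual_info / entropy


# A 4-cycle with alternating labels: every edge joins different classes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 1, 0, 1]
print(adjusted_homophily(edges, labels))     # -1.0: maximally heterophilous
print(label_informativeness(edges, labels))  # close to 1: a neighbor's label fixes the node's label
```

The example separates the two notions: the graph is as heterophilous as possible, yet a neighbor's label determines a node's label completely, which is the kind of distinction LI is meant to capture.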

Posts

March 6, 2023
Research
Introducing new heterophilous graph datasets
Graph machine learning
October 12, 2022
Research
Graph-based nearest neighbor search
Graph machine learning, Nearest neighbor search
December 8, 2021
Research
How to validate validation measures
Machine learning theory

Datasets

Heterophilous graph datasets
Graph machine learning
Oleg Platonov
Denis Kuznedelev
Michael Diskin
Artem Babenko
Liudmila Prokhorenkova
A graph dataset is called heterophilous if nodes prefer to connect to other nodes that are not similar to them. For example, in financial transaction networks, fraudsters often perform transactions with non-fraudulent users, and in dating networks, most connections are between people of opposite genders. Learning under heterophily is an important subfield of graph ML. Thus, having diverse and reliable benchmarks is essential.

We propose a benchmark of five diverse heterophilous graphs that come from different domains and exhibit a variety of structural properties. Our benchmark includes a word dependency graph (Roman-empire), a product co-purchasing network (Amazon-ratings), a synthetic graph emulating the Minesweeper game (Minesweeper), a crowdsourcing platform worker network (Tolokers), and a question-answering website interaction network (Questions).
Shifts Dataset
Distributional shift, Uncertainty estimation, Tabular data, Machine translation, Natural language processing
Andrey Malinin
Neil Band
Yarin Gal
Mark J. F. Gales
Alexander Ganshin
German Chesnokov
Alexey Noskov
Andrey Ploskonosov
Liudmila Prokhorenkova
Ivan Provilkov
Vatsal Raina
Vyas Raina
Denis Roginskiy
Mariya Shmatova
Panos Tigas
Boris Yangel
The Shifts Dataset contains curated and labeled examples of real, 'in-the-wild' distributional shifts across three large-scale tasks. Specifically, it contains data for the tabular weather prediction, machine translation, and vehicle motion prediction tasks used in the Shifts Challenge 2021. Dataset shift is ubiquitous in all of these tasks and modalities.