Machine learning theory

We study various aspects of the theoretical understanding of ML models and algorithms.




  • Graph-based Nearest Neighbor Search in Hyperbolic Spaces

Nearest neighbor search · Representations · Machine learning theory · Graph machine learning
    Liudmila Prokhorenkova
    Dmitry Baranchuk
    Nikolay Bogachev
    Yury Demidovich
    Alexander Kolpakov

    The nearest neighbor search (NNS) problem is widely studied in Euclidean space, and graph-based algorithms are known to outperform other approaches for this task. However, hyperbolic geometry often allows for better data representation in various domains, including graphs, words, and images. In this paper, we show that graph-based approaches are also well suited for hyperbolic geometry. From a theoretical perspective, we rigorously analyze the time and space complexity of graph-based NNS, assuming that an n-element dataset is uniformly distributed within a d-dimensional ball of radius R in the hyperbolic space of curvature -1. Under some conditions on R and d, we derive the time and space complexity of graph-based NNS and compare the obtained results with known guarantees for the Euclidean case. Interestingly, in the dense setting (d << log(n)) and under some assumptions on the radius R, graph-based NNS has lower time complexity in the hyperbolic space. This agrees with our experiments: we consider datasets embedded in hyperbolic and Euclidean spaces and show that graph-based NNS can be more efficient in the hyperbolic space. We also demonstrate that graph-based methods outperform other existing baselines for hyperbolic NNS. Overall, our theoretical and empirical analysis suggests that graph-based NNS can be considered a default approach for similarity search in hyperbolic spaces.
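The graph-based NNS paradigm analyzed in the abstract can be illustrated with greedy routing over a proximity graph, swapping in the hyperbolic (Poincaré ball) distance. The sketch below is a minimal illustration, not the paper's exact algorithm; the graph construction and function names are assumptions.

```python
import numpy as np

def poincare_dist(u, v):
    # Geodesic distance in the Poincaré ball model of hyperbolic space
    # with curvature -1 (points must have Euclidean norm < 1).
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u * u)) * (1.0 - np.sum(v * v))
    return np.arccosh(1.0 + 2.0 * sq / denom)

def greedy_search(query, points, neighbors, start=0):
    # Greedy routing on a proximity graph: from the current node, hop to
    # the neighbor closest to the query; stop when no neighbor improves.
    cur, cur_d = start, poincare_dist(query, points[start])
    while True:
        best = min(neighbors[cur], key=lambda j: poincare_dist(query, points[j]))
        best_d = poincare_dist(query, points[best])
        if best_d >= cur_d:
            return cur
        cur, cur_d = best, best_d
```

Practical graph-based indices (e.g. HNSW-style structures) differ mainly in how the proximity graph is built; the routing step above is the common core whose time and space complexity the paper analyzes in the hyperbolic setting.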

  • On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay

Optimization · Machine learning theory
    Ekaterina Lobacheva
    Maxim Kodryan
    Nadezhda Chirkova
    Andrey Malinin
    Dmitry Vetrov

    Training neural networks with batch normalization and weight decay has become a common practice in recent years. In this work, we show that their combined use may result in a surprising periodic behavior of optimization dynamics: the training process regularly exhibits destabilizations that, however, do not lead to complete divergence but cause a new period of training. We rigorously investigate the mechanism underlying the discovered periodic behavior from both empirical and theoretical points of view and analyze the conditions in which it occurs in practice. We also demonstrate that periodic behavior can be regarded as a generalization of two previously opposing perspectives on training with batch normalization and weight decay, namely the equilibrium presumption and the instability presumption.
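The interplay described above hinges on the scale invariance that batch normalization induces: rescaling the weights of a layer followed by BN does not change the network output, so weight decay can shrink the weight norm without affecting the loss while the effective step size grows. A minimal numpy sketch of this invariance (variable names are illustrative, not from the paper):

```python
import numpy as np

def batchnorm(z):
    # Batch normalization without learned affine parameters:
    # normalize each feature over the batch dimension.
    return (z - z.mean(axis=0)) / z.std(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))   # a batch of 32 inputs
w = rng.normal(size=(4, 3))    # weights of a linear layer

# Rescaling the weights by any positive factor leaves the
# batch-normalized output (and hence the loss) unchanged.
out = batchnorm(x @ w)
out_scaled = batchnorm(x @ (10.0 * w))
```

Because the loss is invariant in the weight scale, only the weight norm dynamics (driven by weight decay and gradient noise) change during training, which is the quantity whose periodic destabilization the paper investigates.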

  • Distributed Saddle-Point Problems Under Similarity

Optimization · Machine learning theory
    Aleksandr Beznosikov
    Gesualdo Scutari
    Alexander Rogozin
    Alexander Gasnikov

We study solution methods for saddle point problems over networks of two types: master/workers (centralized) architectures and meshed (decentralized) networks. The local functions at each node are assumed to be similar, due to statistical data similarity or otherwise. We establish lower complexity bounds for a fairly general class of algorithms solving saddle point problems. We show that in such a setup, the number of communications can be significantly reduced and in some cases does not depend on the properties of the functions at all. We then propose optimal algorithms matching the lower bounds over either type of network.
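For readers unfamiliar with the problem class, a saddle point problem asks for a point that minimizes over one variable while maximizing over the other. The toy sketch below solves a bilinear instance with the classical (non-distributed) extragradient method as a baseline; it is not one of the communication-efficient algorithms proposed in the paper.

```python
import numpy as np

# Toy bilinear saddle-point problem: min_x max_y  x^T A y.
# Its unique saddle point is (0, 0) for invertible A.
A = np.eye(2)
x, y = np.ones(2), np.ones(2)
eta = 0.1  # step size

for _ in range(2000):
    # Extrapolation (look-ahead) step.
    x_half, y_half = x - eta * (A @ y), y + eta * (A.T @ x)
    # Update using gradients evaluated at the look-ahead point.
    x, y = x - eta * (A @ y_half), y + eta * (A.T @ x_half)
# x and y converge toward the saddle point (0, 0).
```

Plain simultaneous gradient descent-ascent diverges on bilinear problems; the look-ahead step is what makes extragradient converge, which is why it is a standard building block in the saddle-point literature.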