Machine learning theory

We study various aspects of the theoretical understanding of ML models and algorithms.

Publications

  • Decentralized Local Stochastic Extra-Gradient for Variational Inequalities

    Optimization, Machine learning theory
    Aleksandr Beznosikov
    Pavel Dvurechensky
    Anastasia Koloskova
    Valentin Samokhin
    Sebastian U. Stich
    Alexander Gasnikov
    NeurIPS, 2022

    We consider distributed stochastic variational inequalities (VIs) on unbounded domains with problem data that is heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized computation over time-varying networks and the centralized topologies commonly used in Federated Learning. Moreover, multiple local updates can be made on the workers to reduce the communication frequency between them. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly monotone, monotone, and non-monotone (when a Minty solution exists) settings. The provided rates explicitly exhibit the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPPs), e.g., to the training of Deep Generative Adversarial Networks (GANs), for which decentralized training has been reported to be extremely challenging. In experiments on the decentralized training of GANs, we demonstrate the effectiveness of the proposed approach.
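    As an illustration of the core update behind the method, here is a minimal single-node, deterministic sketch of the extragradient step on a toy strongly monotone saddle-point problem. The paper's algorithm additionally uses stochastic operator estimates, multiple local steps, and decentralized communication; all sizes and parameters below are illustrative assumptions.

    ```python
    import numpy as np

    # Toy strongly-convex-strongly-concave saddle-point problem
    #     min_x max_y (lam/2)*||x||^2 + x^T A y - (lam/2)*||y||^2,
    # viewed as a strongly monotone variational inequality with operator
    # F(x, y) = (lam*x + A y, lam*y - A^T x).  Sizes and constants are toy choices.
    rng = np.random.default_rng(0)
    n, lam = 5, 1.0
    A = rng.standard_normal((n, n))

    def operator(z):
        x, y = z[:n], z[n:]
        return np.concatenate([lam * x + A @ y, lam * y - A.T @ x])

    L = lam + np.linalg.norm(A, 2)        # Lipschitz constant of the operator
    z = rng.standard_normal(2 * n)
    step = 0.5 / L

    for _ in range(500):
        z_half = z - step * operator(z)   # extrapolation ("look-ahead") step
        z = z - step * operator(z_half)   # update evaluated at the extrapolated point

    print("distance to the unique solution (0, 0):", np.linalg.norm(z))
    ```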

  • Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity

    Optimization, Machine learning theory
    Dmitry Kovalev
    Aleksandr Beznosikov
    Ekaterina Borodich
    Alexander Gasnikov
    Gesualdo Scutari
    NeurIPS, 2022

    We study structured convex optimization problems with additive objective r := p + q, where r is (μ-strongly) convex, q is Lq-smooth and convex, and p is Lp-smooth, possibly nonconvex. For such a class of problems, we propose an inexact accelerated gradient sliding method that can skip the gradient computation for one of these components while still achieving the optimal complexity of gradient calls of p and q, that is, O(√(Lp/μ)) and O(√(Lq/μ)), respectively. This result is much sharper than the classic black-box complexity O(√((Lp+Lq)/μ)), especially when the difference between Lp and Lq is large. We then apply the proposed method to solve distributed optimization problems over master-worker architectures under agents' function similarity, due to statistical data similarity or otherwise. The distributed algorithm matches, for the first time, the lower complexity bounds on both communication and local gradient calls, the former having been a long-standing open problem. Finally, the method is extended to distributed saddle-point problems (under function similarity) by means of solving a class of variational inequalities, matching the lower communication and computation complexity bounds.
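    As a rough illustration of the sliding idea (decoupling the calls to the two gradients), here is a simplified, non-accelerated sketch: each outer iteration computes the gradient of p once and then runs several inner steps that touch only the gradient of q. This is not the paper's inexact accelerated gradient sliding method; the functions p, q and all parameters below are toy assumptions.

    ```python
    import numpy as np

    # Sketch of gradient sliding for r(x) = p(x) + q(x): one grad_p call per
    # outer iteration, many grad_q calls in the inner loop.  Both functions are
    # toy quadratics with minimizer 0, chosen only for illustration.
    rng = np.random.default_rng(1)
    d = 20
    B = rng.standard_normal((40, d))
    C = rng.standard_normal((200, d))

    def grad_p(x):          # p(x) = 0.5*||B x||^2  ("expensive" gradient)
        return B.T @ (B @ x)

    def grad_q(x):          # q(x) = 0.5*||C x||^2 + 0.5*||x||^2  ("cheap" gradient)
        return C.T @ (C @ x) + x

    x = rng.standard_normal(d)
    beta = np.linalg.norm(B, 2) ** 2                       # prox parameter ~ smoothness of p
    eta = 1.0 / (np.linalg.norm(C, 2) ** 2 + 1.0 + beta)   # step for the inner subproblem

    for _ in range(50):                 # outer loop: one grad_p call each
        g = grad_p(x)
        u = x.copy()
        for _ in range(20):             # inner loop: grad_q calls only, solving
            # min_u q(u) + <g, u> + (beta/2)*||u - x||^2 approximately
            u -= eta * (grad_q(u) + g + beta * (u - x))
        x = u

    print("||x|| after the sliding iterations:", np.linalg.norm(x))
    ```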

  • Optimal Algorithms for Decentralized Stochastic Variational Inequalities

    Optimization, Machine learning theory
    Dmitry Kovalev
    Aleksandr Beznosikov
    Abdurakhmon Sadiev
    Michael Persiianov
    Peter Richtárik
    Alexander Gasnikov
    NeurIPS, 2022

    Variational inequalities are a formalism that includes games, minimization, saddle-point, and equilibrium problems as special cases. Methods for variational inequalities are therefore universal approaches for many applied tasks, including machine learning problems. This work concentrates on the decentralized setting, which is increasingly important but not well understood. In particular, we consider decentralized stochastic (sum-type) variational inequalities over fixed and time-varying networks. We present lower complexity bounds for both communication and local iterations and construct optimal algorithms that match these lower bounds. Our algorithms are the best among those available in the literature not only in the decentralized stochastic case, but also in the decentralized deterministic and non-distributed stochastic cases. Experimental results confirm the effectiveness of the presented algorithms.
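    For intuition about the decentralized setting, here is a simplified sketch in which workers hold heterogeneous local operators, keep their own iterates, and alternate gossip averaging over a ring topology with local extragradient steps. This plain scheme is not the paper's optimal algorithm and, with a constant step size, only reaches a neighborhood of the solution of the average problem; all sizes and parameters are illustrative assumptions.

    ```python
    import numpy as np

    # m workers, each with a heterogeneous local strongly monotone operator F_i
    # (a toy saddle-point operator), communicating over a ring via a doubly
    # stochastic mixing matrix W.  Everything here is a toy illustration.
    rng = np.random.default_rng(2)
    m, n, lam = 6, 4, 1.0
    A = [rng.standard_normal((n, n)) for _ in range(m)]   # heterogeneous local data

    def F(i, z):                                          # local VI operator
        x, y = z[:n], z[n:]
        return np.concatenate([lam * x + A[i] @ y, lam * y - A[i].T @ x])

    # Ring gossip matrix: each worker averages with its two neighbors.
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 0.5
        W[i, (i - 1) % m] += 0.25
        W[i, (i + 1) % m] += 0.25

    Z = rng.standard_normal((m, 2 * n))                   # one iterate per worker
    step = 0.1 / (lam + max(np.linalg.norm(Ai, 2) for Ai in A))

    for _ in range(2000):
        Z = W @ Z                                         # communication (gossip averaging)
        Z_half = Z - step * np.stack([F(i, Z[i]) for i in range(m)])
        Z = Z - step * np.stack([F(i, Z_half[i]) for i in range(m)])

    print("consensus error:", np.linalg.norm(Z - Z.mean(axis=0)))
    print("average iterate distance to the solution (0, 0):", np.linalg.norm(Z.mean(axis=0)))
    ```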