Paper accepted to ICML 2022
A paper has been accepted for publication at the International Conference on Machine Learning (ICML 2022).
"Secure Distributed Training at Scale" by Eduard Gorbunov, Aleksandr Borzunov, Michael Diskin, and Max Ryabinin
Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and vision. Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers. One way to address this is for several smaller groups to pool their computational resources and train a model that benefits all participants. However, such a training run may be jeopardized by dishonest participants sending incorrect updates. In this paper, we propose a novel protocol for decentralized training that is both tolerant of a certain share of dishonest participants and suitable for large-scale deep learning with models having billions of parameters.
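To see why dishonest updates are a problem, consider how peers typically combine their gradients. With a plain average, a single peer can move the result arbitrarily far. Robust aggregation rules are one common building block for tolerating such peers. The sketch below contrasts averaging with centered clipping, a known robust aggregation rule. It is an illustration under simple assumptions (a handful of peers, 2-dimensional toy "gradients", a hand-picked clipping radius `tau`, a median starting point), not the protocol proposed in the paper.

```python
import math
import statistics

Vector = list[float]


def mean(updates: list[Vector]) -> Vector:
    """Coordinate-wise average: one arbitrary update can shift it arbitrarily far."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


def centered_clip(updates: list[Vector], tau: float, n_iters: int = 10) -> Vector:
    """Centered clipping: repeatedly move the estimate v by the average of the
    differences (x_i - v), each rescaled to have Euclidean norm at most tau."""
    # Start from the coordinate-wise median (an illustrative choice, not from the paper).
    v = [statistics.median(col) for col in zip(*updates)]
    for _ in range(n_iters):
        clipped = []
        for x in updates:
            diff = [xi - vi for xi, vi in zip(x, v)]
            norm = math.sqrt(sum(d * d for d in diff))
            # Shrink large differences to norm tau; leave small ones unchanged.
            scale = min(1.0, tau / norm) if norm > 0 else 1.0
            clipped.append([d * scale for d in diff])
        step = mean(clipped)
        v = [vi + si for vi, si in zip(v, step)]
    return v


# Hypothetical toy data: 5 honest peers whose updates average to [1.0, 1.0],
# plus 1 dishonest peer that sends a huge update.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.0, 0.8]]
dishonest = [[1000.0, -1000.0]]
updates = honest + dishonest

print(mean(updates))                    # about [167.5, -165.8]
print(centered_clip(updates, tau=1.0))  # within about 0.15 of [1.0, 1.0] per coordinate
```

The dishonest peer can still bias the clipped estimate, but only by an amount bounded in terms of `tau` rather than by the size of its update. Choosing `tau` trades robustness against how much honest variation is preserved.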