Yandex at NeurIPS 2021: papers, challenges and demo

December 18, 2021

Yandex Research had a strong presence at NeurIPS 2021, one of the most influential machine learning conferences, with eight papers, two benchmarks, two challenges and a demo.

Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift

Yandex Research, alongside scientists from the Universities of Oxford and Cambridge, organized the Shifts Challenge at NeurIPS 2021 on robustness and uncertainty under real-world distributional shift. The goal was to raise awareness of distributional shift and to accelerate the development of robust models that provide accurate uncertainty estimates when faced with unfamiliar inputs.

The main obstacle to developing robust models that yield accurate uncertainty estimates is the lack of large, diverse datasets containing examples of distributional shift drawn from current industrial tasks. We addressed this issue by releasing, as part of the Shifts Challenge, a large dataset with examples of real distributional shifts in weather prediction, machine translation and vehicle motion prediction. The Shifts motion prediction dataset, courtesy of the Yandex Self-driving Group, is the largest in the industry to date, containing more than 1,600 hours of driving across six cities in the US, Israel and Russia.
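For intuition on what "accurate uncertainty estimates under shift" means in practice, here is a minimal, self-contained sketch (not a challenge baseline and not the participants' methods): a small ensemble is trained on a toy regression task, and the disagreement between ensemble members serves as a crude uncertainty signal that grows on inputs drawn from outside the training distribution. All data, models and parameters below are purely illustrative.

```python
# Illustrative only: ensemble disagreement as an uncertainty signal under shift.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# "In-domain" training data: y = sin(x) + noise, with x in [0, 5]
X_train = rng.uniform(0, 5, size=(2000, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.standard_normal(2000)

# A small ensemble: same data, different random initializations
ensemble = [
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=seed)
    .fit(X_train, y_train)
    for seed in range(5)
]

def predictive_uncertainty(X):
    """Std of ensemble predictions: a crude per-point uncertainty estimate."""
    preds = np.stack([m.predict(X) for m in ensemble])  # (n_models, n_points)
    return preds.std(axis=0)

X_in_domain = rng.uniform(0, 5, size=(500, 1))   # matches the training range
X_shifted = rng.uniform(8, 12, size=(500, 1))    # outside the training range

print("mean uncertainty, in-domain:", predictive_uncertainty(X_in_domain).mean())
print("mean uncertainty, shifted:  ", predictive_uncertainty(X_shifted).mean())
```

On the Shifts datasets the models, metrics and notions of shift are far richer, but the underlying question is the same: does the model's reported uncertainty rise where its predictions can no longer be trusted?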

At NeurIPS 2021, we ran a breakout workshop on research frontiers in uncertainty estimation and robustness to distributional shift, featuring keynotes and a panel discussion with leading experts in the field: Milica Gasic, Rowan McAllister, Kate Saenko, Ramon Astudillo, Thomas Dietterich and Sergey Levine. Check out the videos.

Billion-Scale Approximate Nearest Neighbor Search Challenge

At NeurIPS 2021, Yandex Research co-organized the Billion-Scale Approximate Nearest Neighbor Search (ANNS) Challenge. This competition aimed to establish a leaderboard of ANNS algorithms on representative billion-scale datasets and to encourage novel approaches that significantly improve search accuracy and efficiency.

As part of this challenge, we released two billion-scale datasets that can serve as representative benchmarks for researchers from the machine learning and algorithmic communities interested in efficient similarity search.
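To make "search accuracy and efficiency" concrete, ANNS methods are commonly judged by recall against exact brute-force neighbors while scanning only a fraction of the dataset. The sketch below is purely illustrative (it is not the challenge's evaluation harness): it builds a toy IVF-style index from k-means partitions and reports recall@k on small random data; all sizes and parameter names are made up for the example.

```python
# Illustrative recall@k evaluation for approximate nearest neighbor search.
# Toy IVF-style index: partition vectors with k-means, scan only a few partitions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
d, n_base, n_queries, k = 32, 20_000, 100, 10

base = rng.standard_normal((n_base, d)).astype(np.float32)
queries = rng.standard_normal((n_queries, d)).astype(np.float32)

# Ground truth: exact top-k by brute force (what billion-scale search avoids)
def exact_topk(q):
    dists = np.linalg.norm(base - q, axis=1)
    return np.argpartition(dists, k)[:k]

# Toy "index": 64 k-means partitions; at query time, scan only n_probe of them
kmeans = KMeans(n_clusters=64, n_init=4, random_state=0).fit(base)
assignments = kmeans.labels_

def approx_topk(q, n_probe=4):
    centroid_dists = np.linalg.norm(kmeans.cluster_centers_ - q, axis=1)
    probe = np.argsort(centroid_dists)[:n_probe]
    candidates = np.where(np.isin(assignments, probe))[0]
    dists = np.linalg.norm(base[candidates] - q, axis=1)
    return candidates[np.argpartition(dists, k)[:k]]

# Recall@k: fraction of true nearest neighbors recovered by the approximate search
recalls = [len(set(exact_topk(q)) & set(approx_topk(q))) / k for q in queries]
print(f"mean recall@{k}: {np.mean(recalls):.3f}")
```

At a billion vectors, the same trade-off is what the leaderboard measures: how much recall an algorithm retains while keeping memory, index-build time and query throughput within the competition's limits.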

Training Transformers Together

In this demonstration, organized jointly by Yandex Research, Hugging Face, HSE University and the University of Washington, we invited the attendees of NeurIPS 2021 to train an open-source version of DALL-E (a text-to-image generation model) together over the Internet. People could join the experiment from their local computers and free cloud instances.

In addition, demo participants visited our webpage to learn about methods for efficient distributed training over slow networks with peers from around the world. Our NeurIPS 2021 publication on collaborative deep learning describes the technologies behind these methods.
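As a rough illustration of the idea behind such collaborative training (a conceptual sketch, not the actual API or protocol used in the demo), the snippet below simulates several volunteer peers that compute gradients independently on cheap local mini-batches and only synchronize once a large shared "virtual" batch has been processed, which keeps communication over slow connections infrequent. The peer count, batch sizes and toy model are all illustrative.

```python
# Conceptual sketch of infrequent synchronization in volunteer training.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_peers, local_batch, target_global_batch = 4, 16, 256  # illustrative numbers

# Every peer holds a replica of the same model and optimizer
peers = [nn.Linear(16, 1) for _ in range(n_peers)]
for p in peers[1:]:
    p.load_state_dict(peers[0].state_dict())
optimizers = [torch.optim.SGD(p.parameters(), lr=0.05) for p in peers]

def local_microstep(model):
    """Compute gradients on a peer's own (here: synthetic) mini-batch."""
    x = torch.randn(local_batch, 16)
    y = x.sum(dim=1, keepdim=True)
    nn.functional.mse_loss(model(x), y).backward()  # grads accumulate in .grad
    return local_batch

for sync_round in range(10):
    processed = 0
    while processed < target_global_batch:      # cheap, communication-free phase
        for p in peers:
            processed += local_microstep(p)

    # Expensive, infrequent phase: average gradients across peers (a stand-in
    # for an all-reduce over the Internet), then apply the same update everywhere.
    with torch.no_grad():
        for params in zip(*(p.parameters() for p in peers)):
            avg_grad = torch.stack([q.grad for q in params]).mean(0)
            for q in params:
                q.grad.copy_(avg_grad)
    for opt in optimizers:
        opt.step()
        opt.zero_grad()
```

Because peers start from identical weights and apply identical averaged gradients, their replicas stay in sync between rare communication rounds; the real system additionally has to handle peers joining, leaving and failing mid-training, which is what the publication addresses.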

We received a lot of positive feedback and insightful questions during the live session. In total, more than 20 people joined the experiment and contributed more than 147 compute-days during the first ten days of training. The model converged to decent sample quality, which is impressive given the relatively modest resources spent on its training. We are also planning to train a larger model together with the broader community of enthusiasts, so stay tuned for more!

Presented papers

In a previous post, we described our papers accepted to NeurIPS 2021, including research on revisiting deep learning models for tabular data, validation of classification measures and more.