Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift

The Shifts Challenge is now an independent international collaboration. Find out more about the 2022 challenge at

We invite researchers and machine learning practitioners from all over the world to participate in our NeurIPS 2021 Shifts Challenge on robustness and uncertainty under real-world distributional shift. The aim of the challenge is to raise awareness of distributional shifts in real-world data. The participants’ goal will be to develop models which are robust to distributional shift and to detect such shift via measures of uncertainty in their predictions.

Participants can take part in three separate tracks, for which we provide datasets in three modalities: weather prediction, machine translation and vehicle motion prediction. Participants’ solutions will be presented at the Shifts Challenge workshop at NeurIPS 2021, with awards given to the top-ranked participants.

Participants are invited to join our Slack community for support and discussion.

  • Yandex Research
  • University of Cambridge

Shifts Challenge

The Shifts Challenge has come to a close and we are delighted to present to you the winners and their solutions.

Weather Prediction
  • 1st place
    Bond: Ivan Bondarenko
  • 2nd place
    CabbeanWeather: Stepan Andreev and Andrey Elnikov
  • 3rd place
    KDDI Research: Ryoichi Kojima, Guillaume Habault, Roberto Legaspi and Shinya Wada
Machine Translation
The Machine Translation track does not have winners as no solution managed to beat the Shifts Challenge team’s benchmark.
Vehicle Motion Prediction
  • 1st place
    SBTeam: Alexei Postnikov
  • 2nd place
    Alexey and Dmitry: Alexey Pustynnikov and Dmitry Eremeev
  • 3rd place
    NTU_CMLab_Mira: Ching-Yu Tseng, Po-Shao Lin, Yu-Jia Liou, Kuan-Chih Huang and Winston H. Hsu
Unranked Notable Solution:
Home: Thomas Gilles, Stefano Sabatin, Dmitry Tsishko, Bogdan Stanciulescu and Fabien Moutarde

Hope to see you soon.


Training, test, and deployment data are usually thought of as independent and identically distributed.

This approach assumes that a machine learning model that performs well on a test set would perform equally well in the wild. Unfortunately, the real world is full of distributional shifts, or mismatches, between training and deployment data.

If deployment data is significantly shifted in relation to training data, then a non-robust model is likely to make a mistake that may lead to unforeseen, possibly disastrous consequences like financial or reputational loss, or loss of life. Furthermore, if this model doesn’t produce estimates of uncertainty, then we can’t prevent such mistakes, as the model can’t tell us about them beforehand. Thus, we want models to both be robust and yield estimates of uncertainty.
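To make the idea of an uncertainty estimate concrete, here is a minimal sketch using a bootstrap ensemble of linear regressors on synthetic data. It is only an illustration and not one of the challenge baselines: the toy data, the helper names fit_linear and ensemble_predict, and the choice of ensemble disagreement as the uncertainty measure are all assumptions made for this example.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of y ~ a*x + b; returns the pair (a, b)."""
    design = np.stack([x, np.ones_like(x)], axis=1)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def ensemble_predict(models, x):
    """Return the ensemble mean and the spread (std) across members."""
    preds = np.stack([a * x + b for a, b in models], axis=0)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)

# Toy training data (an assumption): inputs in [0, 1], noisy nonlinear target.
x_train = rng.uniform(0.0, 1.0, size=200)
y_train = np.sin(2.0 * x_train) + rng.normal(0.0, 0.05, size=200)

# Bootstrap ensemble: each member is fit on a resampled copy of the training set.
models = []
for _ in range(20):
    idx = rng.integers(0, len(x_train), size=len(x_train))
    models.append(fit_linear(x_train[idx], y_train[idx]))

x_in = rng.uniform(0.0, 1.0, size=100)       # same range as the training data
x_shifted = rng.uniform(3.0, 4.0, size=100)  # far outside the training range

# Disagreement between ensemble members serves as the uncertainty estimate.
_, unc_in = ensemble_predict(models, x_in)
_, unc_shifted = ensemble_predict(models, x_shifted)

print("mean uncertainty, in-domain:", unc_in.mean())
print("mean uncertainty, shifted:  ", unc_shifted.mean())
```

In this toy setting the ensemble members agree closely where they saw training data and diverge on the shifted inputs, so a higher spread can serve as a warning that a prediction should not be trusted.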

Most work on uncertainty estimation and robustness to date has focused on small-scale image classification tasks. While important, this work is far removed from large-scale industrial applications, which are affected by distributional shift. The Shifts Challenge bridges the gap between research and application by providing the Shifts Dataset, which covers several large-scale tasks across multiple modalities taken from real-world industrial applications.

We believe that the Shifts Challenge will push the machine learning community forward, closer to a safer and more robust world, and toward a better understanding of uncertainty and distributional shift.

The Shifts Challenge is organized around the Shifts Dataset and consists of three non-mutually exclusive tracks, each dedicated to a separate task and its particular data modality:

  • Weather prediction. The goal of this track is to train models that predict the temperature at a particular latitude/longitude and time, given all available measurements and climate model predictions. These models should be both robust to shifts in time and climate and capable of detecting them.
  • Machine translation. Here, the models are trained to translate a sentence from a source language into a target language. These models should be robust to shifts like atypical and unusual use of language, profanity, emojis and incorrect punctuation in the translation queries.
  • Vehicle motion prediction. The models are trained to predict the distribution over possible positions of vehicles around the self-driving car at a number of moments in the future. Models should be robust to shifts in location, season, time of day and precipitation.
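One way to check whether a model's uncertainty actually detects shift, in any of the tracks above, is to ask how often a shifted example receives a higher uncertainty score than an in-domain one. The sketch below computes that probability, which equals the area under the ROC curve for the "shifted vs. in-domain" decision. The function name shift_detection_auroc and the example scores are hypothetical, and this is not presented as the challenge's official evaluation metric.

```python
import numpy as np

def shift_detection_auroc(unc_in_domain, unc_shifted):
    """Fraction of (shifted, in-domain) pairs where the shifted example
    has the higher uncertainty; ties count as half a win."""
    shifted = np.asarray(unc_shifted, dtype=float)[:, None]
    in_domain = np.asarray(unc_in_domain, dtype=float)[None, :]
    wins = (shifted > in_domain).sum() + 0.5 * (shifted == in_domain).sum()
    return float(wins / (shifted.size * in_domain.size))

# Hypothetical uncertainty scores, for illustration only.
perfect = shift_detection_auroc([0.1, 0.2, 0.3], [0.4, 0.5])
chance = shift_detection_auroc([0.1, 0.4], [0.2, 0.3])
print(perfect, chance)
```

A score near 1.0 means the uncertainty separates shifted from in-domain data almost perfectly, while a score near 0.5 means it is no better than guessing.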

The competition is organized in two phases:

  1. Phase I: Training and development data are released. Participants build models and submit to the development leaderboards.
  2. Phase II: Held-out evaluation data is released. Participants have two weeks to tune their models and submit to the evaluation leaderboard. Top solutions on the evaluation leaderboard are awarded prizes.

The top three submissions in each track will receive the following prizes:

  • 1st place: $5,000
  • 2nd place: $3,000
  • 3rd place: $1,000

Competition Timeline 2021

  • July 20 – October 17, 2021. Phase I begins. Training and development sets, with references and metadata, are released. Development leaderboard opens.
  • October 17 – November 7, 2021. Phase II begins. Evaluation sets are released (but without reference targets or metadata). Evaluation set leaderboard opens.
  • November 7, 2021. Deadline for final submission.
  • November 8 – November 29, 2021. Organizers verify submissions.
  • November 30, 2021. Competition results are announced. Evaluation set references and metadata are released.

Community Discussion and Extended Abstracts

While the fields of robustness and uncertainty estimation have matured greatly in recent years, there is still debate on the best approaches for assessing uncertainty and robustness in a meaningful way. We invite participants and members of the community to optionally submit a 4-page (excluding references) extended abstract, in the NeurIPS format, on additional evaluation metrics for the challenge.

Please submit your extended abstracts to by September 30. Abstracts will be discussed by the competition community on Slack and then put forward for a review with the Bayesian Deep Learning (BDL) Workshop at NeurIPS 2021. A final decision on including additional evaluation metrics will be made by the organizers on October 16.

Abstracts accepted to the BDL Workshop will be published as non-archival workshop papers, with authors invited to give a poster presentation. Selected abstracts will be nominated for an oral presentation at the Shifts Challenge workshop and at the BDL Workshop.

Additionally, we will consider extended abstracts on the following topics without deadlines:

  • Applications of uncertainty estimation
  • Dataset analysis
  • Uncertainty estimation in structured tasks
  • Inductive priors and representations for improved robustness
  • Robustness to distributional shift and out-of-distribution generalization
  • Anomaly detection and detection of distributional shift