Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift

We invite researchers and machine learning practitioners from all over the world to participate in our NeurIPS 2021 Shifts Challenge on robustness and uncertainty under real-world distributional shift. The aim of the challenge is to raise awareness of distributional shift in real-world data. The participants’ goal is to develop models that are robust to distributional shift and that can detect such shift via measures of uncertainty in their predictions.

Participants can take part in three separate tracks, each built around a dataset from a different modality: weather prediction, machine translation, and vehicle motion prediction. Participants’ solutions will be presented at the Shifts Challenge workshop at NeurIPS 2021, with awards given to the top-ranked participants.

Participants are invited to join our Slack community for support and discussion.

  • Yandex Research
  • University of Cambridge

Evaluation Stage

The evaluation stage of our competition will begin on October 17th and will continue until November 7th (deadline extended).

During the evaluation stage, participants will evaluate their models on the evaluation data, which will be released on October 17th. During this stage, you will be able to submit only once every 24 hours. Remember: prizes are awarded based on ranking on the evaluation data.

To take part in the evaluation stage, participants must register using the link below. Note that all registrations close after October 17th. Participants can take part in the evaluation stage either as a team or individually (a team of one). Teams can have 1 to 10 members and are led by a Team Captain, who is the only member able to make submissions for the team. Note that participants cannot be members of more than one team and cannot participate BOTH individually AND in a team. Prizes are split among team members.


Training, test, and deployment data are usually assumed to be independent and identically distributed (i.i.d.).

This assumption implies that a machine learning model that performs well on a test set will perform equally well in the wild. Unfortunately, the real world is full of distributional shifts, or mismatches, between training and deployment data.

If deployment data is significantly shifted relative to training data, a non-robust model is likely to make mistakes that may lead to unforeseen and possibly disastrous consequences, such as financial or reputational loss, or even loss of life. Furthermore, if the model doesn’t produce estimates of uncertainty, we can’t prevent such mistakes, because the model gives no warning that its predictions may be unreliable. Thus, we want models that are both robust and able to yield estimates of uncertainty.

Most work on uncertainty estimation and robustness to date has focused on small-scale image classification tasks. While important, this work is far removed from the large-scale industrial applications that are affected by distributional shift. The Shifts Challenge bridges the gap between research and application by providing the Shifts Dataset, which covers several large-scale tasks across multiple modalities taken from real-world industrial applications.

We believe that the Shifts Challenge will push the machine learning community forward, closer to a safer and more robust world and to a better understanding of uncertainty and distributional shift.

The Shifts Challenge is organized around the Shifts Dataset and consists of three tracks, which are not mutually exclusive; each is dedicated to a separate task and its particular data modality:

  • Weather prediction. The goal of this track is to train models that predict the temperature at a particular latitude/longitude and time, given all available measurements and climate model predictions. These models should be both robust to shifts in time and climate and capable of detecting such shifts.
  • Machine translation. Here, models are trained to translate a sentence from a source language into a target language. These models should be robust to shifts such as atypical language use, profanity, emojis, and incorrect punctuation in the translation queries.
  • Vehicle motion prediction. Models are trained to predict the distribution over possible positions of vehicles around the self-driving car at several moments in the future. Models should be robust to shifts in location, season, time of day, and precipitation.

The competition is organized in two phases:

  1. Phase I: Training and development data are released. Participants build models and submit to the development leaderboards.
  2. Phase II: Held-out evaluation data is released. Participants have until November 7 to tune their models and submit to the evaluation leaderboard. Top solutions on the evaluation leaderboard are awarded prizes.

The top three submissions in each track will receive the following prizes:

  • 1st place: $5,000
  • 2nd place: $3,000
  • 3rd place: $1,000

Competition Timeline (2021)

  • July 20 to October 17, 2021: Phase I begins. Training and development sets, with references and metadata, are released. The development leaderboard opens.
  • October 17 to November 7, 2021: Phase II begins. Evaluation sets are released (but without reference targets or metadata). The evaluation set leaderboard opens.
  • November 7, 2021: Deadline for final submission.
  • November 8 to November 29, 2021: Organizers verify submissions.
  • November 30, 2021: Competition results are announced. Evaluation set references and metadata are released.

Community Discussion and Extended Abstracts

While the fields of robustness and uncertainty estimation have matured greatly in recent years, there is still debate about the best approaches for assessing uncertainty and robustness in a meaningful way. We invite participants and members of the community to submit an optional 4-page extended abstract (excluding references), in the NeurIPS format, proposing additional evaluation metrics for the challenge.

Please submit your extended abstracts to by September 30. Abstracts will be discussed by the competition community on Slack and then put forward for review at the Bayesian Deep Learning (BDL) Workshop at NeurIPS 2021. The organizers will make a final decision on including additional evaluation metrics on October 16.

Abstracts accepted to the BDL Workshop will be published as non-archival workshop papers, and their authors will be invited to give a poster presentation. Selected abstracts will be nominated for an oral presentation at the Shifts Challenge workshop and at the BDL Workshop.

Additionally, we will consider extended abstracts on the following topics, with no deadline:

  • Applications of uncertainty estimation
  • Dataset analysis
  • Uncertainty estimation in structured tasks
  • Inductive priors and representations for improved robustness
  • Robustness to distributional shift and out-of-distribution generalization
  • Anomaly detection and detection of distributional shift