Crowd Science Workshop:

Remoteness, Fairness, and Mechanisms
as Challenges of Data Supply by Humans for Automation

Workshop at
NeurIPS 2020

08:00 – 16:00 PST

Fri, Dec 11th

Despite its obvious advantages, automation driven by machine learning and artificial intelligence carries pitfalls for the lives of millions of people. These pitfalls include the disappearance of many well-established mass professions and the growing consumption of labeled data produced by humans. The people who supply this data are often managed with old-fashioned approaches and have to work full-time on routine, pre-assigned task types. Crowdsourcing can be considered a modern and effective way to overcome these issues, since it gives task executors flexibility and freedom in terms of place, time, and the type of task they want to work on. However, many potential stakeholders of crowdsourcing processes hesitate to use this technology because of doubts that have persisted over the past decade. To overcome this, we are organizing this workshop, which will focus the research and industry communities on three important aspects: Remoteness, Fairness, and Mechanisms.

Remoteness. Data labeling requesters (data consumers for ML systems) doubt the effectiveness and efficiency of remote work. They need trustworthy quality control techniques and ways to guarantee reliable results on time. Crowdsourcing is one of the viable solutions for effective remote work. However, in spite of its rapid growth and the body of literature on the topic, crowdsourcing is still in its infancy and, to a large extent, remains an art. It lacks clear guidelines and accepted practices on both the requester side and the performer (also known as worker) side, which significantly impedes realizing its full potential. We intend to end this trend and achieve a breakthrough in this direction.

Fairness. Crowd workers (data suppliers) doubt the availability and choice of tasks. They need fair and ethical task assignment, fair compensation, and growth opportunities. We believe that a working environment (e.g., a crowdsourcing platform) can help here, since it should provide flexibility in choosing and switching tasks and working hours, and should act fairly and ethically in task assignment. We also aim to address bias in task design and execution, which can skew results in ways that data requesters did not anticipate.

Since quality, fairness and growth opportunities for performers are central to our workshop, we will invite a diverse group of performers from a global public crowdsourcing platform to our panel-led discussion.

Mechanisms. Matchmakers (the side of the working environment, usually represented by a crowdsourcing platform) doubt the effectiveness of the economic mechanisms that underlie their two-sided market. They need mechanism designs that guarantee proper incentives for both sides: flexibility and fairness for workers, and quality and efficiency for data requesters. We stress that economic mechanisms are key to successfully addressing the issues of remoteness and fairness. Hence, we intend to deepen the interaction between the communities that work on mechanisms and on crowdsourcing.

Invited speakers
Lora Aroyo
Google Research NYC, USA
Gianluca Demartini
University of Queensland, Australia
Praveen Paritosh
Google, USA
Matt Lease
University of Texas at Austin, USA
Seid Muhie Yimam
Universität Hamburg, Germany


All times are in PST (UTC -8)

08:00 – 08:15 Introduction & Icebreakers
08:15 – 08:45 Data Excellence: Better Data for Better AI — Lora Aroyo (invited talk)
08:45 – 09:05 A Gamified Crowdsourcing Framework for Data-Driven Co-creation of Policy Making and Social Foresight — Andrea Tocchetti and Marco Brambilla (contributed talk)
09:05 – 09:25 Conversational Crowdsourcing — Sihang Qiu, Ujwal Gadiraju, Alessandro Bozzon and Geert-Jan Houben (contributed talk)
09:25 – 09:35 Coffee break
09:35 – 10:05 Quality Control in Crowdsourcing — Seid Muhie Yimam (invited talk)
10:05 – 10:25 What Can Crowd Computing Do for the Next Generation of AI Technology? — Ujwal Gadiraju and Jie Yang (contributed talk)
10:25 – 10:45 Real-Time Crowdsourcing of Health Data in a Low-Income country: A case study of Human Data Supply on Malaria first-line treatment policy tracking in Nigeria — Olubayo Adekanmbi, Wuraola Fisayo Oyewusi and Ezekiel Ogundepo (contributed talk)
10:45 – 11:00 Coffee break
11:00 – 12:30 Panel discussion "Successes and failures in crowdsourcing: experiences from work providers, performers and platforms"
12:30 – 13:00 Lunch break
13:00 – 13:30 Modeling and Aggregation of Complex Annotations Via Annotation Distance — Matt Lease (invited talk)
13:30 – 13:50 Active Learning from Crowd in Item Screening — Evgeny Krivosheev, Burcu Sayin, Alessandro Bozzon and Zoltán Szlávik (contributed talk)
13:50 – 14:10 Human computation requires and enables a new approach to ethics — Libuse Veprek, Patricia Seymour and Pietro Michelucci (contributed talk)
14:10 – 14:20 Coffee break
14:20 – 14:50 Bias in Human-in-the-loop Artificial Intelligence — Gianluca Demartini (invited talk)
14:50 – 15:10 VAIDA: An Educative Benchmark Creation Paradigm using Visual Analytics for Interactively Discouraging Artifacts — Anjana Arunkumar, Swaroop Mishra, Bhavdeep Sachdeva, Chitta Baral and Chris Bryan (contributed talk)
15:10 – 15:40 Achieving Data Excellence — Praveen Paritosh (invited talk)
15:40 – 16:00 Closing

Panel discussion

At the panel discussion, we gather all stakeholders: researchers, representatives of global crowd platforms (Toloka and Amazon Mechanical Turk), performers, and requesters who use crowdsourcing on a large scale. We hope to stimulate a fruitful discussion, shed light on what is often not discussed, come up with solutions to problems, and find new growth points for crowdsourcing.


Perspectives — what is the future of crowdsourcing in terms of science, business, and the profession of the future?

Trust — what are the mechanisms that can strengthen the trust between performers and requesters? How to increase the confidence of the entire IT industry in the use of crowdsourcing in general?

Ethics — what ethical issues exist in the crowdsourcing community, what are the problem areas and what should be done to change them?


Moderator: Olga Megorskaya, Toloka

Marcos Baez, Université Claude Bernard Lyon 1
Pranjal Chutia, Contributor on Toloka
Sara Dolph, Contributor on Amazon Mechanical Turk
Morgan Dutton, Amazon Web Services (AWS)
Olga Masyutina, Contributor on Toloka
Michael Meehan, Contributor on Amazon Mechanical Turk
Sam Zhong, Microsoft

Program Committee

Marcos Baez, University of Trento
Boualem Benatallah, University of New South Wales
Alessandro Bozzon, Delft University of Technology
Alessandro Checco, University of Sheffield
Anna Lisa Gentile, IBM
Gleb Gusev, Sberbank
Evgeny Krivosheev, University of Trento
Alexey Kushnir, Carnegie Mellon University
Anna Lioznova, Yandex
Lucas Maystre, Spotify
Svetlana Nikitina, University of Trento
Maria Sagaidak, Yandex
Ivan Stelmakh, Carnegie Mellon University
Jie Yang, Delft University of Technology
Fedor Zhdanov, Amazon
Xiong Zhou, Amazon

Key dates

All deadlines are at 23:59 AoE (Anywhere on Earth)

Paper submission deadline: Oct 09, 2020
Accept/Reject notification: Oct 30, 2020
Talk recordings: Nov 14, 2020
Final papers due: Nov 30, 2020
NeurIPS conference: Dec 6–12, 2020

Organizers

Daria Baidakova
Fabio Casati
Alexey Drutsa
Dmitry Ustalov


If you are interested in helping out with the review process or you have other questions, please get in touch with us: