In this tutorial, leading researchers and engineers from Yandex share their industry experience in efficient data annotation (labelling) for self-driving cars.
We will present the data processing pipeline that self-driving cars need in order to learn to behave autonomously on the roads, and we will show why data annotation is a crucial part of making this learning process effective. This will be followed by an introduction to data annotation via
public crowdsourcing marketplaces and a presentation of the key components of efficient annotation (task decomposition, quality control methods, aggregation, incremental
relabelling, etc.). We will study the most popular and important crowdsourcing tasks needed
for self-driving car development. Then, in a practice session, participants of our tutorial
will choose one of the real annotation tasks, experiment with the settings of the labelling
process, and launch their own annotation project on one of the largest crowdsourcing marketplaces.
The projects will run on real crowds during the tutorial session. Finally, participants will
receive feedback on their projects and practical advice on making them more efficient.
Reasons for collecting and labelling data via crowdsourcing for self-driving cars (SDC): pros and cons
Key components of crowdsourcing for efficient data labelling
Decomposition approach
Performer selection and training
2D and 3D object segmentation demo
Hands-on practice session: object segmentation pipeline implemented on one of the largest crowdsourcing platforms
Theory on advanced techniques in crowdsourcing: aggregation, incremental relabelling, and pricing
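The tutorial does not state which aggregation models or relabelling policies it covers. Purely as an illustration, here is a minimal sketch of the two ideas under the simplest possible assumptions: aggregation by majority vote, and incremental relabelling that keeps requesting labels for a task until performer agreement reaches a fixed threshold. The function names (`majority_vote`, `incremental_relabel`), the thresholds and the sample answers are hypothetical. Real aggregation models (for example Dawid-Skene) additionally weight each performer by estimated skill, which this baseline does not do.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def majority_vote(labels: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Aggregate one task's labels by simple majority (hypothetical helper)."""
    # most_common(1) returns the top answer; ties go to the answer seen first.
    label, votes = Counter(labels).most_common(1)[0]
    # Agreement = share of performers who gave the winning answer.
    return label, votes / len(labels)


def incremental_relabel(
    get_label: Callable[[], Hashable],
    min_labels: int = 3,      # assumed minimum overlap per task
    max_labels: int = 5,      # assumed budget cap per task
    threshold: float = 0.75,  # assumed agreement needed to stop early
) -> Tuple[Hashable, List[Hashable]]:
    """Request labels one at a time until agreement is high enough or the cap is hit."""
    labels: List[Hashable] = []
    while len(labels) < max_labels:
        labels.append(get_label())  # ask one more performer
        if len(labels) >= min_labels:
            label, agreement = majority_vote(labels)
            if agreement >= threshold:
                return label, labels  # confident enough: stop paying for labels
    # Budget exhausted: fall back to the plain majority answer.
    label, _ = majority_vote(labels)
    return label, labels


if __name__ == "__main__":
    # Simulated answers from successive performers for one image (made-up data).
    answers = iter(["car", "car", "truck", "car", "car"])
    label, used = incremental_relabel(lambda: next(answers))
    print(label, len(used))  # car 4
```

The early-stopping rule is what makes relabelling "incremental": easy tasks where performers agree stop at the minimum overlap, and only contentious ones consume the extra budget.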
09:00 - 09:30 Part 0: Introduction
09:30 - 10:00 Part I: Crowdsourcing for SDC
— Aggregation models
— Incremental relabelling
— Performance-based pricing
16:15 - 16:30 Part VIII: Discussion of results and conclusions
— Results of your projects
— Ideas for further work and research
— References to literature and other tutorials