Efficient Data Annotation  
for Self-Driving Cars
via Crowdsourcing on a Large-Scale

Full-day tutorial at
CVPR 2020

Starts at 9:00
(Pacific time)

on Monday 
15th June

2D object detection
3D object detection
Moving object tracking

Tutorial overview

In this tutorial, leading researchers and engineers from Yandex share part of the company's industry experience in efficient data annotation (labelling) for self-driving cars.

We will present the data processing pipeline that cars need in order to learn to behave autonomously on the road, and show why data annotation is a crucial part of making that learning process effective. This will be followed by an introduction to data annotation via public crowdsourcing marketplaces and a presentation of the key components of efficient annotation (task decomposition, quality control methods, aggregation, incremental relabelling, etc.). We will study the most popular and important crowdsourcing tasks needed for self-driving car development. Then, in a practice session, participants will choose one of the real annotation tasks, experiment with settings for the labelling process, and launch their annotation project on one of the largest crowdsourcing marketplaces. The projects will be run on real crowds within the tutorial session. Finally, participants will receive feedback on their projects and practical advice on making them more efficient.


Speakers
Alexey Drutsa
Crowdsourcing Department, Yandex
Denis Rogachevsky
Self-Driving Cars Department, Yandex
Olga Megorskaya
Crowdsourcing Department, Yandex
Daria Baidakova
Crowdsourcing Department, Yandex
Ivan Semchuk
Self-Driving Cars Department, Yandex
Topics to be covered

Reasons for collecting and labeling data via crowdsourcing for SDC. Pros & cons
Key components of crowdsourcing for efficient data labelling
Decomposition approach
Performer selection and training
2D and 3D object segmentation demo
Hands-on practice session: object segmentation pipeline implemented on one of the largest crowdsourcing platforms
Theory on advanced techniques in crowdsourcing: aggregation, incremental relabelling, and pricing 

Program


09:00 - 09:30 Part 0: Introduction

— The concept of crowdsourcing
— Crowdsourcing task examples
— Crowdsourcing platforms
— Yandex's experience with crowdsourcing

09:30 - 10:00 Part I: Crowdsourcing for SDC

— Reasons for crowdsourcing
— The kind of data we collect and label
— Most common tasks and their applications

10:00 - 10:15 Coffee Break

10:15 - 10:50 Part II: Main components of data collection via crowdsourcing

— Decomposition for an effective pipeline
— Task instruction & interface: best practices
— Quality control techniques
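
One common quality control technique, the golden set (control tasks with known answers) mentioned in Part III, can be sketched roughly as follows. This is a minimal illustration and not the platform's API: the function names, the tuple layout of `answers`, and the thresholds `min_accuracy` and `min_answers` are all assumptions made for this example.

```python
from collections import defaultdict


def golden_set_stats(answers, golden):
    """Count correct and total answers per performer on golden (control) tasks.

    answers: iterable of (performer_id, task_id, label) tuples (hypothetical layout).
    golden:  dict mapping task_id -> known correct label.
    """
    stats = defaultdict(lambda: [0, 0])  # performer -> [correct, total]
    for performer, task, label in answers:
        if task in golden:  # only control tasks contribute to the score
            stats[performer][1] += 1
            if label == golden[task]:
                stats[performer][0] += 1
    return {p: (c, t) for p, (c, t) in stats.items()}


def trusted_performers(answers, golden, min_accuracy=0.8, min_answers=2):
    """Keep performers with enough golden answers and high enough accuracy."""
    stats = golden_set_stats(answers, golden)
    return {
        p for p, (correct, total) in stats.items()
        if total >= min_answers and correct / total >= min_accuracy
    }


# Illustrative data only: two performers, two control tasks, one regular task.
answers = [
    ("w1", "g1", "car"), ("w1", "g2", "pedestrian"), ("w1", "t1", "car"),
    ("w2", "g1", "car"), ("w2", "g2", "car"), ("w2", "t1", "truck"),
]
golden = {"g1": "car", "g2": "pedestrian"}
print(trusted_performers(answers, golden))
```

In practice the thresholds would depend on the task difficulty and on how many control tasks each performer has seen.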

10:50 - 11:00 Part III: Introduction to Yandex.Toloka for requesters

— Project: creation & configuration
— Pool: creation & configuration
— Tasks: uploading & golden set creation
— In-flight statistics and downloading results

11:00 - 12:00 Lunch Break

12:00 - 12:40 Part IV: Data labeling demos for SDC

— Demos of 2D and 3D object segmentation tasks on a crowdsourcing platform
— Performer training and selection for complex tasks
— Q&A
 

12:40 - 13:00 Part V: Brainstorming the pipeline for object segmentation (practice session) 

— Dataset and required labels
— Discussion: how to collect labels?
— Data labelling pipeline for implementation

13:00 - 15:00 Part VI: Setting up and running label collection projects (practice session)

— You create, configure, and run data labelling projects with real performers in real time

15:00 - 15:15 Coffee Break

15:15 - 16:15 Part VII: Theory on efficient aggregation

— Aggregation models
— Incremental relabelling 
— Performance-based pricing
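
To give a flavour of the first two topics, here is a minimal sketch that combines the simplest aggregation model (majority vote) with incremental relabelling: extra labels are requested for a task only while the vote is not confident enough. It is an assumption-laden illustration, not the method presented in the tutorial; `request_label` is a hypothetical callback standing in for asking one more performer, and the overlap and confidence parameters are made up for the example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label and the share of votes it received."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


def label_incrementally(request_label, min_overlap=3, max_overlap=7,
                        min_confidence=0.8):
    """Collect labels for one task until the majority is confident enough.

    request_label: hypothetical zero-argument callable returning one new label.
    Returns (label, confidence, number_of_labels_used).
    """
    labels = [request_label() for _ in range(min_overlap)]
    while True:
        label, confidence = majority_vote(labels)
        # Stop when confident, or when the labelling budget for this task is spent.
        if confidence >= min_confidence or len(labels) >= max_overlap:
            return label, confidence, len(labels)
        labels.append(request_label())


# Illustrative run: a pre-recorded stream of answers stands in for real performers.
stream = iter(["car", "truck", "car", "car", "car", "car", "car"])
print(label_incrementally(lambda: next(stream)))
```

Easy tasks stop at the minimum overlap, while ambiguous ones receive more labels, which is where the cost saving comes from. More elaborate aggregation models weight each vote by an estimate of the performer's reliability instead of counting all votes equally.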

16:15 - 16:30 Part VIII: Discussion of results and conclusions

— Results of your projects
— Ideas for further work and research
— References to literature and other tutorials 

Slides

Introduction



 

Part 1
"Crowdsourcing for SDC"
 

Part 2
"Main components of data collection via crowdsourcing"

 

Part 3
"Introduction to Yandex.Toloka for requesters"
 

Part 4
"Data labeling demos for SDC"
 

Part 5
"Brainstorming the pipeline for object segmentation"
 

Part 6
"Setting up and running label collection projects"

 

Part 7
"Theory on efficient aggregation"
 

Part 8
"Discussion of results and conclusions"

 

Instruction

Step-by-step instruction

 

Tue Jun 16 2020 17:20:19 GMT+0300 (Moscow Standard Time)