Online Evaluation for Effective Web Service Development

Tutorial at KDD 2018 on Sunday, 19th August

Development of most leading web services and software products today is guided by data-driven decisions, based on evaluation that ensures a steady stream of updates in terms of both quality and quantity. Large internet companies use online evaluation on a day-to-day basis and at a large scale. The number of smaller companies using A/B testing in their development cycle is also growing. Web development across the board depends strongly on the quality of experimentation platforms. In this tutorial, we will give an overview of the state-of-the-art methods underlying everyday evaluation pipelines at some of the leading internet companies.

We invite software engineers, designers, analysts, and service or product managers (beginners, advanced specialists, and researchers alike) to join us at the KDD 2018 conference, which will take place in London from 19 to 23 August, to learn how to make web service development data-driven and how to do it effectively.

The extended abstract and the full list of references are collected in the following overview article. If you wish to refer to the tutorial in your publication, please cite this paper.


Introduction
- Problem statement: evaluation of ongoing updates of a web service
- Online vs offline evaluation
- Main approaches for online evaluation: A/B testing, interleaving, observational studies

Part 1: Statistical foundation
- Statistics for online experiments: 101 (statistical hypothesis testing, causal relationship)

Part 2: Development of online metrics
- Main components of an online metric
- Main metric properties (sensitivity and directionality)
- Evaluation criteria beyond difference of averages (periodicity, trends, quantiles, etc.)
- Product-driven ideas for metrics (loyalty and interaction metrics, dwell time based metric patching, session metrics and session division)
- Effective criteria for ratio metrics
- Reducing noise in metric measurements

Part 3: Experimentation pipeline and workflow in the light of industrial practice
- Conducting an A/B experiment: Yandex way (what should be analyzed before starting the experiment, experiments’ review, decision making based on results)
- Cases, pitfalls, lessons learned

Part 4: Machine learning driven A/B testing
- Randomized experiment vs Observational study
- Variance Reduction Based on Subtraction of Prediction
- Heterogeneous Treatment Effect
- Learning sensitive metric combinations
- Future Prediction Based metrics
- Smart Scheduling of Online Experiments
- Stopping experiments early: sequential testing


Roman Budylin

Experimentation Pipeline, Yandex

Alexey Drutsa

Research Department, Yandex

Gleb Gusev

Research Department, Yandex

Pavel Serdyukov

Research Department, Yandex

Igor Yashkov

Experimentation Pipeline, Yandex


Introduction & Part 1
"Statistical foundation"


Part 2
"Development of online metrics"


Part 3
"Experimentation pipeline and workflow in the light of industrial practice"

Part 4
"Machine learning driven A/B testing"

