Effective Online Evaluation for Web Search

Full-day tutorial at SIGIR 2019 on Sunday, 21 July

Most leading web services and software products today are developed through data-driven decisions based on evaluation, which sustains a steady stream of updates in terms of both quality and quantity. Large internet companies use online evaluation daily and at large scale. A growing number of smaller companies also use A/B testing in their development cycle. Web development across the board depends strongly on the quality of experimentation platforms. In this tutorial, we give an overview of the state-of-the-art methods underlying everyday evaluation pipelines at some of the leading internet companies.

This is the third edition of the tutorial, which has already been presented at WWW and KDD, where it was among the most popular tutorials. The program offers a balanced mix of an overview of academic achievements in online evaluation and unique practical industrial experience shared by leading researchers and engineers from Yandex and Facebook. Whether you work at a company, might do so in the future, or plan to advance the practice of online evaluation in academia, we welcome you to our tutorial.

We invite software engineers, designers, analysts, and service or product managers (beginners, advanced specialists, and researchers alike) to join us at SIGIR 2019, which takes place in Paris from 21 to 25 July, to learn how to make web service development data-driven and how to do it effectively.

The extended abstract and the full list of references are available in the following overview article. If you wish to refer to the tutorial in a publication, please cite this paper.


Introduction
— Problem statement: evaluation of ongoing updates of a web service
— Online vs offline evaluation
— Main approaches for online evaluation: A/B testing, interleaving, observational studies

Part 1: Statistical foundation
— Statistics for online experiments: 101 (statistical hypothesis testing, causal relationship)

Part 2: Development of online metrics
— Main components of an online metric
— Main metric properties (sensitivity and directionality)
— Evaluation criteria beyond difference of averages (periodicity, trends, quantiles, etc.)
— Optimal Distribution Decomposition
— Product-driven ideas for metrics (loyalty and interaction metrics, dwell time based metric patching, session metrics and session division)
— Effective criteria for ratio metrics
— Reducing noise in metric measurements

Part 3: Experimentation pipeline and workflow in the light of industrial practice
— Conducting an A/B experiment the Yandex way (what to analyze before starting an experiment, experiment review, decision making based on results)
— Cases, pitfalls, lessons learned

Part 4: Interleaving for online ranking evaluation
— Why interleaving: benefits and downsides
— The "classic" interleaving methods and their limitations: Balanced Interleaving, Team Draft, and Probabilistic Interleave
— Learning the scoring function from data
— Optimized Interleaving
— Jointly learning the scoring function and the interleaving policy
— Multi-leaving

Part 5: Machine learning driven A/B testing
— Variance Reduction Based on Subtraction of Prediction
— Learning sensitive metric combinations
— Future Prediction Based metrics
— Smart Scheduling of Online Experiments
— Stopping experiments early: sequential testing


Alexey Drutsa

Research Department, Yandex

Gleb Gusev

Research Department, Yandex

Eugene Kharitonov

Facebook AI Research

Denis Kulemyakin

Experimentation Pipeline, Yandex

Pavel Serdyukov

Research Department, Yandex

Igor Yashkov

Experimentation Pipeline, Yandex


Introduction & Part 1
"Statistical foundation"


Part 2
"Development of online metrics"


Part 3
"Experimentation pipeline and workflow in the light of industrial practice"

Part 4
"Interleaving for online ranking evaluation"


Part 5
"Machine learning driven A/B testing"

