We study ratio overall evaluation criteria (user behavior quality metrics) and, in particular, average values of non-user-level metrics, which are widely used in A/B testing as an important part of modern Internet companies' evaluation instruments (e.g., abandonment rate, a user's absence time after a session).
We focus on the problem of improving the sensitivity of these criteria, since there is a large gap between the variety of sensitivity improvement techniques designed for user-level metrics and the variety of such techniques for ratio criteria.
We propose a novel transformation of a ratio criterion to the average value of a user-level (randomization-unit-level, in general) metric, which creates an opportunity to directly apply a wide range of sensitivity improvement techniques designed for user-level metrics and thus make A/B tests more efficient. We provide theoretical guarantees on the novel metric's consistency in terms of preservation of two crucial properties (directionality and significance level) w.r.t. the source ratio criterion.
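As a hedged illustration of the idea (the abstract does not spell out the transformation; the sketch below assumes it resembles the well-known linearization of a ratio metric, with hypothetical per-user quantities `x` and `y`), a ratio criterion $\sum_u x_u / \sum_u y_u$ can be turned into the average of a per-user metric $L_u = x_u - r\, y_u$, where $r$ is a fixed reference ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-user data: x_u = numerator counts (e.g., clicks),
# y_u = denominator counts (e.g., sessions); names are illustrative.
x = rng.poisson(5, size=1000).astype(float)
y = rng.poisson(3, size=1000).astype(float) + 1.0  # avoid zero denominators

# The ratio criterion: a group-level quantity, not an average over users.
r = x.sum() / y.sum()

# Linearized user-level metric: L_u = x_u - r * y_u.  Its average is an
# ordinary mean of i.i.d. per-user values, so standard user-level
# variance-reduction techniques (e.g., regression adjustment) apply.
L = x - r * y

# By construction, when r is computed on the same sample, mean(L) is
# exactly zero; in an experiment r would be fixed (e.g., from control).
print(L.mean())
```

The per-user values `L` can then be fed into any user-level analysis pipeline; shifts in their mean track shifts in the source ratio near the reference point `r`.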
The experimental evaluation of the approach, conducted on hundreds of large-scale real A/B tests run at one of the most popular global search engines, reinforces the theoretical results and demonstrates up to $+34\%$ sensitivity rate improvement achieved by the transformation combined with the best known regression adjustment.
ACM International Conference on Web Search and Data Mining
8 Feb 2018