Relevance labels is the essential part of any learning to rank framework. The rapid development of crowdsourcing platforms led to a significant reduction of the cost of manual labeling. This makes it possible to collect very large sets of labeled documents to train a ranking algorithm. However, relevance labels acquired via crowdsourcing are typically coarse and noisy, so certain consensus models are used to measure the quality of labels and to reduce the noise. This noise is likely to affect a ranker trained on such labels, and, since none of the existing consensus models directly optimizes ranking quality, one has to apply some heuristics to utilize the output of a consensus model in a ranking algorithm, e.g., to use majority voting among workers to get consensus labels. The major goal of this paper is to unify existing approaches to consensus modeling and noise reduction within a learning to rank framework. Namely, we present a machine learning algorithm aimed at improving the performance of a ranker trained on a crowdsourced dataset by proper remapping of labels and reweighting of samples. In the experimental part, we use several characteristics of workers/labels extracted via various consensus models in order to learn the remapping and reweighting functions. Our experiments on a large-scale dataset demonstrate that we can significantly improve state-of-the-art machine-learning algorithms by incorporating our framework.
ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016)
30 Jul 2016