Yandex.Market Open Datasets

Yandex.Market is a popular marketplace aggregating offers from online shops and serving millions of user queries daily.
Please note: All materials are intended for non-commercial use.

Learning to filter for sorting by price

We collected the data from the live stream of search queries submitted to Yandex.Market in October-November 2017. We kept only queries for which the user chose to sort results by price in ascending order, and sampled 30K of them. For each query, we kept only the 500 cheapest of the thousands of candidate documents chosen by a weak production selection algorithm.

Each line corresponds to one query-document pair and consists of 774 attributes. The first 771 are numerical features of the query-document pair; the remaining three are, respectively, the position (the rank of the document in the query's list of documents ordered by price in ascending order), the query ID, and the relevance score.
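A minimal parsing sketch for one line, following the 771 + 3 layout described above. The field separator is not stated here, so the code assumes commas and/or whitespace; keeping the query ID as a string is also an assumption. The names `Example` and `parse_line` are hypothetical.

```python
import re
from typing import List, NamedTuple

N_FEATURES = 771  # numerical features per line, per the description above


class Example(NamedTuple):
    features: List[float]
    position: int    # rank of the document when the query's list is sorted by price
    query_id: str    # kept as a string (assumption: IDs may not be numeric)
    relevance: float


def parse_line(line: str) -> Example:
    # Assumption: fields are separated by commas and/or whitespace.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) != N_FEATURES + 3:
        raise ValueError(f"expected {N_FEATURES + 3} fields, got {len(fields)}")
    features = [float(x) for x in fields[:N_FEATURES]]
    position = int(float(fields[N_FEATURES]))  # tolerate values written as "12.0"
    query_id = fields[N_FEATURES + 1]
    relevance = float(fields[N_FEATURES + 2])
    return Example(features, position, query_id, relevance)
```

If the actual separator turns out to be fixed (e.g. tab only), the regular expression can be replaced with a plain `split`.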

The task is to learn a binary classifier that decides, based on a document's features, whether the document should be selected for the final list for its query. The goal is to maximize a ranking quality measure based on the relevance scores (e.g., DCG) of the final list, ordered by price in ascending order.
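To make the objective concrete, here is a sketch of scoring one query's filtered list: the documents the classifier keeps are ordered by their price position, and DCG is computed over that order. The page does not fix a DCG variant; this sketch assumes linear gain, `rel / log2(rank + 1)`, and the function name `filtered_dcg` is hypothetical.

```python
import math
from typing import Sequence


def filtered_dcg(
    positions: Sequence[int],
    relevances: Sequence[float],
    keep: Sequence[bool],
) -> float:
    """DCG of the documents kept by the filter, listed by ascending price.

    Assumption: linear gain rel / log2(rank + 1); other DCG variants
    (e.g. exponential gain 2**rel - 1) would change the numbers.
    """
    # Keep only selected documents and order them by price position.
    kept = sorted(
        ((pos, rel) for pos, rel, k in zip(positions, relevances, keep) if k),
        key=lambda t: t[0],
    )
    # Rank 1 is the cheapest kept document.
    return sum(rel / math.log2(rank + 1) for rank, (_, rel) in enumerate(kept, start=1))
```

For example, with positions `[3, 1, 2]` and relevances `[1, 0, 2]`, keeping every document gives a DCG of about 1.76, while dropping the cheapest, irrelevant document (`keep=[True, False, True]`) raises it to about 2.63. This is why filtering, not just reordering, matters when the final order is fixed by price.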

Download

gzip archive, 1 file, 3.7 GB
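Because the archive is large, it may be easier to stream it than to decompress it to disk. The sketch below reads the gzip file line by line and yields each query's rows together. It assumes that rows for the same query are stored contiguously (`itertools.groupby` only groups adjacent rows) and that fields are separated by commas and/or whitespace; the file name is not given here, and `iter_queries` is a hypothetical helper. With 774 fields per line, the query ID is at zero-based index 772.

```python
import gzip
import itertools
from typing import BinaryIO, Iterator, List, Tuple, Union

QID_INDEX = 772  # fields 0..770 are features, 771 position, 772 query ID, 773 relevance


def _query_id(row: str) -> str:
    # Assumption: commas and/or whitespace separate fields.
    return row.replace(",", " ").split()[QID_INDEX]


def iter_queries(source: Union[str, BinaryIO]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (query_id, rows) for each run of consecutive rows sharing a query ID."""
    with gzip.open(source, "rt") as f:
        rows = (line.rstrip("\n") for line in f if line.strip())
        for qid, group in itertools.groupby(rows, key=_query_id):
            yield qid, list(group)
```

If the rows turn out not to be grouped by query, the same query ID will be yielded more than once, and the groups would need to be merged (for example in a dictionary) instead.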