WCL2R: A Benchmark Collection for Learning to Rank Research with Clickthrough Data

Otávio D. A. Alcântara, Alvaro R. Pereira Jr., Humberto M. Almeida, Marcos A. Gonçalves, Christian Middleton, Ricardo Baeza-Yates


In this paper we present WCL2R, a benchmark collection for supporting
research on learning to rank (L2R) algorithms that exploit clickthrough
features. Unlike other L2R benchmark collections, such as LETOR and the
collection recently released by Yahoo! for an L2R competition, WCL2R
focuses on defining a significant (and new) set of features over
clickthrough data extracted from the logs of a real-world search engine.
We describe the WCL2R collection by detailing how the corpora, queries
and relevance judgments were obtained, how the learning features were
constructed, and how the collection was split into folds for
representative learning. We also analyze the
discriminative power of the WCL2R collection using traditional feature
selection algorithms and show that the most discriminative features are, in fact, those
based on clickthrough data. We then compare several L2R algorithms on
WCL2R, showing that all of them obtain significant gains over traditional
ranking approaches by exploiting clickthrough information.


Benchmark, Clickthrough, Learning to Rank

An official publication of the Brazilian Computer Society Special Interest Group on Databases.