A Comparative Study of Learning-to-Rank Techniques for Tag Recommendation

Sérgio D. Canuto, Fabiano M. Belém, Jussara M. Almeida, Marcos A. Gonçalves

Abstract


Tags have become very popular on the Web 2.0 as they make it easier for users to create and share their own content, and encourage them to do so. In this context, there is great interest in developing strategies to recommend relevant and useful tags for a target object, improving the quality of the generated tags and of the Information Retrieval (IR) services that use them as a data source. Several existing tag recommendation strategies treat the problem as a candidate tag ranking problem, recommending the tags that occupy the top positions of the generated ranking. This motivates the use of Learning-to-Rank (L2R) based strategies to automatically “learn” good tag ranking functions. However, previous work has explored only three different L2R techniques, namely, Genetic Programming (GP), RankSVM and RankBoost, comparing at most two of them with respect to effectiveness. In contrast, we here perform a much more comprehensive comparative study of the use of L2R techniques for tag recommendation. Specifically, we compare eight different L2R techniques, namely, Random Forest (RF), MART, λ-MART, ListNet, AdaRank and the three aforementioned techniques, with respect to both effectiveness (i.e., precision, NDCG) and efficiency (i.e., time complexity). We perform experiments using real data collected from five popular Web 2.0 applications, namely, Bibsonomy, LastFM, MovieLens, YahooVideo and YouTube. Our results show that the best L2R based strategy significantly outperforms the best state-of-the-art unsupervised technique (by up to 29% in NDCG). Moreover, unlike existing comparisons of different L2R techniques in other domains, we find that, for tag recommendation, there is a clear winning group of methods (RF, MART and λ-MART), with a slight advantage of two of them (RF and λ-MART) over the third, and with gains in NDCG ranging from 4% to 12% over the best of the remaining alternatives considered. We also find that recommendation time, despite some variation among the different methods, is under 1.3 seconds on average (in the worst-case scenario) for all L2R methods, which confirms the feasibility of the L2R approach for tag recommendation.
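
As an illustration of the effectiveness metrics named above, the following is a minimal sketch of precision and NDCG computed over the top k positions of a ranked list of candidate tags. It assumes binary relevance (a tag is either relevant or not), which may differ from the exact definitions used in the experiments; the function names and the example tags are hypothetical and chosen only for illustration.

```python
import math


def precision_at_k(ranked_tags, relevant_tags, k):
    """Fraction of the top-k recommended tags that are relevant."""
    top_k = ranked_tags[:k]
    return sum(1 for tag in top_k if tag in relevant_tags) / k


def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_tags, relevant_tags, k):
    """DCG of the ranking divided by the DCG of an ideal ranking (binary relevance assumed)."""
    gains = [1.0 if tag in relevant_tags else 0.0 for tag in ranked_tags]
    # In the ideal ranking, all relevant tags come first.
    ideal_gains = [1.0] * min(len(relevant_tags), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: a ranking produced by some tag recommender
# and the set of tags considered relevant for the target object.
ranked = ["rock", "live", "indie", "2009", "guitar"]
relevant = {"rock", "indie", "guitar"}

p_at_5 = precision_at_k(ranked, relevant, 5)
ndcg_at_5 = ndcg_at_k(ranked, relevant, 5)
```

Precision treats all top-k positions equally, whereas NDCG rewards rankings that place relevant tags closer to the top, which is why the two are usually reported together.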

Keywords


Tag Recommendation; Relevance Metrics; Learning-to-Rank


An official publication of the Brazilian Computer Society Special Interest Group on Databases.