Studies on Ordinal Ranking with Regression

Advisor: Hsuan-Tien Lin (林軒田)

Abstract


Research on ranking has grown increasingly popular in recent years, with broad applications in web-search engines, recommendation systems, and many other areas. As the amount of information on the web explodes, training a good ranking system quickly from such massive data has become an ever more important problem. This thesis focuses on handling large ordinal data sets: unlike the categories in traditional classification problems, the classes in an ordinal data set carry information about the gaps between ranks. We study how to use regression analysis on ordinal data sets to tackle two types of ranking-related problems. The first is rating, where the ranking system decides which ordinal rank an example belongs to; the second is the top-rank problem, which studies how to order the documents relevant to a search query into a result list by their degree of relevance. We propose a framework that can be combined with a wide range of regression-based algorithms to solve ranking problems; the framework introduces the notion of cost to handle the rank-gap information. Experimental results on both problems show that the framework effectively improves ranking performance. Targeting a popular evaluation criterion for the top-rank problem, we further derive an effective cost function, and experimental results show that this cost function further improves ranking performance under that criterion.
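
The cost-aware framework described above (named COCR in the parallel abstract below) reduces ordinal ranking to weighted binary sub-tasks solved by point-wise regression. The following is a minimal illustrative sketch, not the thesis's implementation: it assumes ranks in {1, ..., K}, a user-supplied K-by-K cost matrix, and scikit-learn's GradientBoostingRegressor as the point-wise regressor (the regressor choice, function names, and the 0.5 decision threshold are all assumptions of this sketch).

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def cocr_fit(X, y, K, cost):
    """Fit K-1 point-wise regressors, one per binary sub-task "is rank > k?".

    y holds ranks in {1, ..., K}; cost[r-1, k-1] is the penalty for predicting
    rank k when the true rank is r (an assumed encoding for this sketch).
    """
    models = []
    for k in range(1, K):
        z = (y > k).astype(float)                        # binary target of sub-task k
        w = np.abs(cost[y - 1, k] - cost[y - 1, k - 1])  # per-example weight from cost gaps
        models.append(GradientBoostingRegressor().fit(X, z, sample_weight=w))
    return models

def cocr_predict(models, X):
    """Predicted rank = 1 + number of sub-tasks whose regressor answers "yes"."""
    return 1 + sum((m.predict(X) > 0.5).astype(int) for m in models)

With the absolute cost cost[r-1, k-1] = |r - k|, every sub-task receives unit weight and the scheme degenerates to a plain reduction to regression; non-uniform costs are what let the framework emphasize the rank gaps that matter.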

Parallel Abstract


Ranking has been a popular research problem in recent years and has been used in a wide range of applications, including web-search engines and recommendation systems. In this thesis, we study two ranking problems: the ordinal ranking problem and the top-rank problem. We propose a novel ranking approach, cost-sensitive ordinal classification via regression (COCR), which respects the discrete nature of the ordinal ranks in real-world data sets. In particular, COCR applies a theoretically sound reduction from ordinal to binary classification and solves the binary classification sub-tasks with point-wise regression. Furthermore, COCR allows specifying mis-ranking costs to further boost the ranking performance. We conduct experiments on the two ranking problems separately. On the ordinal ranking problem, we compare different approaches based on decision trees. The results show that the proposed COCR can perform better on many data sets when coupled with the appropriate cost. Furthermore, on the top-rank problem, we derive the corresponding cost of a popular ranking criterion, the Expected Reciprocal Rank (ERR), and plug the cost into the COCR approach. The resulting ERR-tuned COCR enjoys both the efficiency of point-wise regression and the top-rank prediction accuracy of the ERR criterion. Evaluations on two large-scale data sets, "Yahoo! Learning to Rank Challenge" and "Microsoft Learning to Rank", verify that some basic COCR settings outperform commonly used regression approaches significantly. In addition, even better performance can often be achieved by the ERR-tuned COCR.
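
Since the thesis derives a cost from the Expected Reciprocal Rank criterion, it helps to recall how ERR itself is computed. The sketch below implements the standard ERR definition of Chapelle et al. (2009) for a ranked list of graded relevance judgments; the function name and the default g_max = 4 (matching the 5-level grades 0-4 used in the Yahoo! and Microsoft data sets) are assumptions of this sketch.

import numpy as np

def expected_reciprocal_rank(grades, g_max=4):
    """ERR of a ranked list with relevance grades g_1, ..., g_n.

    R_i = (2**g_i - 1) / 2**g_max is the probability that the document at
    position i satisfies the user; position i contributes R_i / i, discounted
    by the probability that no earlier document already satisfied the user.
    """
    R = (2.0 ** np.asarray(grades, dtype=float) - 1.0) / 2 ** g_max
    err, not_yet_satisfied = 0.0, 1.0
    for i, r in enumerate(R, start=1):
        err += not_yet_satisfied * r / i
        not_yet_satisfied *= 1.0 - r
    return err

print(expected_reciprocal_rank([4, 2, 0]))  # ~0.94: best document ranked first
print(expected_reciprocal_rank([0, 2, 4]))  # ~0.35: same documents, reversed

Because the 1/i discount decays quickly, ERR rewards placing highly relevant documents at the very top, which is exactly the behavior a cost function for the top-rank problem should encourage.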
