Automatically Building a Multi-Level Relevance Corpus from Query Logs for Learning to Rank in Search

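This thesis proposes a method for automatically generating a web page relevance corpus and verifies that the corpus can be used for learning to rank. Manually labeling page relevance is costly, and the resulting labels do not fully match what real searches require, so we build the relevance corpus automatically from the large volume of real search behavior recorded in query logs. We first introduce the two external resources used: the query log from Microsoft Research and LETOR, a human-labeled corpus built by Microsoft Research Asia. We then describe how samples are selected from the query log and why, along with several methods for estimating page relevance. Next, we explain how page content is retrieved by following the query log and how features are extracted for the selected set of query keywords and the selected set of pages. Finally, we introduce learning to rank, together with the characteristics of the algorithms used and the parameter ranges explored. The first experiment verifies the quality of the automatically generated corpus and identifies the best configuration. We find that the proposed method does produce a corpus of reasonable quality that can be applied to multiple algorithms, and that click probability is a good reference standard for estimating relevance. The second experiment verifies that the automatically generated corpus can be used to evaluate performance differences between learning-to-rank algorithms. We find that training on the automatically generated corpus yields the same performance differences between algorithms as training on the human-labeled corpus, so it has the potential to replace human-labeled corpora and reduce the cost of building them.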
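
As a rough illustration of how click probability could serve as a multi-level relevance estimate, the sketch below counts how often each (query, URL) pair was clicked when shown and then bins the resulting probability into graded labels. The abstract does not specify the computation, so the log row layout, the function names `click_probability` and `to_grade`, and the thresholds are all assumptions made for this example, not the thesis's actual procedure.

```python
from collections import defaultdict


def click_probability(log_rows):
    """Estimate P(click) per (query, url) pair.

    log_rows: iterable of (query, url, was_clicked) tuples, one per time
    the URL was shown for the query (hypothetical log layout).
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, url, was_clicked in log_rows:
        shown[(query, url)] += 1
        if was_clicked:
            clicked[(query, url)] += 1
    return {pair: clicked[pair] / shown[pair] for pair in shown}


def to_grade(prob, thresholds=(0.1, 0.4)):
    """Map a click probability to a graded label 0..len(thresholds).

    The thresholds are illustrative assumptions, not values from the thesis.
    """
    return sum(prob >= t for t in thresholds)


# Toy log: URL a is always clicked, b is clicked once in four showings,
# c is never clicked.
rows = [
    ("python", "a.example", True),
    ("python", "a.example", True),
    ("python", "b.example", True),
    ("python", "b.example", False),
    ("python", "b.example", False),
    ("python", "b.example", False),
    ("python", "c.example", False),
]

probs = click_probability(rows)
labels = {pair: to_grade(p) for pair, p in probs.items()}
```

In this toy log the three URLs receive distinct grades, which is the kind of multi-level label a learning-to-rank algorithm can train on in place of a human judgment.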

Keywords

Query Log; Learning to Rank; User Click Behavior; Relevance Estimation; Performance Evaluation
