  • Degree thesis

將機率模型以及圖形隨機漫步理論應用在時序資料以改良網頁搜尋品質

Combining probabilistic model with graph-based random walk to improve search quality through exploiting time-sensitive query information

Advisor: 林守德

Abstract


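Today's search engines make searching easy: a user enters keywords and the search engine returns relevant items. However, the intent behind a keyword varies over time, so time-related information can help a search engine optimize the results it returns for time-sensitive queries. In this thesis, we propose new re-ranking methods based on time-related information to improve the quality of search results. We optimize for two kinds of data: 1. Data in which the queries have time-related information. 2. Data in which the queries have no time-related information. Our main approach is to add time-related features to support vector regression in order to optimize the ranking of search results. The experimental results show that, on the data in which queries have time-related information, using that information yields an improvement of about 10.28% over the original ranking. On the data in which queries have no time-related information, the improvement is 1.14%. At the end of the thesis, we analyze the features generated from the time-related information and compare their strengths and weaknesses.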

Parallel Abstract


Search engines let users express their intent through queries, but the intent behind a query may vary across time periods. Time-related information should therefore be taken into consideration when a search engine returns results. In this thesis, we present new re-ranking methods based on time information to improve search result quality. We aim to re-rank search results using time-sensitive information in two situations: 1. Existed Queries dataset: the URLs clicked for a query have sufficient time-stamped click information in the training data. 2. Rare Queries dataset: the URLs clicked for a query have no click information in the training data, and the search results are poor. We propose SVM Regression with time-related features to re-rank the search results of each query according to the number of clicks in each time period. For the Existed Queries dataset, we propose features generated by three methodologies: (a) Probabilistic Prior, (b) Probabilistic Model using Language Model and KL-divergence, and (c) a PageRank approach based on time clicks. For the Rare Queries dataset, where click information is unavailable, we propose two further features: (a) clicks extracted from related queries and (b) time-based PageRank. We then combine some of these features as input to SVM Regression. Experimental results on the AOL query log show that the proposed approach gains a 10.28% improvement over the original ranking on the Existed Queries dataset. On the Rare Queries dataset, SVM Regression gains a 1.14% improvement on Existed queries and a 12.9% improvement on Non-Existed queries. Finally, we analyze the improvement contributed by each method and discuss their pros and cons.
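
As an illustration of the KL-divergence idea mentioned above, the following is a minimal sketch and not the thesis's actual formulation, which this abstract does not spell out. It assumes that each query and each URL has a vector of click counts over the same time periods, smooths those counts into distributions, and ranks URLs by how closely their temporal click profile matches the query's. The function names, the add-one smoothing, and the click counts are all hypothetical. In the thesis such a score would presumably be one feature passed to SVM Regression; here the URLs are sorted by it directly to keep the example short.

```python
import math


def normalize(counts, smoothing=1.0):
    """Turn raw per-period click counts into a smoothed probability distribution."""
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same time periods."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def rerank_by_time_profile(query_clicks, url_clicks):
    """Order URLs from the closest temporal click profile to the farthest."""
    p = normalize(query_clicks)
    scored = [(kl_divergence(p, normalize(c)), url) for url, c in url_clicks.items()]
    return [url for _, url in sorted(scored)]


# Hypothetical click counts over four time periods (made-up numbers).
query_clicks = [1, 2, 30, 3]
url_clicks = {
    "url_a": [10, 10, 10, 10],  # clicked evenly across periods
    "url_b": [0, 1, 25, 2],     # clicked mostly in the third period
}
ranking = rerank_by_time_profile(query_clicks, url_clicks)
```

In this toy case `url_b` peaks in the same period as the query, so its divergence from the query's profile is smaller and it is ranked first.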

