An Unsupervised Approach to Improving Ranking Consistency in Web Search Using Knowledge Bases and Search Results

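Relevance ranking is one of the most important problems for web search systems such as Google, Yahoo!, and Bing. Conventional approaches to relevance ranking improve effectiveness by optimizing for each query separately. A previous paper proposed a two-stage supervised learning method based on the similarity of query intents, improving relevance ranking by enhancing ranking consistency. That paper, however, leaves two problems to be addressed. First, its learning-to-rank method requires a large volume of query logs, which only mature search engines possess; newly launched or still-developing search systems must rely on unsupervised methods to improve relevance ranking. Second, it uses entities in a knowledge base to represent query intents. Because queries usually carry some specific information, entities alone cannot fully express the query intent. For example, the intent behind "Kobe Bryant family" is to learn about Kobe Bryant's family rather than Kobe Bryant himself. In this thesis, we propose a two-phase unsupervised method that uses search results and a knowledge base to improve ranking consistency and relevance ranking, addressing the lack of query logs in immature search systems. The first phase extracts ranking-consistency scores from the search results, and the second phase re-ranks the search results by weighing uniqueness and consistency. In addition, we add query templates to the query intents so that the intent can be interpreted more precisely. To the best of our knowledge, this is the first work to use an unsupervised ranking-consistency method to improve relevance ranking. Finally, we validate our method using Freebase and Yahoo! search results as experimental data; the results show that our unsupervised method improves both ranking consistency and relevance ranking.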

Keywords

Web search; ranking consistency; query intent; unsupervised approach; knowledge base; topical clustering; query intent template

Parallel Abstract

Relevance ranking is the most important problem in web search systems such as Google, Yahoo!, and Bing. Most conventional approaches focus on optimizing the ranking model for each query separately. One previous work proposed a two-stage supervised approach that improves relevance ranking by enhancing ranking consistency across queries with similar search intents. However, that work has two crucial problems. First, it uses pairwise learning to rank to learn consistency, a method that relies on large-scale query logs, which only a few mature web search systems have. Most developing search engines need to improve their performance without query logs. Second, it represents query intents by entities in a knowledge base. Entities, however, cannot completely represent query intents, because queries often ask for specific information; for example, "Kobe Bryant family" expresses an intent about the family rather than the person. In this work, we propose a two-phase unsupervised approach that improves ranking consistency using a knowledge base and search results. The first phase extracts consistency from search results, and the second phase re-ranks search results by leveraging consistency and uniqueness. Furthermore, we add query templates to help clarify query intents more completely. To the best of our knowledge, our work is the first unsupervised method that uses ranking consistency to improve relevance ranking. We conducted extensive experiments using Freebase and search results from the Yahoo! search engine, and the results demonstrate that our approach improves ranking consistency and relevance ranking significantly.
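The two-phase idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual scoring functions, so everything here is an assumption made for illustration: the function names, the use of the URL host as a result signature, the "fraction of queries in the intent group" definition of consistency, the reciprocal-rank prior, and the blending weight `alpha` are all hypothetical. The uniqueness term is left out because the abstract does not define it.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain(url):
    """Return the host of a URL, used here as a crude result signature (assumption)."""
    return urlparse(url).netloc


def consistency_scores(results_by_query):
    """Phase 1 (assumed form): for each domain, the fraction of queries in the
    same intent group whose result lists contain that domain."""
    counts = defaultdict(int)
    for urls in results_by_query.values():
        for d in {domain(u) for u in urls}:  # count each domain once per query
            counts[d] += 1
    n = len(results_by_query)
    return {d: c / n for d, c in counts.items()}


def rerank(urls, scores, alpha=0.5):
    """Phase 2 (assumed form): blend a reciprocal-rank prior from the original
    ordering with the group-level consistency score, then sort descending."""
    def blended(item):
        rank, url = item
        original = 1.0 / (rank + 1)
        return alpha * original + (1 - alpha) * scores.get(domain(url), 0.0)

    ranked = sorted(enumerate(urls), key=blended, reverse=True)
    return [url for _, url in ranked]


# Hypothetical intent group for the template "<person> family".
group = {
    "kobe bryant family": [
        "https://a.example/kobe",
        "https://bio.example/kobe-family",
        "https://c.example/x",
    ],
    "lebron james family": [
        "https://bio.example/lebron-family",
        "https://d.example/y",
    ],
    "tim duncan family": [
        "https://e.example/z",
        "https://bio.example/duncan-family",
    ],
}

scores = consistency_scores(group)
reranked = rerank(group["kobe bryant family"], scores)
print(reranked)
```

In this toy data the `bio.example` pages appear for every query in the group, so the sketch moves the `bio.example` result to the top for "kobe bryant family". No query log is needed: only the result lists of queries that share an intent template are used, which is the property the abstract emphasizes.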

Parallel Keywords

Web Search; Ranking Consistency; Query Intent; Unsupervised Approach; Knowledge Base; Topical Cluster; Query Intent Template
