

Aggregating Multi-Resources to Improve the Diversity of Search Engine Result Pages

Advisor: Pu-Jen Cheng (鄭卜壬)


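Previous methods for generating snippets on search engine result pages have focused on optimizing the snippet of a single search result, chiefly considering relevance to the query terms and the informativeness of the surrounding context. In this thesis, we generate a snippet for each of the multiple results on a single search-result page and treat these snippets together as an overview of the query. We first extract attribute terms and their contexts for different categories of queries from a community question-answering site, an online encyclopedia, and search engine query suggestions, and use this information to generate snippets. When generating snippets, we consider whether a sentence is relevant to the query, whether it is relevant to the category, and whether it contains the previously extracted attribute terms. In the second stage of the system, we use integer linear programming to find an optimal combination of sentences, which forms the system output: multiple snippets. In addition, we incorporate the result pages of the query's expanded suggested searches to supply attribute terms that were not found originally, increasing the diversity for each query. The experimental data come from Wikipedia, Yahoo! Answers, and Google Search Autocomplete. The results show that the proposed snippet generation method is feasible and outperforms other summarization methods, and that it effectively increases the diversity of search engine result pages.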


Previous work on snippet generation focused mainly on how to produce one snippet for an individual search result. This paper aims to generate snippets that together form a comprehensive overview of an entity query (e.g., flu) in a search-result page. Our approach first extracts the attributes (e.g., symptom and diagnosis) of the categories (e.g., disease) from multiple resources, including a community-based question-answering (CQA) website, an online encyclopedia website, and suggestions from a commercial search engine. Then, we generate the snippets based on how central a sentence is to the query, how central it is to the query's category, and how well it diversifies the attributes drawn from the multiple resources. Integer Linear Programming (ILP) is adopted to find the optimal sentence set. After finding the initial set of sentences, we further improve the result by aggregating the search-result pages (SERPs) of the query's suggestion words. The experiments are conducted on Wikipedia, Yahoo! Answers, and Google Search. Experimental results demonstrate the effectiveness of our approach compared to an existing commercial search engine and several summarization baselines.
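The sentence-selection step described above can be illustrated with a minimal sketch. The abstract does not give the ILP formulation, so everything here is an assumption made for illustration: the objective (sum of sentence relevance scores plus a bonus for each distinct attribute covered), the length budget, the field names `score`, `length`, and `attrs`, and the function name `select_sentences`. To stay dependency-free, the sketch solves the 0-1 program exactly by enumerating subsets rather than calling an ILP solver, which is only practical for a handful of candidate sentences.

```python
from itertools import combinations


def select_sentences(sentences, budget, attr_weight=1.0):
    """Exactly solve a tiny 0-1 sentence-selection problem by enumeration.

    Hypothetical formulation (not taken from the thesis):
      maximise   sum of 'score' over chosen sentences
                 + attr_weight * (number of distinct attributes covered)
      subject to total 'length' of chosen sentences <= budget
    """
    best_value, best_subset = 0.0, ()
    n = len(sentences)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            total_length = sum(sentences[i]["length"] for i in subset)
            if total_length > budget:
                continue  # violates the length constraint
            covered = set().union(*(sentences[i]["attrs"] for i in subset))
            value = (sum(sentences[i]["score"] for i in subset)
                     + attr_weight * len(covered))
            if value > best_value:
                best_value, best_subset = value, subset
    return list(best_subset), best_value


# Toy candidates for the query "flu"; scores and attributes are made up.
candidates = [
    {"score": 0.9, "length": 12, "attrs": {"symptom"}},
    {"score": 0.8, "length": 10, "attrs": {"symptom"}},
    {"score": 0.5, "length": 11, "attrs": {"diagnosis"}},
    {"score": 0.4, "length": 9, "attrs": {"treatment"}},
]

chosen, value = select_sentences(candidates, budget=32)
print(chosen, round(value, 2))
```

In this toy instance the two highest-scoring sentences both cover only "symptom", so the coverage bonus steers the selection toward one symptom sentence plus the diagnosis and treatment sentences. This is the diversification effect the abstract describes, under the assumed objective.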


