
A Personalized Web-Snippet Clustering System

A Personal Search System with the Clustering Ability

Abstract


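This paper develops a personalized system with clustering ability, the Personalization Web-Snippet Clustering System (PWSC), which is based on metasearch technology. In the first stage, the system gathers relevant web snippets from different search engines according to the query entered by the user. In the second stage, the snippets are re-ranked using a Mean Reciprocal Rank (MRR) computation model. In the third stage, cluster labels are generated from the collected snippets with a word N-gram language model. In the fourth stage, a hierarchical clustering is constructed from the cluster labels. The final stage builds the personalized system, which produces different search results according to the labels and operations the user selects, helping the user quickly find the desired information. According to the experimental results, the system outperforms commercial and academic systems.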

Parallel Abstract


In this paper, we develop a personalized search system with clustering ability, called the Personalization Web-Snippet Clustering System (PWSC), which is based on a metasearch technique. In the first stage, the system collects relevant snippets from different search engines based on the user's query. In the second stage, it re-ranks the collected snippets using a Mean Reciprocal Rank (MRR) measure. In the third stage, it applies a word N-gram language model to generate cluster labels from the collected snippets. In the fourth stage, it builds a hierarchical tree from all cluster labels. In the final stage, it builds a personalized search system in which the user selects the labels of greatest interest, together with operations on them, to quickly locate the information of interest. According to the experimental results, our system outperforms the commercial and academic systems.
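The second and third stages described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the function names `mrr_rerank` and `ngram_labels` are hypothetical, a snippet's score is taken to be the mean over all engines of 1/rank (an engine that does not return the snippet contributes 0), and candidate labels are simply word N-grams that occur in at least `min_df` snippets. It is not the authors' implementation.

```python
from collections import Counter, defaultdict


def mrr_rerank(rankings):
    """Merge ranked result lists from several engines.

    rankings: dict mapping engine name -> list of URLs, best first.
    Assumed scoring: mean over all engines of 1/rank, with a
    contribution of 0 from engines that did not return the URL.
    Returns (url, score) pairs sorted by descending score.
    """
    n_engines = len(rankings)
    totals = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            totals[url] += 1.0 / rank
    merged = [(url, total / n_engines) for url, total in totals.items()]
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(merged, key=lambda pair: (-pair[1], pair[0]))


def ngram_labels(snippets, n_max=2, min_df=2):
    """Propose cluster labels from word N-grams (assumed heuristic).

    A candidate is any word N-gram (1 <= N <= n_max) that appears in
    at least min_df distinct snippets. Candidates are ordered by
    document frequency, then by length (longer first), then by text.
    """
    doc_freq = Counter()
    for text in snippets:
        words = text.lower().split()
        seen = set()
        for n in range(1, n_max + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        doc_freq.update(seen)  # count each N-gram once per snippet
    candidates = [g for g, count in doc_freq.items() if count >= min_df]
    return sorted(candidates,
                  key=lambda g: (-doc_freq[g], -len(g.split()), g))


if __name__ == "__main__":
    merged = mrr_rerank({
        "engine_a": ["x", "y"],
        "engine_b": ["x", "z"],
        "engine_c": ["y", "x"],
    })
    print(merged)
    print(ngram_labels([
        "jaguar car dealer",
        "jaguar car review",
        "jaguar animal facts",
    ]))
```

Averaging reciprocal ranks rewards snippets that several engines place near the top, which is one plausible reading of the MRR measure named in the abstract; a production system would also need stop-word filtering and a real tokenizer before proposing labels.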

