
Relevant Term Suggestion Based on Pseudo Relevance Feedback

Advisor: 王正豪

Abstract


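With the Internet now so widespread, people rely on search engines more and more. However, the queries that ordinary users type are usually too short and too broad in meaning, which produces too many search results, and users often have to spend considerable time browsing through many result pages. Adding relevant term suggestions for a query has therefore become an important topic. If, in addition to the returned results, the system provides relevant terms that frequently occur before and after the query term, users can narrow the search scope further and retrieve information more efficiently. This thesis uses pseudo relevance feedback to extract relevant terms from web documents. First, we collect data from various kinds of relevant web pages, such as keywords, news, and blogs, and store it in a Cassandra distributed database. Next, we use statistical methods such as TF-IDF to filter candidate relevant terms, and use bigrams and mutual information to compute a ranking by degree of relevance to the query, which is returned to the user. Experimental results show that, on average, users can find a term relevant to their query within the top two suggestions. In addition, compared with the relevant terms offered by search engines, bigrams effectively provide closer relevant term suggestions.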
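The filtering and bigram steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `tfidf_candidates` and `bigram_neighbours`, the length-normalised TF, the logarithmic IDF, and the assumption that the feedback documents are already tokenized into lists of terms are all choices made here for illustration.

```python
import math
from collections import Counter


def tfidf_candidates(docs, top_k=5):
    """Return the top_k terms by TF-IDF summed over the feedback documents.

    Assumption: docs is a list of non-empty token lists (the top search
    results treated as pseudo-relevant). The weighting is illustrative only.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = Counter()
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(top_k)]


def bigram_neighbours(docs, query):
    """Count the terms that appear immediately before or after the query term."""
    counts = Counter()
    for doc in docs:
        for left, right in zip(doc, doc[1:]):
            if left == query and right != query:
                counts[right] += 1
            elif right == query and left != query:
                counts[left] += 1
    return counts
```

A term that occurs in every feedback document, such as the query term itself, gets an IDF of zero under this weighting and therefore drops out of the candidate list, while `bigram_neighbours(docs, query).most_common()` orders the adjacent terms by how often they co-occur with the query.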

Parallel Abstract


A huge amount of information has been posted on the Web, and people can easily search it using search engines. However, user queries are usually very short and carry diverse meanings, so users have to spend more time browsing search results. Term suggestion has therefore become an important research topic. By providing potentially relevant terms that occur around the query, we can improve the query representation and retrieval effectiveness. In this thesis, we extract relevant terms from web documents using pseudo relevance feedback. First, we fetch search results through a search engine API and store them in the Cassandra distributed database. Second, we apply statistical methods from information retrieval to select terms that are more relevant to the original query and to calculate their degree of correlation with the query. Experimental results show that relevant terms can be found near the top of the ranked list. The results also show that bigrams play an important role in providing relevant term suggestions.
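The abstract does not define how the degree of correlation is computed. One common choice, used here purely as an assumption, is pointwise mutual information (PMI) estimated from document-level co-occurrence: PMI(q, t) = log2( P(q, t) / (P(q) P(t)) ). The function name `pmi` and the document-level estimate are illustrative and not taken from the thesis.

```python
import math


def pmi(docs, query, term):
    """Pointwise mutual information between query and term.

    Assumption: probabilities are estimated as the fraction of documents
    (token lists) that contain a term, or both terms together.
    """
    n_docs = len(docs)
    vocab_per_doc = [set(doc) for doc in docs]
    n_query = sum(query in vocab for vocab in vocab_per_doc)
    n_term = sum(term in vocab for vocab in vocab_per_doc)
    n_both = sum(query in vocab and term in vocab for vocab in vocab_per_doc)
    if n_both == 0:
        # The pair never co-occurs, so the logarithm is undefined.
        return float("-inf")
    return math.log2((n_both * n_docs) / (n_query * n_term))
```

Sorting candidate terms by this score in descending order would put the terms most strongly associated with the query at the top of the suggestion list. PMI is known to favour rare terms, so in practice it is often combined with a minimum frequency threshold.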

