基於維基百科之語義關鍵字擴展系統及其應用

A Wikipedia-Based Conceptual Keyword Expansion System and Its Application

Advisor: 吳家麟

Abstract


In this thesis, we propose a framework that generates semantic graphs from the collaborative knowledge and the social network hidden behind Wikipedia, and then uses the derived graphs to perform semantic keyword expansion. Wikipedia is a web-based free encyclopedia that anyone can edit. It keeps every revision of each page together with that page's contributors, so we can collect information about all contributors who have edited a given concept page. From this we build a bipartite graph between Wikipedia topics and Wikipedia contributors. We then apply a novel weighting model to fold the bipartite graph into a topic-only graph, which we call a semantic relatedness graph. A semantic relatedness graph is a WordNet-like network in which each word carries a specific semantic meaning, because we treat Wikipedia entries as concepts. The weights on the edges connecting words express the degree of semantic relatedness between those words. On top of the proposed semantic keyword expansion system, we also propose a mechanism for rating the nodes of a semantic graph, which provides a ranked list for users who are accustomed to traditional recommendation systems. We also present and demonstrate a novel approach to image search. Experimental results show that our system is flexible and useful. To the best of our knowledge, we are the first team to explicitly use Wikipedia's social network to compute semantic relatedness, and the first to apply the resulting semantic graphs to image search.
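The folding step described in the abstract can be illustrated with a small sketch. The abstract does not specify the thesis's weighting model, so the weighting below is only a stand-in: each contributor who edited n topics adds 1/(n - 1) to every pair of topics they edited, in the spirit of the weighting Newman used for collaboration networks. The function names `fold_bipartite` and `expand_keyword`, the toy edit list, and the top-k expansion rule are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def fold_bipartite(edits):
    """Fold a contributor-topic bipartite graph into a weighted topic graph.

    edits: iterable of (contributor, topic) pairs.
    Assumed weighting (not the thesis's model): a contributor who edited
    n >= 2 topics adds 1 / (n - 1) to the edge between each pair of them.
    """
    topics_by_contributor = defaultdict(set)
    for contributor, topic in edits:
        topics_by_contributor[contributor].add(topic)

    graph = defaultdict(dict)
    for topics in topics_by_contributor.values():
        if len(topics) < 2:
            continue  # a single-topic contributor links nothing
        share = 1.0 / (len(topics) - 1)
        for a, b in combinations(sorted(topics), 2):
            graph[a][b] = graph[a].get(b, 0.0) + share
            graph[b][a] = graph[b].get(a, 0.0) + share
    return {topic: dict(nbrs) for topic, nbrs in graph.items()}


def expand_keyword(graph, keyword, k=3):
    """Return up to k topics most related to keyword (ties broken by name)."""
    neighbours = graph.get(keyword, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:k]]


# Hypothetical edit history: (contributor, topic page they edited).
EDITS = [
    ("alice", "Jaguar"), ("alice", "Leopard"),
    ("bob", "Jaguar"), ("bob", "Leopard"), ("bob", "Car"),
    ("carol", "Jaguar"), ("carol", "Car"),
    ("dave", "Leopard"),
]
GRAPH = fold_bipartite(EDITS)

if __name__ == "__main__":
    print(expand_keyword(GRAPH, "Leopard"))
```

In this toy data, contributors who edit few pages count for more per shared pair than contributors who edit many, which is one plausible way to keep prolific editors from linking every topic to every other. Any other co-editing weight could be substituted in `fold_bipartite` without changing `expand_keyword`.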

