A Study on Building User Interest Files with a Query-Based Hierarchical Concept Network

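User interest files are an increasingly active research topic in web mining. In past research, user interest files were built mainly by extracting, from a user's browsing or search records, a set of terms that represent the user's interests, and then finding other related terms through the similarity between terms. However, these statistics-based construction methods need a large amount of document content to identify a document's keywords and related terms, and computing term similarity is also computationally expensive. Moreover, when a statistics-based method is used to compute similarity and then interest intensity, the results are affected by the strength of each term's own statistical properties, which must be taken into account. Given this background and the motivation to improve on traditional construction methods, this study uses the web snippets returned by a search engine, Chinese grammatical structure, and the regions in which terms appear to build a hierarchical network structure. Concept similarity is computed from the distance between terms in this hierarchical network, and the interest intensity of each term is then computed from that similarity. The approach addresses the problems of traditional statistics-based construction and effectively improves the quality of the interest file, so that it can later be used for personalized search, recommendation, and other personalized applications. Experimental results show that the average interest intensity of user interest files built with the proposed method is higher than that of files built with traditional methods. The proposed method also takes noticeably less execution time to build a user interest file than traditional methods.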

Keywords

user interest files; concept hierarchy; concept similarity; Chinese phrase grammar

Parallel Abstract

Research on user interest files is a growing topic in the field of web mining. In past research, user interest files were built mainly by extracting a set of words from a user's browsing or search records and then finding other related words through the similarity between words. However, previous statistics-based methods not only need a large amount of document content to discover the keywords and related words of a document but also require a great deal of computation to calculate word similarity. Furthermore, if a statistics-based method is used to calculate similarity and then interest intensity, the influence of the statistical characteristics of the words themselves must be taken into account. Motivated by this background and by the aim of improving on traditional methods, our research uses the web snippets returned by a search engine, the structure of Chinese grammar, and the areas where keywords appear to build a hierarchical network structure. We calculate concept similarity from the distance between words in the hierarchical network and then measure the interest intensity of the words. This addresses the problems of building user interest files with traditional statistical methods and effectively enhances the quality of the interest files, so that they can be used for personalized search, recommendation, and other personalized applications in the future. The results show that user interest files built with the proposed method have a higher average interest intensity than those built with traditional methods. In terms of execution time, building user interest files with our method takes significantly less time than with traditional methods.

Parallel Keywords

user interest files; hierarchical conceptual network; concept similarity; Chinese grammar of words
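
The abstract describes computing concept similarity from the distance between words in a hierarchical network, and then deriving interest intensity from that similarity, but it gives no formulas. The following is only a minimal sketch of that idea under stated assumptions: the toy hierarchy `PARENT`, the mapping `1 / (1 + distance)`, and the similarity-weighted sum in `interest_intensity` are all hypothetical choices for illustration, not the method used in the study.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent); the root has parent None.
# In the study the hierarchy is built from search snippets, not hand-written.
PARENT = {
    "sports": None,
    "ball games": "sports",
    "water sports": "sports",
    "basketball": "ball games",
    "baseball": "ball games",
    "swimming": "water sports",
}


def path_to_root(term):
    """Return the nodes from term up to the root, inclusive."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def distance(a, b):
    """Number of edges on the tree path between a and b."""
    steps_from_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in steps_from_a:  # first shared ancestor
            return steps_from_a[node] + j
    raise ValueError("terms do not share a hierarchy")


def similarity(a, b):
    """Assumed mapping from distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance(a, b))


def interest_intensity(query_terms):
    """Assumed scoring: each queried term adds its count, weighted by
    similarity, to every term in the hierarchy."""
    counts = Counter(query_terms)
    return {
        term: sum(n * similarity(term, q) for q, n in counts.items())
        for term in PARENT
    }


if __name__ == "__main__":
    scores = interest_intensity(["basketball", "basketball", "swimming"])
    for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score:.3f}")
```

Because similarity here depends only on position in the hierarchy, a term that was never queried (such as "baseball") still receives a score through its neighbors, without any corpus-wide term statistics.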

