Topic Analysis of Personal Micro-blog Content Using Wikipedia

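In recent years, micro-blogs have become increasingly popular. Users share posts with their friends through micro-blogs, covering their interests, moods, and shared information. The categories covered by a micro-blog user's posts are usually topics the user is interested in, so we aim to discover a user's interests by mining the topics of the posts the user publishes. The method proposed in this thesis first extracts the important terms from a micro-blog user's posts and then uses the Wikipedia category network to look up the category concepts covered by each term, thereby mining the user's likely interest categories. During mining, for terms that cannot be found in Wikipedia directly, the system connects to Wikipedia online to find the category concepts covered by their redirect terms. For non-Wikipedia terms, we use the results of a cluster analysis of related terms and rely on the other terms in the same cluster to mine possible category concepts. We also propose an evaluation method that computes the topic concentration of a micro-blog user's posts. Experimental results show that the proposed evaluation method for topic concentration achieves high accuracy, and that the interest categories automatically determined by the system are consistent to a certain degree with the categories chosen by the test subjects.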

Keywords

micro-blog; Wikipedia; text mining

Parallel Abstract

In recent years, micro-blogging has become widely used. Micro-blog users usually share their interests, feelings, and information with their friends. The implicit topics covered in a user's micro-blog articles usually reflect the user's interests. Therefore, the goal of this study is to discover the implicit topics of the articles posted by micro-blog users in order to find the users' interests. In this thesis, we first extract the important terms from a micro-blog user's articles, and then use Wikipedia to look up the corresponding categories of each term. For terms that cannot be found in Wikipedia directly, the online Wikipedia is queried to find the categories of their redirected terms. For each non-Wikipedia term, a clustering analysis of related terms is performed, and the other terms in the same cluster as the non-Wikipedia term are used instead to obtain the corresponding categories. An evaluation method is proposed to measure the topic concentration degree of a micro-blog user. The experimental results show that the proposed method can judge the topic concentration degree of micro-blog users with high precision. Moreover, the interest categories of micro-blog users discovered by the proposed method have high consistency with the categories chosen by the testers.
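The lookup order described in the abstract (direct category lookup, then redirect, then same-cluster terms) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory tables `CATEGORIES`, `REDIRECTS`, and `CLUSTERS` are toy stand-ins for the Wikipedia category network, the redirect lookup, and the clustering result, and the `concentration` function (share of the most frequent category) is only one plausible measure, not the evaluation method defined in the thesis.

```python
from collections import Counter

# Hypothetical toy data standing in for Wikipedia and the clustering output.
CATEGORIES = {
    "basketball": ["Sports"],
    "jazz": ["Music"],
    "guitar": ["Music"],
}
REDIRECTS = {"nba": "basketball"}        # term -> redirect target (assumed)
CLUSTERS = [{"hoops", "basketball"}]     # clusters of related terms (assumed)


def lookup_categories(term):
    """Return categories for a term: direct hit, then redirect, then cluster."""
    term = term.lower()
    if term in CATEGORIES:
        return list(CATEGORIES[term])
    target = REDIRECTS.get(term)
    if target in CATEGORIES:
        return list(CATEGORIES[target])
    # Non-Wikipedia term: borrow categories from other terms in its cluster.
    borrowed = []
    for cluster in CLUSTERS:
        if term in cluster:
            for other in sorted(cluster - {term}):
                borrowed.extend(CATEGORIES.get(other, []))
    return borrowed


def category_profile(terms):
    """Count how often each category is reached by a user's terms."""
    counts = Counter()
    for term in terms:
        counts.update(lookup_categories(term))
    return counts


def concentration(counts):
    """Assumed measure: fraction of category hits in the top category."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


if __name__ == "__main__":
    profile = category_profile(["basketball", "NBA", "hoops", "jazz"])
    print(profile.most_common(), concentration(profile))
```

In this toy run, three of the four terms resolve to the category Sports (one directly, one through a redirect, one through its cluster), so the user's posts would be judged fairly concentrated under the assumed measure.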

Parallel Keywords

micro-blogging; Wikipedia; text mining

