Mining a Blogger's Impressions of Terms from a Microblog Corpus

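The microblog is a new form of blogging that has emerged in the past year or two. Its main idea is to let users post, in real time, the events and moods of their daily lives. The biggest difference from a traditional blog is that microblog posts are short, usually only a few sentences each. Because it is a newly developed web service, research on microblogs is still relatively scarce. Microblog posts carry author-supplied category labels and emoticons. This paper proposes a method that uses this information together with post semantics to identify the author's emotional impression of specific terms. First, after parsing the microblog posts, we extract the semantic concepts and emoticons they contain. To extract semantic concepts, we use part-of-speech tags and the dependency relations among the words in a sentence. We then apply Latent Semantic Analysis (LSA) to find posts with similar semantic and emotional concepts, and represent the microblog posts as a network. Next, a random walk model simulates how emotional impressions influence one another across posts. Finally, the probability of each impression for a term is computed from the post nodes that contain the term. For the experiments, the corpus was taken from the microblog site Plurk. We collected several months of posts by a single author and classified and evaluated the emotional impression of terms referring to people and things. Terms were divided into three classes: positive, negative, and neutral. In total we selected about 100 terms, and the accuracy was 69%.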
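The network-construction step can be illustrated with a minimal sketch. It is not the authors' implementation: plain cosine similarity over bag-of-words counts stands in for the LSA similarity, and the post format (keys `tokens` and `emoticons`), the function names, and the threshold of 0.5 are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_network(posts, threshold=0.5):
    """Return {(i, j): weight} for pairs of posts that look related.

    posts: list of dicts with the hypothetical keys "tokens" and "emoticons".
    Two posts are linked if their word overlap is high enough (a stand-in
    for LSA similarity) or if they share at least one emoticon.
    """
    vectors = [Counter(p["tokens"]) for p in posts]
    edges = {}
    for i, j in combinations(range(len(posts)), 2):
        sim = cosine(vectors[i], vectors[j])
        shared = bool(set(posts[i]["emoticons"]) & set(posts[j]["emoticons"]))
        if sim >= threshold or shared:
            edges[(i, j)] = max(sim, 1.0 if shared else 0.0)
    return edges


# Toy posts, invented for this example.
posts = [
    {"tokens": ["exam", "tomorrow", "tired"], "emoticons": [":-("]},
    {"tokens": ["exam", "tired", "again"], "emoticons": []},
    {"tokens": ["rain", "all", "day"], "emoticons": [":-("]},
    {"tokens": ["cake", "party"], "emoticons": [":-)"]},
]
edges = build_network(posts)
```

In this toy data, posts 0 and 1 are linked by word overlap and posts 0 and 2 by a shared emoticon, while post 3 stays isolated. A real pipeline would replace `cosine` with similarity in the reduced LSA space.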

Keywords

Machine Learning; Random Walk; Natural Language Processing

Parallel Abstract

The microblog is a new type of blog service that has emerged over the last two years. Its purpose is to provide a platform on which users can share their lives and moods in real time. Unlike traditional blog posts, microblog posts are short and usually contain only a few sentences. Because the service is so new, research on microblogs is still scarce. Microblog posts carry information annotated by the blogger, such as post labels and emoticons. In this paper, we propose a method that uses these annotations together with semantic information to determine the blogger's impression of a specific term. First, we parse the corpus and extract the semantic concepts contained in each post, using POS tags and the dependency relations between terms to extract semantic structure. We then use Latent Semantic Analysis (LSA) to find posts with similar semantic concepts. We assume that posts with similar semantic concepts, or posts that share the same annotations, are likely to convey similar impressions. We represent posts as nodes and these relations as edges to construct a microblog network. Once the network is complete, we use a random walk model to describe how impressions are transmitted between posts. For the experiments, we collected a corpus from the microblog website Plurk. We followed one blogger for a few months and classified terms referring to people and objects as positive, negative, or neutral. We selected about 100 terms for the experiment, and the precision was 69%.
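As a rough illustration of the impression-transmission step, the sketch below propagates class distributions over a small post network using a random walk with restart. The abstract does not say which random walk variant is used, so this formulation, the restart weight `alpha`, the uniform prior for unlabeled posts, and every name in the code are assumptions for illustration only.

```python
LABELS = ("positive", "negative", "neutral")


def propagate(n, edges, seeds, alpha=0.8, iterations=50):
    """Spread impression distributions over an undirected post network.

    n: number of posts (nodes 0..n-1).
    edges: {(i, j): weight} for related posts.
    seeds: {node: [p_pos, p_neg, p_neu]} for posts with a known impression,
           e.g. inferred from an emoticon (an assumption of this sketch).
    """
    nbrs = [dict() for _ in range(n)]
    for (i, j), w in edges.items():
        nbrs[i][j] = w
        nbrs[j][i] = w
    uniform = [1.0 / len(LABELS)] * len(LABELS)
    prior = [list(seeds.get(i, uniform)) for i in range(n)]
    scores = [row[:] for row in prior]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            total = sum(nbrs[i].values())
            if total == 0:
                # Isolated post: nothing to walk to, keep its prior.
                new_scores.append(prior[i][:])
                continue
            # One walk step: weighted average of the neighbors' scores.
            walked = [
                sum(w * scores[j][k] for j, w in nbrs[i].items()) / total
                for k in range(len(LABELS))
            ]
            # Restart: blend the walk result with the post's own prior.
            new_scores.append(
                [alpha * walked[k] + (1 - alpha) * prior[i][k]
                 for k in range(len(LABELS))]
            )
        scores = new_scores
    return scores


def term_impression(term_posts, scores):
    """Average the distributions of the posts that contain the term."""
    avg = [
        sum(scores[i][k] for i in term_posts) / len(term_posts)
        for k in range(len(LABELS))
    ]
    return LABELS[avg.index(max(avg))], avg


# Toy network: post 0 carries a positive emoticon; posts 1 and 2 mention the term.
scores = propagate(4, {(0, 1): 1.0, (1, 2): 1.0}, {0: [1.0, 0.0, 0.0]})
label, distribution = term_impression([1, 2], scores)
```

Here the positive evidence on post 0 flows through the edges to posts 1 and 2, so the term they mention is labeled positive, while the isolated post 3 keeps its uniform prior. Because every update is a convex combination of probability distributions, each row of `scores` continues to sum to 1.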

Parallel Keywords

Machine Learning; Random Walk; Natural Language Processing

