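Mining Blogger's Glossary Impressions from a Micro Blog Corpus - National Tsing Hua University Theses and Dissertations Full-Text Image System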

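Author (Chinese):	楊庭豪
Author (English):	Yang, Ting-Hao
Thesis title (Chinese):	從微部落格語料中探勘部落客的詞彙意象
Thesis title (English):	Mining Blogger's Glossary Impressions from a Micro Blog Corpus
Advisor (Chinese):	蘇豐文
Advisor (English):	Soo, Von-Wun
Degree:	Master's
University:	National Tsing Hua University
Department:	Department of Computer Science
Student ID:	9662585
Year of publication (ROC calendar):	98
Academic year of graduation:	97
Language:	English
Number of pages:	35
Keywords (Chinese):	機器學習、隨機漫步、自然語言處理
Keywords (English):	Machine Learning, Random Walk, Natural Language Processing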

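Micro blogs are a new type of blog that has emerged over the last year or two. The main idea is to let users post events and moods from their daily lives in real time. The biggest difference from traditional blogs is that micro blog articles are short, usually containing only a few sentences. Because this is a newly developed web service, research on micro blogs is still relatively scarce.
Micro blog articles carry category labels and emoticons added by the author. This thesis proposes a method that uses this information, together with article semantics, to find the author's emotional impression of specific terms in a micro blog. After parsing the articles, we extract the semantic concepts and emoticons they contain. To extract semantic concepts, we use part-of-speech tags and the dependency relations between words in a sentence. We then apply Latent Semantic Analysis (LSA) to find articles with similar semantic and emotional concepts, and represent the micro blog articles as a network. A random walk model then simulates how the emotional impressions of articles influence one another, and finally the probability of each impression for a term is computed from the article nodes that contain the term.
In the experiments, the corpus was taken from the micro blog site Plurk (撲浪). We collected several months of articles by a single author and classified and evaluated the emotional impressions of terms referring to people and things. Terms were divided into three classes: positive, negative, and neutral. We selected about 100 terms in total, and the precision was 69%.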
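The network construction step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the LSA projection is skipped and plain cosine similarity on raw concept counts is used as a stand-in, and the record layout (`id`, `concepts`, `tags`), the function names, and the `threshold` value are all hypothetical choices made for the illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two sparse concept-count dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_network(articles, threshold=0.4):
    """Link two articles if their concepts are similar or they share a tag.

    Returns an undirected adjacency dict: article id -> set of linked ids.
    The threshold is an assumed value, not one taken from the thesis.
    """
    graph = {a["id"]: set() for a in articles}
    for a, b in combinations(articles, 2):
        similar = cosine(a["concepts"], b["concepts"]) >= threshold
        shared_tag = bool(a["tags"] & b["tags"])
        if similar or shared_tag:
            graph[a["id"]].add(b["id"])
            graph[b["id"]].add(a["id"])
    return graph

# Toy articles (invented for illustration): concept counts plus
# author-attached emoticons or labels.
articles = [
    {"id": 0, "concepts": {"exam": 1, "tired": 1}, "tags": {":-("}},
    {"id": 1, "concepts": {"exam": 1, "study": 1}, "tags": set()},
    {"id": 2, "concepts": {"movie": 1, "friend": 1}, "tags": {":-)"}},
    {"id": 3, "concepts": {"dinner": 1}, "tags": {":-)"}},
]
network = build_network(articles)
```

In this toy data, articles 0 and 1 are linked through the shared concept "exam", while articles 2 and 3 are linked only because they carry the same emoticon. A fuller version would compare the articles in the reduced LSA space rather than on raw counts.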

Micro blogs are a new type of blog service that has emerged over the last two years. Their purpose is to provide a platform on which users can share their lives and moods in real time. Unlike traditional blog posts, micro blog articles are short, usually containing only a few sentences. Because the service is so new, research on micro blogs is still scarce.
Micro blog articles carry information annotated by the blogger, such as article labels and emoticons. In this paper, we propose a method that uses these annotations, together with semantic information, to determine the blogger's impression of a specific term. First, we parse the corpus and extract the semantic concepts contained in each article, using POS tags and the dependency relations between terms to recover semantic structure. We then use Latent Semantic Analysis (LSA) to find articles with similar semantic concepts. We assume that articles with similar semantic concepts, or that share the same annotations, are likely to carry similar impressions. We represent the articles as nodes and these relations as edges to construct a network of the micro blog. Once the network is complete, we use a random walk model to describe how impressions are transmitted between articles.
In the experiments, we collect a corpus from the micro blog website Plurk. We follow one blogger for a few months and classify terms referring to people and objects as positive, negative, or neutral. We select about 100 terms for the experiment, and the precision is 69%.
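The abstract does not give the transition model, so the following is only a sketch of one common way to propagate labels over such a network: a random walk with restart, in which each article repeatedly mixes its own prior with the average score of its neighbours. The function names, the uniform prior for unlabelled articles, the restart weight `alpha`, and the averaging rule for a term are assumptions made for the illustration, not details taken from the thesis.

```python
LABELS = ("positive", "negative", "neutral")

def make_prior(graph, seeds):
    """One-hot prior for seeded articles, uniform prior for the rest."""
    prior = {}
    for node in graph:
        if node in seeds:
            prior[node] = [1.0 if seeds[node] == lab else 0.0 for lab in LABELS]
        else:
            prior[node] = [1.0 / len(LABELS)] * len(LABELS)
    return prior

def propagate(graph, seeds, alpha=0.5, iterations=50):
    """Spread impression scores over the article network.

    graph: article id -> set of neighbour ids
    seeds: article id -> label, for articles with a known impression
    alpha: weight given to neighbours; (1 - alpha) restarts at the prior
    """
    prior = make_prior(graph, seeds)
    scores = {n: list(p) for n, p in prior.items()}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            if neighbours:
                avg = [sum(scores[m][i] for m in neighbours) / len(neighbours)
                       for i in range(len(LABELS))]
            else:
                avg = scores[node]  # isolated article keeps its own score
            updated[node] = [(1 - alpha) * prior[node][i] + alpha * avg[i]
                             for i in range(len(LABELS))]
        scores = updated
    return scores

def term_impression(scores, articles_with_term):
    """Average the scores of the articles that mention the term."""
    count = len(articles_with_term)
    mean = [sum(scores[a][i] for a in articles_with_term) / count
            for i in range(len(LABELS))]
    return LABELS[mean.index(max(mean))], mean

# Toy chain 0 - 1 - 2 (invented): only article 0 has a known impression.
graph = {0: {1}, 1: {0, 2}, 2: {1}}
scores = propagate(graph, seeds={0: "positive"})
label, mean = term_impression(scores, [1, 2])
```

Here the term appears only in the unlabelled articles 1 and 2, yet it is classified through the positive impression that flows in from article 0. Each article's scores stay a probability distribution because both the prior and the neighbour average sum to one.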

Abstract (Chinese)
Abstract
Acknowledgements
Chapter 1 Introduction
1.1 Introduction of Micro blog
1.2 Research motivation and objective
1.3 The organization of the thesis
Chapter 2 Related works
2.1 Text analysis
2.2 Information analysis on the text content
Chapter 3 Construction network for a micro blog
3.1 Parsing corpus
3.2 Extraction of concepts
3.3 Building document-concept matrix
3.3.1 Introduction of Latent Semantic Analysis
3.3.2 Build a document-concept matrix
3.4 Computing document similarity
3.5 Construction the network of a micro blog
Chapter 4 Use random walk for finding probabilities of impressions
4.1 A random walk model
4.2 Random walk on a micro blog network
Chapter 5 Evaluation
5.1 Introduction of dataset
5.2 Evaluation standards
5.3 Experiment
5.3.1 Setup for the experiment
5.3.2 Experimental results
5.4 Discussion
Chapter 6 Conclusions and future work
Reference
Appendix
Appendix a: Chosen terms in experiment

