
網路日誌空間情緒分析方法之研究

A Study on Emotion Analysis from Blogosphere

Advisor: 陳信希 (Hsin-Hsi Chen)

Abstract


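The World Wide Web (WWW) is the greatest invention spanning the twentieth and twenty-first centuries. In its early days the WWW changed how people absorb information; more recently it has been changing how people express information. Emerging types of websites such as blogs, Wikipedia, and video sharing (YouTube) are all successful services or technologies whose content is enriched by the participation and interaction of many users. These emerging sites are collectively called social media, a term that emphasizes their ability to attract community users to join, participate, and interact. The services offered by large web portals usually encompass all kinds of social media, and traditional media have likewise integrated creative ideas with community elements into their presentation.
With the vigorous development of the Internet, the web space formed by numerous blogs provides a large volume of time-stamped text, a rich corpus source for language processing. Current academic research on blogs falls into four categories: (1) link analysis of social networks, extended from the WWW; (2) information mining, retrieval, or statistical and rule-based analysis of blog content and context; (3) analysis of the viewpoints, opinions, and emotions expressed by blog participants; and (4) other analyses concerning tags, user characteristics, and community effects. Beyond its textual content, which continues the traditional focus of information retrieval and natural language processing, the links and tags that blogs contain are the key to their successful socialization, and they therefore become important factors that any analysis technique applied to blogs must also take into account.
This study innovatively uses sentences containing emoticons from Yahoo! Kimo Blog as its analysis material, with as much as one year of data for training and one month of data for test evaluation. Besides designing a method to extract an emotion lexicon and showing that the lexicon helps emotion analysis, it applies machine learning methods to emotion analysis and concludes that sequential models identify emotions more successfully. In implementation, after observing and analyzing the corpus with respect to the temporal and emotional characteristics of the blog collection, the study designs a system architecture for emotion analysis whose main components are: acquiring the knowledge base from web services, building the emotion lexicon with probabilistic methods, building emotion classification knowledge from sentence features with machine-learning-based models, and building applications around that classification knowledge. The applications include search filtering of blog text, cross-lingual emotion recognition of text, emotion recognition that integrates the perspectives of writers and readers, and emotion recognition that integrates text and music as multiple media, as well as public-opinion and trend analysis of emotions and a writing system that helps authors anticipate readers' emotions.
Starting from blog sentences marked with emoticons, this study explores how, once people's communication extends into cyberspace, the need to express emotion is reflected in the use of words and emoticons. It further takes the meaning of each emoticon as the basis for classifying the emotion a sentence expresses and, through experimental results from various emotion classifiers, uses emotion vocabulary to explain people's preferences and characteristics in using emoticons in blogs, thereby interpreting and analyzing people's emotions in cyberspace.
Research on blog text and user emotion requires the innovation and support of natural language processing techniques. Although blog services offer users many new technologies and ideas for publishing content, users still participate in blog creation mainly through language, and they interact and communicate with other users through language. Because this communication is not face to face, users must adopt appropriate language patterns to express their emotions; these patterns may be carried over from traditional communication habits or may be rules that arose with the Internet (such as Internet slang). To analyze these language patterns, with the support of a large blog corpus, this study proposes and validates analysis methods within a framework that combines probabilistic and statistical models, machine learning, and linguistics, in the hope of grasping the key technologies that will shape the next wave of Internet development as blogs flourish.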

Keywords

blog, emotion analysis

Parallel Abstract


With the rapid emergence of WWW innovations, people are continuously improving their ways of processing information. To fulfill the information needs of WWW users, blog sites, encyclopedia sites, and video-sharing sites have emerged as powerful value-added platforms. These sites are referred to as social media, which integrate web users' publications, communications, and interactions. People can easily share their creations and emotions through these platforms. Among the different forms of social media, the blog is the most representative and the most widespread on the Internet. The blog space is regarded as a useful corpus that provides a tremendous amount of material for language processing tasks. This dissertation studies emotion analysis using blogs as the dataset. We innovatively use Yahoo! Kimo Blog posts that contain emoticons as the corpus for analysis: posts spanning one year serve as the training dataset for emotion classification, and posts spanning one month serve as the testing dataset. For a classification task on blogs, the emotion ground truth is the emoticons that bloggers insert when they want to share their feelings, emotions, or moods with the blog community. The blog datasets are first used to construct an emotion lexicon by collocation test methods, and we show how this lexicon facilitates emotion analysis. The terms in the emotion lexicon are then used as features for training machine learning-based classifiers. We improve the performance of emotion analysis by incorporating sequential information. Finally, the learned classification kernel is applied to a multi-perspective integration (writers and readers) and a multi-perceptive integration (blog and music).
Knowledge of blog metadata, including textual units, time stamps, and named entities, also helps construct a census and trend survey module. Other applications include emotion filtering for blog texts, a cross-lingual adaptation of emotion analysis, and an authoring tool that lets writers predict readers' emotions. Written text is one of the media through which people convey their emotions. But do bloggers always share traceable emotions? If not, are the appearances of emoticons totally random, or are there recurring patterns? These are the original questions that direct our research. By knowing how emotions are conveyed by text, it is possible to build a system that gives users language usage recommendations to assist them in expressing appropriate emotions. The emotion analysis system can be integrated into other research fields in the future.
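
The lexicon-construction step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which collocation test the dissertation uses, so the example below assumes pointwise mutual information (PMI) between a word and the emoticon attached to its sentence; the function names `build_emotion_lexicon` and `classify`, the `min_count` threshold, and the toy corpus are all hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def build_emotion_lexicon(labeled_sentences, min_count=2):
    """Score (word, emoticon) pairs by PMI (an assumed collocation measure).

    labeled_sentences: iterable of (tokens, emoticon) pairs, where the
    emoticon chosen by the blogger acts as the emotion label.
    Returns {word: (best_emoticon, pmi)} for words with positive PMI.
    """
    word_counts = Counter()
    label_counts = Counter()
    pair_counts = Counter()
    total = 0
    for tokens, emoticon in labeled_sentences:
        total += 1
        label_counts[emoticon] += 1
        for word in set(tokens):  # count each word once per sentence
            word_counts[word] += 1
            pair_counts[(word, emoticon)] += 1

    lexicon = {}
    for (word, emoticon), joint in pair_counts.items():
        if word_counts[word] < min_count:
            continue  # skip rare words whose estimates are unreliable
        # PMI = log2( P(w, e) / (P(w) * P(e)) ), estimated over sentences
        pmi = math.log2(
            joint * total / (word_counts[word] * label_counts[emoticon])
        )
        if pmi > 0 and (word not in lexicon or pmi > lexicon[word][1]):
            lexicon[word] = (emoticon, pmi)
    return lexicon


def classify(tokens, lexicon):
    """Sum PMI votes per emoticon; return the top label, or None if no hit."""
    votes = Counter()
    for word in tokens:
        if word in lexicon:
            emoticon, pmi = lexicon[word]
            votes[emoticon] += pmi
    return votes.most_common(1)[0][0] if votes else None


# Toy corpus (hypothetical): tokenized sentences with their emoticons.
corpus = [
    (["so", "happy", "today"], ":)"),
    (["happy", "birthday"], ":)"),
    (["feel", "sad", "today"], ":("),
    (["sad", "news"], ":("),
]
lexicon = build_emotion_lexicon(corpus)
```

In this toy corpus, "today" occurs equally often with both emoticons, so its PMI is zero and it is left out of the lexicon, while "happy" and "sad" are kept. A lexicon vote of this kind ignores word order; the abstract reports that incorporating sequential information improved performance, which this sketch does not attempt to reproduce.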

Parallel Keywords

blog emotion analysis

