
中文情緒詞庫的建造與標記

Affective Lexicon in Chinese – Construction and Annotation

Advisor: 謝舒凱

Abstract


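Affective lexicons are a foundational resource for emotion detection research. Most existing open Chinese lexicons consist mainly of affect-denoting words and include few affect-signaling words. From the perspective of cognitive semantics and pragmatics, however, affect-signaling words play a critical role in how language is used to express emotion. Semantic prosody shows that seemingly neutral words in fact carry positive or negative associations; moreover, the mapping between emotion and language often crosses word boundaries, so that chunks can also express emotion, rather than one fixed word corresponding to one emotion. This study therefore integrates and classifies existing Chinese affect-denoting lexicons and manually collects and annotates Chinese affect-signaling words, as a foundational resource for Chinese emotion detection research, and also demonstrates the usefulness of functional grammar for identifying emotion in text. The study has two phases: the first is manual collection, annotation, and classification; the second is evaluation and application of the lexicon. In the first phase, affect-denoting words are integrated from existing lexicons and classified into five categories (happy, sad, scared, angry, and surprised), then subdivided by the intensity and duration of the emotion the word denotes into three types: emotion, mood, and temperament. Affect-signaling words are collected from corpora representing two perspectives: emotional articles classified by their authors (900 posts from the PTT mood board) and emotional articles classified by their readers (1,000 Yahoo mood news articles), from which chunks are manually annotated and classified. Common emotional expressions are also included, such as interjections, emoticons, profanity, and insults. The second-phase evaluation has two steps. The first step estimates the emotion-prediction ability of each affect-signaling word, defined as the mean emotion score of the ten words that follow each occurrence of the word in the text corpus. The second step tests that ability: ten positive and ten negative sets of affect-signaling words are sampled and, using a simple method that sums emotion-word scores, with manually rated emotional texts as the standard, accuracy with affect-signaling words is compared with accuracy without them. Accuracy improves by 4.78% on average for positive sets and by 18.18% for negative sets. Finally, as an application, the lexicon is used in the machine learning study of emotion detection in Chinese short texts by Magistry et al. (2015), where the F1 score improves by nearly 2%.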
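The emotion-prediction ability described in the abstract (the mean emotion score of the ten words following each occurrence of a word) could be computed roughly as in the sketch below. This is an illustration under stated assumptions, not the thesis's actual implementation: the function name `prediction_score`, the pre-tokenized input, the `word_scores` dictionary, the choice to average per occurrence before averaging across occurrences, and the treatment of unscored words as neutral are all hypothetical.

```python
from statistics import mean

def prediction_score(docs, target, word_scores, window=10):
    """Average, over all occurrences of `target`, of the mean emotion
    score of the (up to) `window` tokens that follow it.

    docs        -- list of documents, each a list of tokens (assumed pre-segmented)
    word_scores -- hypothetical dict mapping a token to an emotion score
    """
    per_occurrence = []
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            following = tokens[i + 1 : i + 1 + window]
            if not following:  # target is the last token: nothing to score
                continue
            # Assumption: tokens missing from word_scores count as neutral (0.0).
            per_occurrence.append(
                mean(word_scores.get(t, 0.0) for t in following)
            )
    # Assumption: a word that never occurs gets a neutral score.
    return mean(per_occurrence) if per_occurrence else 0.0
```

A word whose following context is consistently negative would receive a negative score under this scheme, which is the sense in which a seemingly neutral word can "signal" emotion.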

Parallel Abstract


Affective lexicons are a fundamental resource for sentiment detection. However, most existing Chinese affective lexicons consist mainly of affect-denoting words and lack affect-signaling words. From the perspective of cognitive semantics and pragmatics, affect-signaling words play a critical role in the expression of emotion in language use. Semantic prosody explains how seemingly neutral words can be associated with positive or negative polarity, while functional theory shows that the connection between words and meaning is not one-to-one, and neither is the connection between words and emotion. The correspondence between emotion and linguistic expression may extend beyond word boundaries to chunks. This research therefore aims to collect and annotate affect-signaling words and to organize them, together with affect-denoting words, into a multi-dimensional affective lexicon for Chinese. The result serves not only as an open resource for sentiment analysis but also as evidence of how functional grammar works in sentiment detection in texts. The research involves two phases: the first is the manual collection, annotation, and categorization of the affective lexicon; the second is evaluation and application. In the first phase, affect-denoting words are categorized into five categories (happy, sad, scared, angry, and surprised) and three levels (emotion, mood, and temperament) according to strength and duration. Affect-signaling words are collected and annotated from two sources: author-oriented emotional articles (from a BBS) and reader-oriented emotional news (from Yahoo News). Common emotional expressions are collected as well, including interjections, emoticons, and expletives. In the second phase, the emotion-prediction ability of each affect-signaling word is calculated as the mean emotion score of the ten words that follow it.
To measure the result, random samples of affect-signaling words are added to the NTUSD as the affective lexicon for sentiment analysis, and accuracy is compared with and without the affect-signaling words. Accuracy improves by 4.78% for positive affect-signaling words and by 18.18% for negative ones. In the application, the whole affective lexicon is applied to an unsupervised machine learning approach to sentiment detection in Chinese micro-blog data (Magistry et al., 2015) and yields a promising improvement of nearly 2% over the original F1 score.
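The with/without comparison in the evaluation can be pictured with a minimal sketch of a summation-based classifier. Everything here is an assumption made for illustration: the function names `classify` and `accuracy`, the three-way label set, the sign threshold at zero, and the toy lexicon entries and documents, none of which come from the thesis or from the NTUSD.

```python
def classify(tokens, lexicon):
    """Sum the lexicon scores of the tokens and label by the sign of the total."""
    total = sum(lexicon.get(t, 0.0) for t in tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

def accuracy(docs, gold_labels, lexicon):
    """Fraction of documents whose predicted label matches the human label."""
    correct = sum(
        classify(tokens, lexicon) == gold
        for tokens, gold in zip(docs, gold_labels)
    )
    return correct / len(docs)

# Toy data (hypothetical): a base lexicon of affect-denoting words and
# one affect-signaling word with an estimated negative score.
base_lexicon = {"happy": 1.0, "sad": -1.0}
signaling_words = {"unexpectedly": -0.5}
extended_lexicon = {**base_lexicon, **signaling_words}

docs = [["so", "happy"], ["unexpectedly", "like", "this"], ["sad"]]
gold = ["positive", "negative", "negative"]

baseline = accuracy(docs, gold, base_lexicon)
extended = accuracy(docs, gold, extended_lexicon)
print(f"baseline={baseline:.2f} extended={extended:.2f}")
```

In this toy setup the second document contains no affect-denoting word, so only the extended lexicon can label it; the reported gains of 4.78% and 18.18% are differences of this kind measured on the thesis's own data.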


Cited By


王雅詩 (2017). 基於詞性組合的意見字典擴增方法之研究 [Master's thesis, Tamkang University]. Airiti Library. https://doi.org/10.6846/TKU.2017.00608
謝舒凱, & 曾昱翔 (2019). 深度詞庫:邁向知識導向的人工智慧基礎. Chinese Journal of Psychology, 61(3), 231-247. https://doi.org/10.6129/CJP.201909_61(3).0004
