Thesis


Sentiment analysis for patient-authored journal text

Advisor: 柯士文


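This study applies sentiment analysis (SA) techniques to an online medical community, where all of the text is patient-authored text (PAT). Analyzing what patients share in the medical community yields valuable information about their emotional state. Most current sentiment analysis research focuses on Twitter, movie reviews, product reviews, and similar sources. Some of this prior experience carries over to the medical community domain, but the domain also has its own specific situations, such as mentions of drugs, symptoms, or diseases.

The data come from the well-known overseas medical community website www.patientslikeme.com. Besides examining whether standard SA techniques from past work apply to patient-authored text, this study reports classification results for two classes (positive, negative) and three classes (positive, neutral, negative).

The study analyzes sentiment with two approaches: a natural language (pattern-based) model and machine learning. The former classifies sentiment with Adv Verb Combine & Adv Adj Combine (AVC&AAC) and Adv Verb Adj Combine – SentiWordNet (AVAC-SWN), using rules to assign a score to each post. The latter is a commonly used approach in SA. The baseline uses unigrams as features and frequency as weights, and this study proposes semantic weighting, which modifies the sentiment weight of words by using the sentiment scores produced by the pattern-based method as the weights for training the classifier.

The results show that AVC&AAC outperforms AVAC-SWN in both two-class and three-class classification, because the AVAC-SWN patterns are better suited to sentiment analysis of product reviews than to PAT. For the SVM, the data were divided into three sets: all texts, medically related emotional texts, and texts that purely express emotion. Applying semantic weighting improved classification of all texts and of texts that purely express emotion. On the medically related texts, semantic weighting raised the SVM's performance on neutral and negative sentiment but lowered it on positive sentiment, so overall performance dropped slightly. About 50% of the medically related texts carry negative sentiment, which suggests that texts discussing symptoms express sentiment in a more complex or less obvious way. In future work we will continue to study the relationship between mentions of symptoms and the expression of sentiment.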
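A minimal sketch of rule-based adverb + verb / adverb + adjective scoring in the spirit of AVC&AAC is shown below. The abstract does not give the actual scoring formulas or lexicon, so the toy lexicons (`HEAD_SCORES`, `ADV_SCORES`), the intensifier rule, the negator handling, and the neutral band of 0.1 are all assumptions for illustration. Each adverb followed by a scored adjective or verb is combined into a single pair score, unpaired words contribute their own score, and the summed post score is mapped to two or three labels.

```python
import math
import re

# Hypothetical toy lexicons (illustrative values, not the thesis's resources).
# Prior polarity of adjectives/verbs, in [-1, 1].
HEAD_SCORES = {"good": 0.6, "bad": -0.6, "happy": 0.7, "hurt": -0.5,
               "improve": 0.5, "worry": -0.4}
# Adverb strength in [0, 1]; a negative value marks a negator.
ADV_SCORES = {"very": 0.8, "really": 0.7, "slightly": 0.3, "not": -1.0}


def combine(adv: str, head: str) -> float:
    """Score an adverb + adjective/verb pair (hypothetical combination rule)."""
    h, a = HEAD_SCORES[head], ADV_SCORES[adv]
    if a < 0:  # negator: flip the polarity of the head word
        return -h
    # Intensifier: push the score toward +1 or -1 in proportion to adverb strength.
    return h + math.copysign(a * (1 - abs(h)), h)


def score_text(text: str) -> float:
    """Sum pair scores and unpaired word scores over one post."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total, i = 0.0, 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in ADV_SCORES and nxt in HEAD_SCORES:
            total += combine(tok, nxt)  # adverb + head word scored as one unit
            i += 2
            continue
        total += HEAD_SCORES.get(tok, 0.0)  # unknown words contribute 0
        i += 1
    return total


def label(score: float, three_class: bool = False, neutral_band: float = 0.1) -> str:
    """Map a post score to two labels, or to three with a neutral band around 0."""
    if three_class and abs(score) <= neutral_band:
        return "neutral"
    return "positive" if score >= 0 else "negative"
```

For example, `score_text("I feel very good today")` combines "very good" into about 0.92 and is labeled positive, while "It does not hurt" flips the negative score of "hurt" to 0.5. A post with no scored words ("I took my meds") scores 0, which is neutral under three labels but falls to positive under the two-label rule, illustrating why the choice between two and three classes changes the results.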


This research applies Sentiment Analysis (SA) to an online medical community whose text is patient-authored text (PAT). By analyzing patients' activity in the medical community, we can obtain valuable information about their emotional condition. Current sentiment analysis research mostly targets social networking sites, movie reviews, product comments, and similar sources. Some techniques from previous studies can be applied to the medical community domain, but this domain also presents different situations, such as references to drugs, symptoms, or diseases. Our data source is the well-known overseas medical community website "www.patientslikeme.com". This study examines whether past standard SA techniques are applicable to patient-authored text and presents classification results with two labels (positive, negative) and three labels (positive, neutral, negative). We analyze sentiment with two methods: a natural language model (pattern-based) and machine learning (support vector machine, SVM). The pattern-based approach uses Adv Verb Combine & Adv Adj Combine (AVC & AAC) and Adv Verb Adj Combine - SentiWordNet (AVAC-SWN) to classify texts by rules. SVM is a machine learning algorithm frequently used in SA. We propose a Semantic Weighting method that modifies the sentiment weight of words in the text, using the sentiment scores generated by the pattern-based approach as weights. Finally, we compare it with an SVM baseline that uses unigrams as features and frequency as weights. The results show that AVC & AAC outperforms AVAC-SWN in both two-label and three-label classification, because AVAC-SWN is more suitable for product reviews than for PAT. For the SVM, we divide the data into three types: 1) all text, 2) medical-related text, and 3) text that simply expresses sentiment. Applying Semantic Weighting to all text and to text that simply expresses sentiment performs better than the baseline.
For medical-related text, Semantic Weighting is ineffective overall: classification of neutral and negative sentiment improves, but classification of positive sentiment declines. About 50% of this type of data carries a negative label, which suggests that sentiment in text discussing medical topics is more complex or obscure. In future work we will continue to study the relationship between medical-related text and the way sentiment is expressed.
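The comparison between the frequency baseline and Semantic Weighting can be sketched as follows. The abstract does not state the exact weighting formula, so the (1 + |score|) scaling, the toy `LEXICON`, the example posts, and the helper names are hypothetical. The pure-Python trainer is Pegasos (stochastic sub-gradient descent on the hinge loss), a linear SVM without a bias term or projection step, standing in for a library SVM implementation. The block builds baseline unigram-frequency vectors and semantically weighted vectors for the same posts and trains one linear SVM on each.

```python
import random
import re
from collections import Counter

# Hypothetical per-word sentiment scores, e.g. as produced by a pattern-based scorer.
LEXICON = {"good": 0.6, "happy": 0.7, "hopeful": 0.5, "bad": -0.6,
           "hurt": -0.5, "worry": -0.4, "pain": -0.7}


def unigram_frequency(text):
    """Baseline features: raw unigram counts as a sparse dict."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def semantic_weighting(freqs, lexicon=LEXICON):
    """Hypothetical scheme: scale each count by (1 + |sentiment score|)."""
    return {t: c * (1.0 + abs(lexicon.get(t, 0.0))) for t, c in freqs.items()}


def dot(w, x):
    """Inner product of a weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos on the L2-regularized hinge loss; y holds +1/-1 labels."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * dot(w, X[i])  # computed before this step's update
            for f in w:  # regularization: shrink every existing weight
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss active: step toward y_i * x_i
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    """Sign of the linear decision value."""
    return 1 if dot(w, x) >= 0 else -1


# Toy posts (hypothetical): +1 = positive, -1 = negative.
docs = ["feeling very good today", "happy and hopeful",
        "really bad pain", "worry about the hurt"]
labels = [1, 1, -1, -1]
baseline_X = [unigram_frequency(d) for d in docs]
weighted_X = [semantic_weighting(x) for x in baseline_X]
w_base = train_linear_svm(baseline_X, labels)
w_sem = train_linear_svm(weighted_X, labels)
```

The only difference between the two runs is the feature values: under the weighting, sentiment-bearing words such as "pain" count more than neutral words such as "the". This sketch handles two labels only; a three-label setup would typically train one such classifier per class (one-vs-rest) and pick the class with the highest decision value.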


