Depression Detection Based on the PTT Bulletin Board System: Using Text and Time Information as Features

Depression is a major health problem in modern society, yet many people suffer from it without being aware of it. Because people readily post diaries and vent their emotions online, detecting depression from text is a problem worth studying. The goal of this thesis is to design a classification system that determines, from the articles a person has written, whether that person is a potential depression patient. To this end, we propose a two-stage classifier based on supervised learning. The first stage determines whether the target expresses negative emotion; if it does, the second stage determines whether it indicates clinical depression or only ordinary sadness. All training data come from the PTT bulletin board system, and the TF-IDF values of terms serve as the basic features. The cross-validation accuracies of the two stages are 96.17% and 81.86%, respectively. In the second stage we additionally consider time information: we define a temporal feature as a time-term pair, and adding these temporal features yields a 2.65% improvement in accuracy. This suggests that the same word used at different times may indicate different tendencies (clinical depression or ordinary negative emotion). We also found that time information is especially effective when obvious terms are absent. Since many people with depression in the real world do not use such obvious terms, time information can be expected to help when the system is used in practice. To examine the consistency between the system and human judgment, we conducted a user study; the results show that the system performs better at detecting severe depression than at detecting moderate or mild depression. Finally, we propose an explanation method intended to improve the efficiency of human judgment without sacrificing accuracy, and a user study confirmed its effectiveness.

Keywords

depression; depression detection; time information; text classification; explanation
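
The time-term pairing and the two-stage decision described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the four time-of-day buckets, the `bucket|term` naming, the particular TF-IDF weighting, and the function names `time_bucket`, `temporal_terms`, `tfidf`, and `two_stage` are all assumptions made for illustration, and the two stage classifiers are passed in as placeholder callables rather than trained models.

```python
import math
from collections import Counter


def time_bucket(hour):
    """Map an hour (0-23) to a coarse period. The four buckets are an assumption."""
    if 0 <= hour < 6:
        return "late_night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"


def temporal_terms(tokens, hour):
    """Return the plain terms followed by time-term pairs such as 'late_night|tired'."""
    bucket = time_bucket(hour)
    return list(tokens) + [f"{bucket}|{t}" for t in tokens]


def tfidf(docs):
    """Compute one {feature: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency),
    which is one common variant and not necessarily the one used in the thesis.
    """
    n = len(docs)
    df = Counter(f for doc in docs for f in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {f: (c / len(doc)) * math.log(n / df[f]) for f, c in tf.items()}
        )
    return vectors


def two_stage(features, is_negative, is_depression):
    """Cascade: stage 1 filters negative emotion, stage 2 separates depression.

    is_negative and is_depression are hypothetical stand-ins for trained classifiers.
    """
    if not is_negative(features):
        return "not negative"
    return "depression" if is_depression(features) else "ordinary sadness"


# Two posts with the same word written at different hours.
night = temporal_terms(["tired"], hour=3)
day = temporal_terms(["tired"], hour=14)
vectors = tfidf([night, day])
```

In this toy corpus the plain term appears in both posts and so receives zero weight, while the time-term pairs differ between the posts and keep a positive weight. That is the sense in which the same word written at different times can carry different information.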

