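Personality traits are among the important factors that influence human behavior, so automatically predicting personality traits is a research topic with considerable potential. In recent years, many researchers have worked on this problem. However, most studies focus on English text, and very few predict personality traits from Chinese text. Chinese and English text differ greatly in how words are joined: English words are separated by spaces, whereas Chinese words are not. This makes word segmentation harder for Chinese than for English, and Chinese text correspondingly harder to analyze. In this thesis, we therefore attempt to classify a person's personality traits from Chinese text. First, we collected the wall posts and personality trait scores of 222 Chinese-speaking Facebook users. Next, we applied Jieba Chinese text segmentation to perform word segmentation and used a support vector machine (SVM) as the learning algorithm for classifying personality traits. Experimental results show that Chinese word segmentation substantially improves both precision and recall, and that considering text features together with the number of Facebook friends achieves the best performance. Moreover, we found that, compared with introverts, extraverts tend to write longer or more posts and to use common words frequently. This suggests that extraverts prefer to share their moods and the details of their daily lives with others on Facebook.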
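The following is a minimal sketch that illustrates why the missing spaces matter. It implements forward maximum matching, a classic dictionary-based segmentation baseline. This is not the algorithm Jieba uses. The function name `fmm_segment`, the `max_len=4` limit, and the toy dictionaries are hypothetical. The sketch assumes a word list is available and takes the longest dictionary word that starts at each position.

```python
from __future__ import annotations


def fmm_segment(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching (hypothetical helper, not Jieba's algorithm):
    at each position, take the longest dictionary word of up to max_len
    characters that starts there.

    Characters not covered by any dictionary word become
    single-character tokens.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical toy dictionary, for illustration only.
    toy_dict = {"我們", "喜歡", "分享", "生活", "瑣事"}
    print(fmm_segment("我們喜歡分享生活瑣事", toy_dict))
    # ['我們', '喜歡', '分享', '生活', '瑣事']

    # Greedy matching can pick the wrong boundary when words overlap.
    ambiguous_dict = {"研究", "研究生", "生命", "起源"}
    print(fmm_segment("研究生命起源", ambiguous_dict))
    # ['研究生', '命', '起源']
```

In the second example, the dictionary contains both 研究 ("study") and 研究生 ("graduate student"). Greedy matching therefore splits 研究生命起源 ("study the origin of life") into 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源. English text never requires resolving this kind of boundary ambiguity, because splitting on spaces already separates the words.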
Automatically recognizing personality is a promising research subject because personality offers a way to infer a person's behavior. Many studies have been performed in recent years. However, very few of them focus on predicting personality from Chinese text. Chinese text differs greatly from English text, in which words are separated by spaces. A Chinese sentence is a sequence of characters with no spaces between them, yet the meaningful unit of analysis is the word rather than the individual character. Because word boundaries are not marked, Chinese text is more difficult to analyze. In this thesis, we attempt to classify personality traits from Chinese text. We collected a dataset of posts and personality scores from 222 Facebook users who use Chinese as their main written language. We then used the Jieba Chinese text segmentation tool to segment the posts into words and used a support vector machine (SVM) as the learning algorithm for personality classification. Experimental results show that text segmentation substantially improves precision and recall. They also show that combining text features with the number of Facebook friends yields the best performance. Moreover, we find that extraverts seem to write more sentences and use more common words than introverts do. This suggests that extraverts are more willing than introverts to share their moods and daily lives with others.
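The abstract does not state which SVM implementation or feature representation was used. The sketch below is a hedged illustration only. It trains a linear soft-margin SVM by full-batch subgradient descent on the primal objective. The input features are the relative frequencies of already-segmented tokens plus one extra column, log(1 + friend count), which mirrors the combination of text features and Facebook friend count. Every function name (`build_vocab`, `featurize`, `train_linear_svm`, `predict`), every hyperparameter (`lam`, `lr`, `epochs`), the log scaling, and the toy posts are assumptions, not the thesis's actual setup.

```python
from __future__ import annotations

import math
from collections import Counter


def build_vocab(docs: list[list[str]]) -> dict[str, int]:
    """Assign a column index to each distinct token in already-segmented posts."""
    vocab: dict[str, int] = {}
    for tokens in docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(tokens: list[str], friend_count: int, vocab: dict[str, int]) -> list[float]:
    """Relative token frequencies, plus a final column log(1 + friend_count).

    The log scaling of the friend count is a hypothetical choice.
    """
    x = [0.0] * (len(vocab) + 1)
    total = max(len(tokens), 1)  # avoid division by zero for empty posts
    for tok, count in Counter(tokens).items():
        if tok in vocab:  # tokens unseen during training are ignored
            x[vocab[tok]] = count / total
    x[-1] = math.log1p(friend_count)
    return x


def train_linear_svm(
    X: list[list[float]],
    y: list[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> tuple[list[float], float]:
    """Full-batch subgradient descent on the linear soft-margin SVM objective
        (lam / 2) * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)),
    with labels y_i in {-1, +1}. The bias b is not regularized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this example
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    """Return +1 or -1 according to the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: segmented posts, friend counts, and labels
    # (+1 = extravert, -1 = introvert). Not data from the thesis.
    posts = [["今天", "好", "開心"], ["在家", "看書"]]
    friends = [800, 120]
    labels = [1, -1]
    vocab = build_vocab(posts)
    X = [featurize(p, f, vocab) for p, f in zip(posts, friends)]
    w, b = train_linear_svm(X, labels)
    print([predict(w, b, x) for x in X])
```

For real experiments, a tested SVM library would normally be preferable. Writing the subgradient loop out by hand only makes the objective explicit. The friend count is log-scaled here (an assumption) so that raw counts in the hundreds do not dominate the frequency features, which lie between 0 and 1.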