Personality Assessment from User-Generated Data Based on Deep Learning Methods

A Deep Learning Based Approach for Personality Detection from User Generated Content

Advisor: 魏志平

Abstract


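Many previous studies have shown that a person's personality is strongly related to their life, behavior, and preferences. Given these relationships, knowing a person's personality helps firms manage human resources, identify their target customers, and carry out any other task that requires a preliminary understanding of people. To detect personality efficiently, many methods now use user-generated data to infer personality automatically. With the rapid development of artificial intelligence, many prior studies have begun to apply deep learning methods to extract complex semantic features from text and build more powerful classification models. However, training a deep learning model usually requires a very large amount of data, while labeled data in this field are becoming increasingly hard to obtain. Deep learning methods must therefore contend with the problem of insufficient data. Long documents are another problem that needs special handling in this field: user-generated data are sometimes very long documents, but some deep learning architectures, such as recurrent neural networks (RNNs), cannot retain such long content and may therefore produce unsatisfactory results. In this study, we propose a model architecture that combines deep learning, traditional predefined features, and an extreme gradient boosting (XGBoost) classifier. We use transfer learning to address the shortage of data for deep learning. We use two different methods of selecting important sentences to increase the amount of training data and to address the long-document problem. The final results show that every component of our model helps improve its performance. Our method also achieves higher accuracy than existing techniques.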

Parallel Abstract


Human personality has been shown to be highly correlated with an individual’s life, behaviors, and preferences. Because of these relationships, knowing a person’s personality is helpful for firms’ human resource management, for finding firms’ target customers, and for other tasks that can be supported by users’ profiles. To detect a person’s personality traits efficiently, several methods have been proposed to infer personality automatically from user-generated content (UGC). With the rapid development of AI, prior studies have started to exploit deep learning to discover latent and complex linguistic features and to develop more effective classification models. However, training a deep learning model usually requires a very large set of training data, and in this specific task labeled data are hard to obtain. Therefore, the use of deep learning methods for personality prediction needs to address the limited training data problem. Another problem in this task is that UGC data are sometimes long documents, while some deep learning models, such as Recurrent Neural Networks, cannot memorize such long contexts. In this work, we propose a hybrid model structure that combines deep learning, traditional hand-crafted features, and an XGBoost classifier. We employ transfer learning to address the insufficient training data problem for deep learning models. We propose two sentence selection schemes to enlarge our training data set and, at the same time, to address the long document problem. Our empirical evaluation results show that each part of our proposed method helps to improve prediction effectiveness and that the method outperforms our benchmark method.
