Personality Assessment from User-Generated Content Based on Deep Learning Methods

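Many past studies have shown that a person's personality is strongly associated with their life, behaviors, and preferences. Because of these associations, knowing a person's personality helps firms manage human resources, helps them find their target customers, and supports any other task that requires a preliminary understanding of a person. To detect personality efficiently, many methods now use user-generated content to infer personality automatically. With the rapid development of artificial intelligence, many prior studies have begun applying deep learning methods to extract complex semantic features from text in order to build stronger classification models. However, training a deep learning model usually requires a very large amount of data, and labeled data in this field are increasingly hard to obtain. The problem of insufficient data must therefore be addressed whenever deep learning methods are used. Long documents are another problem that needs special handling in this field: user-generated content is sometimes a very long article, and some deep learning architectures, such as recurrent neural networks (RNNs), cannot retain such long content, which can lead to unsatisfactory results. In this study, we propose a model architecture that combines deep learning, traditional predefined (hand-crafted) features, and the eXtreme Gradient Boosting (XGBoost) classifier. We use transfer learning to address the shortage of training data for deep learning. We use two different schemes for selecting important sentences, both to enlarge our dataset and to solve the long-document problem. The results show that every component of our model contributes to improved performance, and that our method achieves higher accuracy than existing techniques.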
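The hybrid structure described above can be illustrated with a minimal sketch of feature fusion. It assumes that a deep-learning text encoder yields a fixed-length embedding and that the hand-crafted features are simple stylistic measures. The encoder `fake_embed` is a hypothetical stand-in (a hash, not a neural model), and the three hand-crafted features are illustrative choices, not the features used in the thesis.

```python
import hashlib
import re
from typing import List

EMBED_DIM = 8  # hypothetical embedding size; a real pretrained encoder would be larger


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a pretrained deep-learning encoder.

    Hashes the text into a deterministic vector with values in [0, 1).
    It carries no semantics; it only fixes the shape of the deep-learning block.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 256 for b in digest[:EMBED_DIM]]


def handcrafted_features(text: str) -> List[float]:
    """Illustrative hand-crafted features (assumed, not taken from the thesis)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words)
    first_person = sum(w in {"i", "me", "my", "mine", "myself"} for w in words)
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return [
        float(n_words),                               # word count
        first_person / n_words if n_words else 0.0,   # first-person pronoun ratio
        n_words / n_sentences,                        # average sentence length
    ]


def fuse(text: str) -> List[float]:
    """Concatenate deep-learning and hand-crafted features into one vector."""
    return fake_embed(text) + handcrafted_features(text)


if __name__ == "__main__":
    vec = fuse("I love hiking. My friends say I am calm.")
    print(len(vec), vec[EMBED_DIM:])  # 11 features; last three are 9 words, 3/9, 4.5
```

In the architecture the abstract describes, rows of such fused vectors would be stacked into a matrix and fitted with a gradient-boosting classifier such as `xgboost.XGBClassifier`; that step is left out here so the sketch runs with the standard library only.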

Keywords

Personality; Deep learning; Transfer learning; Small dataset; User-generated content; Text mining

English Abstract

Human personality has been shown to be highly correlated with an individual's life, behaviors, and preferences. Because of these relationships, knowing a person's personality helps firms manage human resources effectively, identify their target customers, and perform other tasks that rely on user profiles. To detect a person's personality traits efficiently, several methods have been proposed to infer personality automatically from user-generated content (UGC). With the rapid development of AI, prior studies have begun to use deep learning to discover latent and complex linguistic features and to build more effective classification models. However, training a deep learning model usually requires a very large training set, and in this task labeled data are hard to obtain. The use of deep learning for personality prediction therefore has to address the problem of limited training data. Another problem is that UGC can consist of long documents, and some deep learning models, such as recurrent neural networks (RNNs), cannot retain such long contexts. In this work, we propose a hybrid model that combines deep learning, traditional hand-crafted features, and an XGBoost classifier. We employ transfer learning to address the shortage of training data for the deep learning components. We propose two sentence selection schemes that enlarge the training set and, at the same time, address the long-document problem. Our empirical evaluation shows that each component of the proposed method improves prediction effectiveness and that the method outperforms the benchmark method.
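The abstract does not specify the two sentence selection schemes. As a hedged illustration of how selecting sentences can both multiply training samples and shorten the input fed to an RNN, the sketch below assumes two hypothetical schemes: (A) non-overlapping windows of consecutive sentences and (B) random subsets of sentences, with every sample inheriting the document's personality label. The function names, the window size `k`, and the naive regex sentence splitter are all assumptions, not the thesis's method.

```python
import random
import re
from typing import List, Tuple


def split_sentences(doc: str) -> List[str]:
    """Naive splitter on ., !, ? followed by whitespace (assumed; real UGC needs more care)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s.strip()]


def window_samples(doc: str, label: int, k: int) -> List[Tuple[str, int]]:
    """Hypothetical scheme A: non-overlapping windows of k consecutive sentences.

    Each window becomes a shorter training sample that inherits the document label.
    """
    sents = split_sentences(doc)
    return [(" ".join(sents[i:i + k]), label) for i in range(0, len(sents), k)]


def random_samples(doc: str, label: int, k: int, n: int,
                   seed: int = 0) -> List[Tuple[str, int]]:
    """Hypothetical scheme B: n random subsets of k sentences, kept in original order."""
    sents = split_sentences(doc)
    rng = random.Random(seed)  # seeded for reproducible augmentation
    k = min(k, len(sents))
    samples = []
    for _ in range(n):
        idx = sorted(rng.sample(range(len(sents)), k))
        samples.append((" ".join(sents[i] for i in idx), label))
    return samples


if __name__ == "__main__":
    doc = "A. B. C. D. E."
    print(window_samples(doc, 1, 2))  # three samples: "A. B.", "C. D.", "E."
    print(random_samples(doc, 1, 3, 2, seed=7))
```

For a 100-sentence post and `k = 10`, scheme A turns one long document into 10 samples of 10 sentences each, which is the kind of trade-off the abstract describes: more labeled samples, each short enough for a recurrent model.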

English Keywords

Personality; Deep learning; Transfer learning; Small dataset; User-generated content; Text mining

