A Salary Prediction Model for Taiwan Based on Online Job Bank Data

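This study aims to construct a salary prediction model, focusing specifically on job openings in the information software and systems category. The model can serve as a reference for both job seekers and employers: from structured variables, including personal information and job-related skills, together with the text description of the job content, both sides can estimate the approximate salary of a position, which reduces disagreement over pay. In addition, the coefficients produced by the regression models indicate the market value reflected by each variable, for example how familiarity with a particular skill affects the salary level, giving job seekers a direction for self-improvement. The study begins with an exploratory analysis of the data to understand the basic characteristics of each variable. It then integrates structured variables (such as the requirements of the position) with unstructured variables (the text description of the job content) and builds salary prediction models with a range of machine learning algorithms. It also applies a neural network model based on word-vector transformation to the job-content text. The evaluation results of this model are no worse than those of the models that use structured variables, which shows that Chinese natural language processing is a feasible way to build salary prediction models from online job bank datasets.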

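Keywords

Salary prediction; Machine learning; Convolutional neural network; Natural language processing; Word2Vec; Word vector; High-dimensional data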

Parallel Abstract

The purpose of this thesis is to construct a salary prediction model from online recruitment data, focusing on positions in the information software and systems category. Based on structured data, including personal information and job-related skills, as well as unstructured text describing the job content, the resulting models can serve as a reference for job seekers and companies when estimating the salary level of a given job. The coefficients of the regression models also indicate the market value reflected by each variable, and the high-paying skills and expertise they identify can show job seekers where to improve. The research starts with an exploratory data analysis to understand the basic characteristics of each variable. Next, we apply various machine learning algorithms to the integrated structured and unstructured data to build salary prediction models. The results show that Random Forest, Ridge, and Lasso perform well on the sparse, high-dimensional dataset. We then take a natural language processing approach, applying a convolutional neural network to word vectors derived from the job-content text. The resulting salary prediction model is on a par with the models built on the integrated structured and unstructured data. This supports natural language processing as a viable approach to constructing salary prediction models from online recruitment data.
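To make the idea of integrating structured and unstructured variables more concrete, here is a minimal sketch in Python. It is not the thesis's implementation: the postings, salaries, feature names, and hyperparameters are all invented for illustration, and a small gradient-descent ridge regression stands in for whatever library routines the author used. It only shows how skill indicators and bag-of-words counts can share one sparse design matrix whose coefficients are then read as rough "market value" signals.

```python
from collections import Counter

# Hypothetical toy postings (all values are made up for illustration):
# (years of experience, required skills, job description, monthly salary in thousand NTD)
POSTINGS = [
    (1, {"python"}, "maintain web backend", 45.0),
    (3, {"python", "sql"}, "build data pipeline backend", 60.0),
    (5, {"java"}, "design distributed system", 75.0),
    (2, {"sql"}, "maintain report database", 48.0),
    (6, {"python", "java"}, "design machine learning system", 90.0),
    (4, {"java", "sql"}, "build distributed database", 70.0),
]


def build_features(postings):
    """Combine structured variables and bag-of-words counts into one matrix."""
    skills = sorted({s for _, sk, _, _ in postings for s in sk})
    vocab = sorted({w for _, _, text, _ in postings for w in text.split()})
    names = ["years"] + ["skill:" + s for s in skills] + ["word:" + w for w in vocab]
    rows = []
    for years, sk, text, _ in postings:
        counts = Counter(text.split())
        row = [float(years)]
        row += [1.0 if s in sk else 0.0 for s in skills]   # structured part
        row += [float(counts[w]) for w in vocab]           # unstructured part
        rows.append(row)
    return names, rows


def predict(x, w, b):
    return b + sum(wj * xj for wj, xj in zip(w, x))


def mse(rows, targets, w, b):
    return sum((predict(x, w, b) - t) ** 2 for x, t in zip(rows, targets)) / len(rows)


def fit_ridge(rows, targets, alpha=1.0, lr=0.01, epochs=5000):
    """Minimise mean squared error + (alpha / n) * ||w||^2 by gradient descent."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    b = sum(targets) / n  # start from the mean salary; the intercept is not penalised
    for _ in range(epochs):
        errs = [predict(x, w, b) - t for x, t in zip(rows, targets)]
        grad_w = [
            2.0 * sum(e * x[j] for e, x in zip(errs, rows)) / n + 2.0 * alpha * w[j] / n
            for j in range(d)
        ]
        grad_b = 2.0 * sum(errs) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


names, rows = build_features(POSTINGS)
targets = [p[3] for p in POSTINGS]
w, b = fit_ridge(rows, targets)

# Show the five coefficients with the largest magnitude.
ranked = sorted(zip(names, w), key=lambda item: abs(item[1]), reverse=True)
for name, coef in ranked[:5]:
    print(f"{name:20s} {coef:+.2f}")
```

On real recruitment data the text features would far outnumber the postings' structured fields, which is where the regularisation in Ridge and Lasso matters. With six invented rows, the printed coefficients carry no real meaning.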

Parallel Keywords

Salary prediction; Machine learning; Convolutional neural network; Natural language processing; Word2Vec; Word vector; High-dimensional data

