Applying Ensemble Learning Classification Models to Predict the Marketing Effectiveness of Bank Customers

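Banks hold large volumes of basic customer information and transaction data. If an ensemble learning classification model could predict, before a product is marketed, which customers are likely to be marketed to successfully, a bank could anticipate the expected marketing outcome of the product, and telemarketers could prepare in advance and target their calls more precisely. This would not only raise the success rate of telemarketing but also reduce the manpower and cost it requires. The data for this study come from the Portuguese bank telemarketing dataset in the UCI public repository. The data are first pre-processed with normalization and standardization. To address the problems caused by class imbalance, the data are split into training and test sets, and the training set is then partitioned and sampled with stratified k-fold cross-validation. Three types of ensemble learning classification models (bagging, boosting, and stacking) and five single classification models (k-nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), logistic regression (LR), and artificial neural network (ANN)) are trained, and the predictions of the ensemble models are then evaluated and compared with those of the single models. The experimental results show that all three types of ensemble models outperform the single classification models in both predictive ability and stability. The results also confirm that partitioning and sampling the data with stratified k-fold cross-validation gives the ensemble models better training and classification performance.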

Keywords

Machine learning; ensemble learning; stratified k-fold cross-validation

Parallel Abstract

The banking industry holds large volumes of basic customer information and transaction data. If an ensemble learning classification model can predict, before a product campaign begins, which customers are likely to respond successfully, a bank can anticipate the expected marketing outcome in advance, and telemarketers can prepare suitable materials to target their calls more precisely. This not only improves the success rate of telemarketing but also reduces the manpower and cost it requires. The data in this study come from the Portuguese bank telemarketing dataset in the UCI public repository and are pre-processed with normalization and standardization before the classification models are applied. To address the problems caused by class imbalance, the data are first split into training and test sets, and the training set is then partitioned and sampled with stratified k-fold cross-validation. Three types of ensemble learning classification models (bagging, boosting, and stacking) and five single classification models (k-nearest neighbor (KNN), support vector machine (SVM), decision tree (DT), logistic regression (LR), and artificial neural network (ANN)) are trained, and the predictions of the ensemble models are then evaluated and compared with those of the single models. The experimental results show that all three types of ensemble models outperform the single classification models in both predictive ability and stability. The results also confirm that partitioning and sampling the data with stratified k-fold cross-validation improves the training and classification performance of the ensemble models.
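
The workflow summarized above (hold out a test set, apply stratified k-fold cross-validation to the training set, then compare ensemble and single classifiers) might be sketched as follows. This is only an illustrative sketch and not the study's actual code: it assumes scikit-learn, substitutes a synthetic imbalanced dataset for the UCI bank data, uses gradient boosting as one possible boosting method, scores with F1 as an assumed metric, and omits SVM and ANN for brevity. All variable names and parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the bank data, roughly 88% "no" / 12% "yes".
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.88, 0.12], random_state=0,
)

# Hold out a test set first, preserving the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def scaled(model):
    """Standardize features inside each fold before fitting the model."""
    return make_pipeline(StandardScaler(), model)


models = {
    # Single classifiers
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": scaled(KNeighborsClassifier()),
    "logistic_regression": scaled(LogisticRegression(max_iter=1000)),
    # Ensemble classifiers
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=25, random_state=0
    ),
    "boosting": GradientBoostingClassifier(random_state=0),
    "stacking": StackingClassifier(
        estimators=[
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("knn", scaled(KNeighborsClassifier())),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

# Stratified k-fold keeps the minority-class share similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_f1 = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")
    cv_f1[name] = (scores.mean(), scores.std())
    print(f"{name:20s} F1 mean={scores.mean():.3f} std={scores.std():.3f}")
```

The mean score across folds serves as a rough proxy for predictive ability and the standard deviation for stability; the held-out `X_test` and `y_test` would be used only once, for the final evaluation of the selected models.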