A Study of Credit Risk Scorecard Models for Personal Loans

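This study uses support vector machines (SVM), a data mining technique, to build a credit risk scorecard model for personal loans. Logistic regression is currently the method most often used to build such scorecards. Although data mining methods are convenient to use and impose few restrictions, they are rarely used in practice for this purpose, mainly because the economic meaning of the variables selected by an SVM model is often hard to interpret. To examine whether SVM offers a better alternative for credit risk scorecards, this study first builds models using all variables provided by the bank, and then selects variables by four further methods: weight of evidence (WOE)/information value (IV), stepwise selection, deletion of abnormal variables, and correlation coefficients. The resulting five variable sets are each applied to logistic regression and to SVM models. For the SVM models, four kernel functions are used to separate the data: linear, polynomial, radial basis function (RBF), and sigmoid. From the resulting 25 models (five variable sets, each fitted with logistic regression and with the four SVM kernels), the study seeks a scorecard model that is both suitable and reasonably interpretable. The models are evaluated by accuracy rate, area under the receiver operating characteristic curve (AUROC), Gini coefficient, population stability index (PSI), and cross-validation. The empirical results show that SVM with the RBF kernel performs best: its accuracy rate is the highest, and although its AUROC and Gini coefficient are not the highest, they differ little from those of the best logistic regression model. This study therefore recommends it as the first choice for classification.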
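As an illustration of the WOE/IV screening step mentioned above, the following is a minimal sketch and not the study's actual procedure. It assumes one common convention, WOE = ln(share of goods / share of bads) per category and IV = the sum over categories of (share of goods − share of bads) × WOE. The function name `woe_iv`, the label coding (1 = bad, 0 = good), and the `smoothing` constant are hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def woe_iv(values, labels, smoothing=0.5):
    """Return ({category: WOE}, IV) for one categorical variable.

    Assumptions for this sketch (not taken from the study):
    labels are coded 1 = bad (default) and 0 = good, and `smoothing`
    is added to every cell count so that empty cells do not cause log(0).
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for v, y in zip(values, labels):
        if y == 1:
            bad[v] += 1
        else:
            good[v] += 1

    cats = sorted(set(good) | set(bad))
    total_good = sum(good[c] + smoothing for c in cats)
    total_bad = sum(bad[c] + smoothing for c in cats)

    woe = {}
    iv = 0.0
    for c in cats:
        share_good = (good[c] + smoothing) / total_good
        share_bad = (bad[c] + smoothing) / total_bad
        woe[c] = math.log(share_good / share_bad)
        # Each term is non-negative, so IV grows with separation power.
        iv += (share_good - share_bad) * woe[c]
    return woe, iv


if __name__ == "__main__":
    # Toy data: category "A" is mostly good, category "B" is mostly bad.
    values = ["A"] * 4 + ["B"] * 4
    labels = [0, 0, 0, 1, 1, 1, 1, 0]
    print(woe_iv(values, labels))
```

A variable whose IV falls below a chosen cutoff would be dropped before modelling; the cutoff itself is a modelling decision that the abstract does not state.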

Keywords

credit risk scorecard model; logistic regression; support vector machine; kernel function

Parallel Abstract

The main purpose of this research is to build a credit scoring model for personal loans with a data mining approach based on support vector machines (SVM). Although the logistic regression model is more commonly adopted by the credit card industry because its results are easier to explain, recent literature reports that SVM classifies applicants more accurately. This research therefore applies SVM to five feature sets and suggests a better model for credit scoring problems. The feature sets are the original variables provided by the credit card department of a financial holding company in Taiwan, plus the variables chosen by four selection criteria: the stepwise procedure through the logistic regression model, weight of evidence/information value, abnormal deletion, and correlation coefficients. In addition, four kernel functions (linear, polynomial, radial basis function, and sigmoid) are adopted in SVM to find the optimal hyperplane. To evaluate the performance of SVM, we compare the SVM models with naïve logistic regression fitted on the same five feature sets. The population stability index and cross-validation are used to check the fit of the five naïve logistic regression models and the 20 SVM models, respectively. The empirical results show that SVM with the radial basis function kernel performs about the same as the naïve logistic regression models in terms of the area under the receiver operating characteristic curve and, equivalently, the Gini coefficient. However, it outperforms the other 24 models in terms of accuracy rate. SVM with the radial basis function kernel is therefore recommended.
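The evaluation criteria named above can be sketched in a few lines. This is a hedged illustration rather than the study's code: it assumes the usual relation Gini = 2 × AUROC − 1 (which is why the abstract treats the two as equivalent) and the usual PSI form, the sum over score bands of (actual share − expected share) × ln(actual share / expected share). The function names, the pairwise AUROC computation, and the `eps` floor are hypothetical choices for this example.

```python
import math


def auroc(scores, labels):
    """AUROC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient derived from AUROC: Gini = 2 * AUROC - 1."""
    return 2.0 * auroc(scores, labels) - 1.0


def psi(expected_shares, actual_shares, eps=1e-6):
    """Population stability index over matching score bands.

    `eps` is an assumed floor that keeps empty bands from producing log(0).
    """
    total = 0.0
    for e, a in zip(expected_shares, actual_shares):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3]
    labels = [1, 0, 1, 0]
    print(auroc(scores, labels), gini(scores, labels))
    print(psi([0.5, 0.5], [0.6, 0.4]))
```

Because Gini is a linear function of AUROC, ranking models by either metric gives the same order; a PSI near zero indicates that the score distribution has barely shifted between the two samples.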

Parallel Keywords

credit scoring model; logistic regression; support vector machines; kernel function


