Competition in the banking industry is intense. To raise profits, a bank must first establish sound credit quality and reduce the risk of credit default. This study applies clustering techniques, interprets the characteristics of the resulting clusters, and uses them to predict the class of credit consumers whose category is not yet known. A clustering decision procedure is first built on three test datasets (Iris, Wine, and the German Credit dataset) and then used to rate the spending data of credit card holders at a domestic bank, so that holders with a tendency to default can be identified and risk can be controlled.
The study uses three common clustering techniques, K-means, Fuzzy c-means (FCM), and hierarchical clustering, together with the TOPSIS multi-criteria decision-making method. These are combined with three decision-support steps: principal component analysis (PCA) to select variables, the elbow method to choose the optimal number of clusters, and silhouette analysis to evaluate the validity of each clustering technique. The procedure is applied to the bank's credit card consumer data to select the technique with the best clustering results. Chi-square tests and tests of means (t-tests) then check whether the clusters are significantly related to the proportion of defaults, and the clustering techniques are tested against one another to determine whether the resulting clusters depend on the method.
The results show that K-means and FCM perform better on the credit data. After repeated testing, FCM yields statistically significant results and is the most suitable technique for credit data. Its accuracy on the three test datasets is 81%, 68%, and 67%, respectively, higher than that of the other clustering techniques. When tested against the dependent variable “worst payment rating”, the clustering results are significant. The clustering also identifies three low-default-risk clusters that contain no defaulting consumers. The bank can examine the characteristics of these clusters further, use them to manage future consumer behavior, and predict consumers' default risk.
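The elbow and silhouette steps named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the toy points, the function names (`farthest_first`, `kmeans`, `wcss`, `silhouette`), and the deterministic farthest-point seeding are all assumptions chosen so that the example is reproducible with the Python standard library alone. It covers K-means only; FCM, hierarchical clustering, PCA, and TOPSIS are left out.

```python
import math

# Hypothetical toy data: three tight, well-separated groups of four points each.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.3), (5.1, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.8, 0.9), (9.2, 0.8),
]

def farthest_first(points, k):
    """Assumed seeding rule: start at points[0], then repeatedly add the
    point that lies farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations: assign each point to its nearest center,
    then move each center to the mean of its members."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

def wcss(points, labels, centers):
    """Within-cluster sum of squares, the quantity plotted in the elbow method."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

inertia, sil = {}, {}
for k in range(1, 5):
    labels, centers = kmeans(points, k)
    inertia[k] = wcss(points, labels, centers)
    if k >= 2:  # the silhouette coefficient needs at least two clusters
        sil[k] = silhouette(points, labels)

best_k = max(sil, key=sil.get)
for k in inertia:
    print(k, round(inertia[k], 3), round(sil[k], 3) if k in sil else "n/a")
print("cluster count with the highest mean silhouette:", best_k)
```

On this toy data the within-cluster sum of squares drops sharply up to k = 3 and only slightly afterwards, and the mean silhouette is highest at k = 3. On real credit data neither curve is usually this clean, which is presumably why the study pairs both criteria with significance tests.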