A Feature-Selection-Based Compound Classification Prediction Model: The Case of Credit Data

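Classification is the process of assigning objects to classes according to their characteristics, and it is among the most frequently studied problems in data mining. Many methodologies exist for classifying large volumes of data, each suited to different situations and data properties. To safeguard classification quality, most classification methods recommend performing feature selection first, so that redundant or irrelevant features do not degrade classification accuracy. Reducing the number of features through diverse feature selection approaches improves accuracy and has given rise to many different classification models; among these, compound data mining classification methods are often used to build effective models. This study constructs compound classification prediction models using data mining classification methodology. Five feature selection techniques, namely Linear Discriminant Analysis (LDA), the Rough Sets Theory approach (RST), Decision Tree (DT), F-score, and Grey Relational Analysis (GRA), are used to screen the attributes that matter for classification. These are then combined with four classifiers, Neural Network (NN), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Logistic Regression (LR), to form different compound classification models. For each compound model, the average classification accuracy on the test samples is computed and the predicted classes are compared with the actual classes. The study also compares different feature selection methods applied to the same classifier, and the same feature selection method applied to different classifiers. In addition, the nonparametric Wilcoxon signed-rank test is used to examine whether the models differ significantly. The results show that the compound models obtained after feature screening classify significantly better than the original models, and that the relative importance rankings of the features retained by the various compound models are similar. Compound models that use F-score for feature selection in combination with the different classifiers achieve the best average classification accuracy.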
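The two-stage idea described in the abstract (rank features, keep the strongest, then classify) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the F-score takes the commonly used two-class form (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances), it uses a plain K-nearest-neighbor vote as the classifier, and the toy data and the function names `f_scores`, `select_top_k`, and `knn_predict` are hypothetical.

```python
from statistics import mean, variance

def f_scores(X, y):
    """Return one F-score per feature column for labels in {0, 1}."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        all_col = [row[j] for row in X]
        pos_col = [row[j] for row in pos]
        neg_col = [row[j] for row in neg]
        # Between-class term: how far each class mean sits from the overall mean.
        between = (mean(pos_col) - mean(all_col)) ** 2 + (mean(neg_col) - mean(all_col)) ** 2
        # Within-class term: sum of the two sample variances (n - 1 denominator).
        within = variance(pos_col) + variance(neg_col)
        scores.append(between / within if within > 0 else float("inf"))
    return scores

def select_top_k(X, y, k):
    """Return the (sorted) column indices of the k features with the highest F-score."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.0], [3.2, 2.0], [2.9, 3.5]]
y = [0, 0, 0, 1, 1, 1]

kept = select_top_k(X, y, k=1)                      # stage 1: feature selection
reduced = [[row[j] for j in kept] for row in X]     # keep only the selected columns
prediction = knn_predict(reduced, y, [3.1], k=3)    # stage 2: classification
```

Swapping `knn_predict` for another classifier, or `f_scores` for another ranking criterion, gives the other selector and classifier combinations that the abstract compares.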

Keywords

Feature Selection; Logistic Regression; Neural Network; K-Nearest Neighbor; Support Vector Machine

Parallel Abstract

Classification, the process of assigning objects to classes according to their attributes, is one of the most widely discussed problems in data mining. Many classification methodologies exist for handling large volumes of data, each suited to different situations and data characteristics. Most of them recommend performing feature selection first, so that redundant or irrelevant features do not reduce classification accuracy. Reducing the number of features improves on the accuracy of the original classification models and gives rise to a variety of classification models, and compound data classification methods are often employed to build effective ones. This research builds classification prediction models using data mining methodology. Important attributes are extracted by five feature selection approaches, each of which is combined with four classifiers to optimize the feature space. The average accuracy of each feature selection approach is compared across the classifiers, and the nonparametric Wilcoxon signed-rank test is used to determine whether the models differ significantly. The experimental results show that the proposed compound models outperform the original methods and that F-score feature selection is a promising method for data mining.
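To make the significance check concrete, the sketch below computes a two-sided Wilcoxon signed-rank test on paired accuracies from two models. It is an illustration under stated assumptions rather than the study's procedure: the accuracy values are invented for the example (they are not results from this research), the p-value is obtained by exact enumeration of sign patterns, which is only practical for a small number of pairs, and the function names are hypothetical.

```python
from itertools import product

def signed_ranks(a, b):
    """Rank the non-zero paired differences by absolute size, averaging tied ranks."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of differences tied in absolute value.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1                # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return diffs, ranks

def wilcoxon_signed_rank(a, b):
    """Return (statistic, exact two-sided p-value) for paired samples a and b."""
    diffs, ranks = signed_ranks(a, b)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    statistic = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely, so count
    # the patterns whose statistic is at least as extreme as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for s, r in zip(signs, ranks) if s)
        if min(plus, total - plus) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** len(ranks)

# Hypothetical test accuracies (in percent) over eight paired runs.
baseline_acc = [70, 72, 68, 75, 71, 69, 74, 73]   # classifier on all features
selected_acc = [74, 77, 70, 78, 78, 75, 75, 81]   # same classifier after feature selection

statistic, p_value = wilcoxon_signed_rank(selected_acc, baseline_acc)
```

Integer percentages are used so that tied differences compare exactly; with floating-point accuracies, ties in the absolute differences would need a tolerance.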

Parallel Keywords

Feature Selection; Logistic Regression; Neural Network; K-Nearest Neighbor; Support Vector Machine


