
整合約略集合論、支援向量機與決策樹之資料挖礦架構及其個案研究

A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies

Advisor: 簡禎富 (Chen-Fu Chien)

Abstract


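For classification and prediction problems, decision makers usually care not only about accuracy but also about whether the model yields concise, understandable rules from which managerial implications can be drawn. Human resource management is one such problem. Among data mining techniques for classification, rough set theory (RST), support vector machines (SVM), and decision trees (DT) have attracted considerable attention. RST and DT can generate rules, whereas SVM cannot. On the other hand, RST is notable for its attribute selection capability, while SVM and DT are valued for their predictive power. This study combines the strengths of the three techniques in a four-stage hybrid data mining framework intended to improve prediction accuracy and the quality of the generated rules. In the first stage, RST selects the important attributes and removes redundant ones without losing classification information. In the second stage, SVM is applied with cross validation to reduce noise in the samples. The distribution of the resulting samples is then examined; if class imbalance is present, a third stage adjusts the imbalanced classes using rules generated by RST. After these three stages, the attributes and samples are more representative and the class distribution is more even. Finally, a DT is built from these samples to produce the prediction model and the associated rules. Data for human resource management prediction are typically high-dimensional, complex, and highly uncertain, which leaves traditional statistical prediction methods with low statistical power. The proposed framework is applied empirically to personnel selection data for direct and indirect employees at two high-tech companies in Hsinchu. The results show that the proposed method effectively improves prediction accuracy and rule quality, and that it outperforms conventional RST, SVM, and DT.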
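
The first stage (attribute selection with RST) can be illustrated with a small sketch. This is not the thesis's implementation: it is a greedy, QuickReduct-style search over the rough-set positive region, and the toy applicant records, attribute meanings, and function names are all hypothetical. A greedy search of this kind is not guaranteed to return a minimal reduct.

```python
from collections import defaultdict

def positive_region_size(rows, labels, attrs):
    """Count rows whose equivalence class under `attrs` carries a single label."""
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    return sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)

def greedy_reduct(rows, labels):
    """Add the attribute that enlarges the positive region most, until the
    subset classifies as many rows consistently as the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = positive_region_size(rows, labels, all_attrs)
    chosen = []
    while positive_region_size(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: positive_region_size(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return sorted(chosen)

# Hypothetical records: (education, experience, test band, shift preference).
rows = [
    ("BS", "low", "A", "day"),
    ("BS", "low", "B", "night"),
    ("MS", "high", "A", "day"),
    ("MS", "high", "B", "day"),
    ("BS", "low", "B", "day"),
    ("MS", "high", "A", "night"),
]
labels = ["good", "poor", "good", "good", "poor", "good"]

print(greedy_reduct(rows, labels))  # [0, 2]
```

In this toy table, experience duplicates education and shift preference carries no class information, so education and test band alone classify every row as consistently as all four attributes do.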

Parallel Abstract


Support vector machine (SVM), rough set theory (RST), and decision tree (DT) are methodologies applied to various data mining problems, especially classification and prediction tasks. Studies have shown the ability of RST to perform feature selection, while SVM and DT are noted for their predictive power. This research integrates the advantages of the SVM, RST, and DT approaches into a hybrid framework that enhances the quality of both class prediction and rule generation. In addition to building a classification model with acceptable accuracy, the ability to explain how a decision is made, using simple, understandable, and useful rules, is a critical issue for human resource management. DT and RST can generate such rules; SVM cannot. The framework consists of four main stages. The first stage selects the most important attributes: RST is applied to eliminate redundant and irrelevant attributes without losing any classification information. The second stage reduces noisy objects by cross validation using SVM. If the resulting data set exhibits a class imbalance problem, the rules generated by RST are used to adjust the class distribution (stage 3). Through these stages, a data set with fewer dimensions, a higher degree of purity, and a more even class distribution is obtained. In the last stage, this data set is used to generate rules with DT. Decisions concerning personnel selection prediction typically involve data with high dimensionality, uncertainty, and complexity, which cause traditional statistical methods to suffer from low power of test. For validation, real cases of personnel selection for direct and indirect labor at two high-tech companies in Hsinchu, Taiwan are studied using the proposed hybrid data mining framework. Implementation results show that the proposed approach is effective and performs better than traditional SVM, RST, and DT.
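
The second stage (removing noisy objects by cross validation) can be sketched as follows. The abstract does not give the exact procedure, so this assumes one plausible reading: a sample is kept only if a model trained on the other folds predicts its label correctly. To stay dependency-free, the demonstration uses a nearest-centroid classifier as a stand-in for the SVM; any function with the same `fit_predict(train_x, train_y, test_x)` shape, for example a wrapper around an SVM library, could be passed instead. All names and data here are hypothetical.

```python
def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): predict the class with the closest mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda y: dist2(x, centroids[y])) for x in test_x]

def cv_noise_filter(xs, ys, fit_predict, k=3):
    """Return indices of samples whose held-out prediction matches their label."""
    kept = []
    for fold in range(k):
        test_idx = [i for i in range(len(xs)) if i % k == fold]
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        preds = fit_predict(
            [xs[i] for i in train_idx],
            [ys[i] for i in train_idx],
            [xs[i] for i in test_idx],
        )
        kept.extend(i for i, p in zip(test_idx, preds) if p == ys[i])
    return sorted(kept)

# Two well-separated clusters; index 6 sits in cluster "a" but is labeled "b".
xs = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9],
      [4.8, 5.1], [1.1, 0.9], [5.1, 5.0], [1.0, 1.2]]
ys = ["a", "a", "a", "b", "b", "b", "b", "b", "a"]

kept = cv_noise_filter(xs, ys, nearest_centroid, k=3)
print(kept)  # [0, 1, 2, 3, 4, 5, 7, 8]
```

Filtering can remove more samples from one class than another, which is why the framework checks the class distribution afterwards and, if needed, rebalances it in the third stage.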


Cited By


陳裕文 (2010). 考量綜合指標於財務危機預警模式之研究 [Master's thesis, Chaoyang University of Technology]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0078-0601201112112872
