  • Degree thesis

利用關聯演算法重現決策樹分類結果

Using Association Classification to Represent a Decision Tree

Advisor: 蔣璿東

Abstract


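Classification is one of the most commonly used strategies in data mining, and association classification and decision trees are two of its most frequently used methods. The main advantage of association classification is that it yields global decision rules. However, it cannot handle continuous numeric data directly: continuous values must first be discretized, and finding the optimal cut points is an NP-hard problem. Moreover, although association classification can find all decision rules, the sheer number of rules makes it difficult to build a complete structure for explaining the knowledge. Decision trees, on the other hand, can handle both continuous and non-continuous fields directly and readily produce an explicit knowledge structure. Because of the limitations of the tree structure and the algorithm, however, the decision rules converted from a decision tree are local decision rules and may contain irrelevant classification conditions. This thesis proposes a method to address these problems. The study first exploits the decision tree's ability to handle continuous attributes and its fast processing to find the knowledge and decision rules hidden in the data set; at the same time, it uses the local nature of the decision tree algorithm to quickly obtain the set of all candidate discretization cut points for the continuous attributes. The decision rules converted from the tree and the cut-point set are then reprocessed with association classification, which converts the tree's rules into association decision rules that are global, free of irrelevant conditions, and simpler in their conditions. Finally, experiments on a clinical data set of ovarian endometriosis show that, compared with the original decision rules generated by a CART decision tree, the proposed method produces association decision rules with higher classification accuracy, simpler conditions, and better interpretability.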
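The two steps described in the abstract (reusing the tree's split thresholds as discretization cut points, then re-checking each tree rule against the whole data set to drop irrelevant conditions) can be illustrated with a small sketch. This is not the thesis's algorithm: the greedy `simplify` procedure, the `min_conf` threshold, the attribute names `age` and `cyst_size`, and the six toy records are all assumptions made up for illustration, and the rules are written by hand as if they had been read off a tree.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]   # (attribute, "<=" or ">", threshold)
Rule = Tuple[List[Condition], str]   # (conditions on one tree path, class label)
Record = Dict[str, object]


def cut_points(rules: List[Rule]) -> Dict[str, List[float]]:
    """Collect every threshold used in the rules, per attribute, as cut points."""
    cuts: Dict[str, set] = {}
    for conditions, _label in rules:
        for attr, _op, threshold in conditions:
            cuts.setdefault(attr, set()).add(threshold)
    return {attr: sorted(values) for attr, values in cuts.items()}


def discretize(value: float, cuts: List[float]) -> int:
    """Map a value to an interval index; index i covers (cuts[i-1], cuts[i]]."""
    return bisect_left(cuts, value)


def matches(record: Record, conditions: List[Condition]) -> bool:
    """True when the record satisfies every condition."""
    for attr, op, threshold in conditions:
        value = record[attr]
        ok = value <= threshold if op == "<=" else value > threshold
        if not ok:
            return False
    return True


def confidence(records: List[Record], conditions: List[Condition], label: str) -> float:
    """Fraction of ALL records covered by the conditions that carry the label."""
    covered = [r for r in records if matches(r, conditions)]
    if not covered:
        return 0.0
    return sum(r["class"] == label for r in covered) / len(covered)


def simplify(records: List[Record], rule: Rule, min_conf: float) -> Rule:
    """Greedy assumption: drop a condition if the shorter rule, evaluated
    globally, still reaches min_conf. At least one condition is always kept."""
    conditions, label = rule
    kept = list(conditions)
    for cond in conditions:
        trial = [c for c in kept if c != cond]
        if trial and confidence(records, trial, label) >= min_conf:
            kept = trial
    return kept, label


# Made-up toy records (hypothetical attributes, not the thesis's clinical data).
RECORDS: List[Record] = [
    {"age": 30, "cyst_size": 4.0, "class": "recovered"},
    {"age": 32, "cyst_size": 6.5, "class": "recurrent"},
    {"age": 40, "cyst_size": 3.5, "class": "recovered"},
    {"age": 45, "cyst_size": 7.0, "class": "recurrent"},
    {"age": 28, "cyst_size": 4.5, "class": "recovered"},
    {"age": 50, "cyst_size": 4.8, "class": "recovered"},
]

# Hand-written rules, as if read off a tree whose root splits on age.
RULES: List[Rule] = [
    ([("age", "<=", 35.0), ("cyst_size", "<=", 5.0)], "recovered"),
    ([("age", "<=", 35.0), ("cyst_size", ">", 5.0)], "recurrent"),
    ([("age", ">", 35.0), ("cyst_size", "<=", 5.0)], "recovered"),
    ([("age", ">", 35.0), ("cyst_size", ">", 5.0)], "recurrent"),
]
```

In this toy setting every rule carries the `age` condition only because the tree happened to split on `age` first. Calling `simplify(RECORDS, RULES[0], 1.0)` evaluates the shorter rule over all records rather than only those in one branch, and it keeps just the `cyst_size` condition.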

Parallel Abstract


Because the rules derived from decision trees are local, an association classifier has higher accuracy than a decision tree classifier, and many useful rules are left undiscovered by decision tree techniques. However, the goal of classification rule mining is to discover a small set of rules in the database, whereas the association rule technique captures all possible rules and therefore generates too many of them. Medical data almost always contain numeric (continuous-valued) attributes, yet the association rule technique cannot deal with numeric data directly, and finding an appropriate way to discretize numeric attributes is not an easy task. To offset the drawbacks of these two mining techniques, and to use current commercial mining tools to analyze the postoperative status of ovarian endometriosis patients and discover rules, we propose a concept that combines the advantages of decision tree and association rule techniques to mine the data. Our goal in this paper is to investigate the efficacy of transvaginal aspiration and sclerotherapy with 95% ethanol retained in situ for the treatment of recurrent ovarian endometriomas. Although several researchers have used statistical methods to show that aspiration followed by injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few of them discuss the conditions that could yield a better recovery rate for patients. This study therefore combines statistical methods and data mining techniques to analyze the postoperative status of ovarian endometriosis patients and discover such conditions.

References


[31] D. A. Chiang, C. T. Wang, Y. H. Wang et al., “The Irrelevant Values Problem of Decision Tree for Improving a Glass Sputtering Process,” Journal of Applied Science and Engineering, vol. 13, no. 4, pp. 413-422, 2010.
[32] K.-Y. Chou, H.-C. Keh, N.-C. Huang et al., “The Irrelevant Values Problem of Decision Tree in Medical Examination,” Journal of Applied Science and Engineering, vol. 15, no. 1, pp. 89-96, 2012.
[34] C.-Y. Lin, “Remove The Irrelevant Values Problem in the Decision Tree by Using Association Rule,” Department of Computer Science and Information Engineering, Tamkang University, 2009.
[42] P. S. M. Tsai, and C. M. Chen, “Mining quantitative association rules in a large database of sales transactions,” 2000.
[45] N.-C. Hsieh, “Finding Relevant Fuzzy Association Rules from Medical Databases,” Journal of Information Management, vol. 12, no. 2, pp. 25-51, 2005.

Cited By


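劉冠廷 (2014). A study on applying super learning to improve the accuracy of the traditional bagging method [Master's thesis, National Formosa University]. Airiti Library. https://doi.org/10.6827/NFU.2014.00042
鍾政旭 (2017). A study on applying ensemble learning to predict the success rate of infertility treatment [Master's thesis, National Formosa University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0028-2806201711132300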

Further Reading