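Constructing classification trees with a genetic algorithm is a novel experimental study. We use the genetic algorithm not only to help the decision tree select the best attributes, but also combine it with information theory, taking entropy minimization as the splitting criterion of the classification tree. In our approach, the genetic algorithm first generates its initial population heuristically and then, using the minimized entropy and the real-time online test error as evaluation measures, searches for the best split point in the classification tree, determining the split attribute and the split value at the same time, so that the tree is derived step by step from the best nodes. In this study, we ran experiments on five representative classification data sets and compared the proposed approach with other decision tree algorithms. The experimental results show that the proposed hybrid heuristic genetic classification tree algorithm effectively reduces the complexity of the splits in the classification tree and builds smaller trees. In addition, we developed a prototype system that assists classification tree construction and automatically generates rules to support decision makers. We attempt to use genetic algorithms to design a knowledge acquisition system that constructs classification trees for further data mining applications.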
This paper presents a novel experimental study on using genetic algorithms to construct classification trees. The genetic algorithm not only helps select the optimal features as tree nodes but is also combined with information theory, which supplies the splitting criterion of the classification tree: minimizing entropy. The genetic algorithm serves as the method for splitting the subsets of individuals associated with the nodes. The stopping criterion for tree induction is based on a heuristic that recognizes whether the set of individuals associated with a node forms a sub-population. The classification tree is thus induced by combining information theory with genetic-algorithm-based computing. Experimental results for five data mining classification problems are presented and compared with those of other decision tree algorithms. The results indicate that the proposed Hybrid GA-Heuristic Classification Tree algorithm reduces the complexity of the splitting methods used and constructs smaller trees. A prototype is presented that assists classification tree construction and produces a set of rules to support decision makers. We attempt to design a knowledge acquisition system that uses genetic algorithms to construct classification trees for data mining applications.
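
To make the idea of an entropy-minimizing genetic search for a split concrete, the following is a minimal sketch in Python, not the implementation used in this study. It assumes numeric attributes and binary splits of the form "attribute <= threshold", and it uses only the weighted entropy of the two subsets as fitness; the online test error mentioned above is left out. All names (entropy, split_entropy, ga_best_split) and all parameter values (population size, number of generations, mutation rate) are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(rows, labels, attr, threshold):
    """Weighted entropy of the two subsets produced by rows[i][attr] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[attr] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[attr] > threshold]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def ga_best_split(rows, labels, pop_size=20, generations=30,
                  mutation_rate=0.2, seed=0):
    """Search for an (attribute, threshold) pair with low split entropy.

    Assumed encoding: an individual is a tuple (attribute index, threshold).
    pop_size should be at least 4 so that two distinct parents can be drawn.
    """
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    lo = [min(r[a] for r in rows) for a in range(n_attrs)]
    hi = [max(r[a] for r in rows) for a in range(n_attrs)]

    def random_individual():
        a = rng.randrange(n_attrs)
        return (a, rng.uniform(lo[a], hi[a]))

    def fitness(ind):
        # Lower is better: entropy remaining after the split.
        return split_entropy(rows, labels, ind[0], ind[1])

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged.
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Crossover: attribute from one parent, threshold from the other,
            # clamped to the value range of the chosen attribute.
            a = p1[0]
            t = min(max(p2[1], lo[a]), hi[a])
            # Mutation: occasionally resample the threshold.
            if rng.random() < mutation_rate:
                t = rng.uniform(lo[a], hi[a])
            children.append((a, t))
        population = survivors + children

    best = min(population, key=fitness)
    return best, fitness(best)


# Toy data: attribute 0 separates the two classes, attribute 1 does not.
rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (6.0, 4.0), (7.0, 9.0), (8.0, 2.0)]
labels = ["a", "a", "a", "b", "b", "b"]
(best_attr, best_threshold), best_entropy = ga_best_split(rows, labels)
print(best_attr, best_threshold, best_entropy)
```

Applying such a search recursively to the subsets at each node, together with a stopping rule, would grow a tree one node at a time. The selection scheme, crossover operator, and fitness function here are simplifications, and the algorithm evaluated in this study may differ in each of them.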