Multiobjective Evolutionary Algorithm for Rule Extraction in Data Mining

Advisor: 傅立成
Co-advisor: 蔣宗哲 (Tsung-Che Chiang)

Abstract


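This thesis studies rule extraction in data mining, covering two problems: numeric association rule mining and classification rule mining. Both problems have multiple objectives that must be optimized simultaneously, and these objectives frequently conflict with one another. We propose two multiobjective evolutionary algorithms, one for each problem.

From MOEA/D we adopt the idea of using uniformly distributed weight vectors to drive both mating selection and environmental selection. This keeps a balance between exploration and exploitation. To retain distinct solutions that share the same fitness, we also change the solution of each MOEA/D subproblem from a single solution to a set of solutions.

For numeric association rule mining, we follow the common association rule mining framework and search for frequent itemsets. For classification rule mining, we propose a two-phase evolutionary algorithm that combines the Michigan and Pittsburgh approaches. The first phase finds all Pareto-optimal rules, and the second phase combines these rules into Pareto-optimal rule sets. When rules conflict, each rule set chooses its own resolution policy according to its preference.

Experiments on synthetic datasets demonstrate the correctness and effectiveness of the proposed numeric association rule mining algorithm. We also apply the method to several public real-world datasets, and these results can serve as a basis for future comparison. For classification rule mining, we compare our method with several existing rule-based and non-rule-based classifiers on public datasets. The experimental results show that our method is effective.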
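
The abstract says that MOEA/D's uniform weight vectors drive both mating selection and environmental selection, and that each subproblem holds a set of solutions rather than one. The sketch below shows how these pieces could fit together. It rests on several assumptions that the abstract does not confirm:

- Tchebycheff decomposition is used (a common MOEA/D choice; the abstract does not name the scalarizing function).
- The ideal point is fixed at (1, 1) for two maximized objectives in [0, 1].
- String labels stand in for encoded rules.
- All helper names are hypothetical.

```python
import itertools
import math
import random

def uniform_weights(n_obj, h):
    """Simplex-lattice weights: every component is a multiple of 1/h and they sum to 1."""
    return [tuple(c / h for c in combo)
            for combo in itertools.product(range(h + 1), repeat=n_obj)
            if sum(combo) == h]

def neighborhoods(weights, t):
    """For each weight vector, the indices of its t closest weight vectors (itself first)."""
    return [sorted(range(len(weights)), key=lambda j: math.dist(w, weights[j]))[:t]
            for w in weights]

def tchebycheff(objs, weight, ideal):
    """Tchebycheff scalarization for maximized objectives (smaller value = better)."""
    return max(w * abs(z - f) for f, w, z in zip(objs, weight, ideal))

def update_subproblem(archive, solution, objs, weight, ideal, eps=1e-12):
    """Environmental selection for one subproblem whose 'solution' is a set.

    A strictly better candidate replaces the whole set; a distinct candidate
    with equal fitness is appended instead of being discarded.
    """
    fit = tchebycheff(objs, weight, ideal)
    if not archive:
        archive.append((solution, objs))
        return True
    best = tchebycheff(archive[0][1], weight, ideal)
    if fit < best - eps:
        archive[:] = [(solution, objs)]
        return True
    if abs(fit - best) <= eps and all(solution != s for s, _ in archive):
        archive.append((solution, objs))
        return True
    return False

def select_parents(i, neigh, archives, rng):
    """Mating restriction: both parents are drawn from subproblems near subproblem i."""
    j, k = rng.sample(neigh[i], 2)
    return rng.choice(archives[j])[0], rng.choice(archives[k])[0]

# Demo: two maximized objectives (e.g. support and confidence), fixed ideal point (1, 1),
# and hypothetical string labels in place of real rule encodings.
weights = uniform_weights(2, 4)      # 5 weight vectors
neigh = neighborhoods(weights, 3)    # 3 nearest subproblems per subproblem
ideal = (1.0, 1.0)
archives = [[] for _ in weights]
for sol, objs in [("A", (0.6, 0.8)), ("B", (0.8, 0.6)), ("C", (0.6, 0.8))]:
    for j in neigh[2]:               # offer each child to subproblem 2's neighborhood
        update_subproblem(archives[j], sol, objs, weights[j], ideal)
result = [[s for s, _ in a] for a in archives]
print(result)  # [[], ['A', 'C'], ['A', 'B', 'C'], ['B'], []]
print(select_parents(2, neigh, archives, random.Random(0)))
```

Subproblem 2 (weights 0.5/0.5) keeps all three candidates because they tie in fitness. "C" is retained next to "A" even though both have identical objective values, which is the behavior the set-valued subproblem is meant to allow.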

Parallel Abstract


This thesis addresses the problem of rule extraction in data mining, including numeric association rule mining and classification rule mining. Both tasks involve multiple objectives to be optimized simultaneously, and these objectives frequently conflict with each other. Two Pareto-based multiobjective evolutionary algorithms are proposed to solve these problems.

Following the concept of MOEA/D, mating restriction and environmental selection based on uniformly distributed weight vectors enhance the exploitation and exploration abilities of the algorithms. In addition, the solution of each subproblem defined in MOEA/D is generalized from a single solution to a set of solutions, so that distinct solutions with the same fitness are retained.

For numeric association rule mining, the proposed algorithm follows the common framework of first obtaining frequent itemsets. For classification, a two-phase multiobjective evolutionary algorithm is proposed that combines the Michigan and Pittsburgh approaches. It first finds Pareto-optimal rules and then combines them into Pareto-optimal rule sets. When rules conflict, each rule set resolves the conflict with its own policy according to its preference.

Experiments on synthetic datasets show the correctness and efficiency of the proposed algorithm for numeric association rule mining. The algorithm is also applied to several public real-world datasets to provide results for future comparison. For classification, experimental results on several public datasets show that the proposed method is competitive with existing rule-based and non-rule-based classifiers.
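
The sketch below makes "Pareto-optimal rules" concrete for numeric data, which is the input to the second phase described above. It rests on the following assumptions:

- A numeric itemset is encoded as one closed interval per attribute (a hypothetical encoding).
- Each rule is scored on support and confidence. These are assumed objectives; the abstract does not list the ones the thesis optimizes.
- The toy records and rules are invented for illustration.

It shows only interval support counting and a naive nondominated filter, not the thesis's evolutionary search.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Itemset = Dict[str, Interval]  # hypothetical encoding: attribute -> closed interval [lo, hi]

def covers(record: Dict[str, float], itemset: Itemset) -> bool:
    """True if every attribute named in the itemset falls inside its interval."""
    return all(lo <= record[attr] <= hi for attr, (lo, hi) in itemset.items())

def support(data: List[Dict[str, float]], itemset: Itemset) -> float:
    """Fraction of records covered by the numeric itemset."""
    return sum(covers(r, itemset) for r in data) / len(data)

def rule_objectives(data, antecedent: Itemset, consequent: Itemset) -> Tuple[float, float]:
    """(support, confidence) of antecedent => consequent; attributes assumed disjoint."""
    both = {**antecedent, **consequent}
    s_ant = support(data, antecedent)
    s_rule = support(data, both)
    return s_rule, (s_rule / s_ant if s_ant > 0 else 0.0)

def dominates(a, b) -> bool:
    """a Pareto-dominates b: no worse in every maximized objective, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep the (name, objectives) pairs that no other pair dominates (naive O(n^2) filter)."""
    return [(name, objs) for name, objs in scored
            if not any(dominates(other, objs) for _, other in scored)]

# Toy data and candidate rules, invented for illustration only.
data = [
    {"age": 23, "income": 30},
    {"age": 35, "income": 52},
    {"age": 41, "income": 60},
    {"age": 29, "income": 45},
    {"age": 52, "income": 70},
]
rules = {
    "r1": ({"age": (20, 40)}, {"income": (25, 55)}),
    "r2": ({"age": (30, 60)}, {"income": (55, 75)}),
    "r3": ({"age": (20, 60)}, {"income": (25, 50)}),
    "r4": ({"age": (20, 60)}, {"income": (25, 65)}),
}
scored = [(name, rule_objectives(data, ant, con)) for name, (ant, con) in rules.items()]
front = pareto_front(scored)
print([name for name, _ in front])  # ['r1', 'r4']
```

Here r1 (support 0.6, confidence 1.0) and r4 (support 0.8, confidence 0.8) trade off against each other, so both survive, while r2 and r3 are dominated.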
