  • Theses and Dissertations

Building Associative Classifier with Multiple Minimum Supports

Advisor: 胡雅涵 (Ya-Han Hu)

Abstract


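Associative classification (CBA) is an important technique in data mining research. The classification rules it produces have the form "items→y", where y is a class label and each item is an (attribute name, attribute value) pair. In associative classification, a user-specified minimum support threshold (minsup) is used to discover a complete set of classification rules. Previous studies on associative classification have shown that the class imbalance problem can affect classification performance, and several methods have been developed to address it. However, the same problem also arises for the items in classification rules: the occurrence frequencies of items in a database can vary enormously. With a single threshold, setting minsup too high loses interesting rules, whereas setting it too low produces a large number of rules, many of them meaningless. This study therefore brings the concept of multiple minimum supports (MMSs) into associative classification, so that different items or class labels can each have their own minsup. We propose a new method, called MultipleMMS-AC, that combines associative classification with multiple minimum supports. For rule selection, we adopt four methods (maximum likelihood, Laplace, Scoring, and Max χ²) to build the classifier. Finally, comparisons with C4.5, SVM, PART, RIPPER, ANN, and CBA on several real-world datasets show that the proposed MultipleMMS-AC method outperforms these existing techniques.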
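The abstract does not spell out how a rule's threshold is derived from per-item thresholds. The sketch below assumes the MSapriori convention, in which every item has its own minimum item support (MIS) and a rule (its antecedent items plus its class label) must reach the lowest MIS among its members; MultipleMMS-AC may define this differently. The names `rule_support`, `rule_minsup` and `passes_mms`, the `default` fallback threshold, and the toy weather data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# An item is an (attribute-name, attribute-value) pair, as described in the abstract.
Item = Tuple[str, str]
# A training record: the set of items it contains plus its class label.
Record = Tuple[FrozenSet[Item], str]


def rule_support(antecedent: FrozenSet[Item], label: str, data: List[Record]) -> float:
    """Fraction of records that contain every antecedent item and carry `label`."""
    hits = sum(1 for items, y in data if antecedent <= items and y == label)
    return hits / len(data)


def rule_minsup(antecedent: FrozenSet[Item], label: str,
                mis: Dict[object, float], default: float) -> float:
    """Assumed MSapriori-style threshold: the lowest MIS among the rule's items and label.

    Items or labels without an entry in `mis` fall back to `default`
    (a hypothetical global minsup).
    """
    return min(mis.get(key, default) for key in [*antecedent, label])


def passes_mms(antecedent: FrozenSet[Item], label: str, data: List[Record],
               mis: Dict[object, float], default: float) -> bool:
    """True if the rule `antecedent -> label` reaches its own minimum support."""
    return rule_support(antecedent, label, data) >= rule_minsup(antecedent, label, mis, default)


if __name__ == "__main__":
    # Hypothetical toy data in which ("outlook", "overcast") is a rare item.
    data: List[Record] = [
        (frozenset({("outlook", "sunny"), ("wind", "weak")}), "yes"),
        (frozenset({("outlook", "sunny"), ("wind", "strong")}), "no"),
        (frozenset({("outlook", "rain"), ("wind", "weak")}), "yes"),
        (frozenset({("outlook", "rain"), ("wind", "strong")}), "no"),
        (frozenset({("outlook", "overcast"), ("wind", "weak")}), "yes"),
    ]
    rare_rule = frozenset({("outlook", "overcast")})
    # Single global minsup 0.3: the rule (support 0.2) is discarded.
    print(passes_mms(rare_rule, "yes", data, mis={}, default=0.3))  # False
    # Per-item MIS: the rare item gets a lower threshold and the rule survives.
    print(passes_mms(rare_rule, "yes", data,
                     mis={("outlook", "overcast"): 0.15}, default=0.3))  # True
```

With one global minsup of 0.3, the rule for the rare value `overcast` is lost; giving only that item a lower MIS keeps it without lowering the threshold for every other item, which is the trade-off the abstract describes.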

Keywords

None

Parallel Abstract


Classification Based on Associations (CBA) is an important method in the field of data mining. A classification rule generated by CBA has the form “items→y”, where y is a class label and each item is an (attribute-name, attribute-value) pair. Given a user-specified minsup, CBA aims to discover the complete set of classification rules. Previous studies have shown that the class imbalance problem can degrade classification performance in CBA, and various approaches have been developed to deal with it. However, the same problem can also occur for the items in the classification rules: the occurrence frequency of different items in a dataset may vary widely. With a single minsup, either interesting rules are lost (if minsup is set too high) or a huge number of rules is generated (if minsup is set too low). Therefore, this study incorporates the concept of multiple minimum supports (MMSs) into CBA, so that different items or class labels can have their own minsup. A new method, called MultipleMMS-AC, is proposed for CBA with MMSs. Four rule selection methods, namely Maximum likelihood, Laplace, Scoring and Max χ², are then considered in constructing the classifiers. Experimental comparisons with C4.5, SVM, PART, RIPPER, ANN and CBA on several real datasets indicate that MultipleMMS-AC performs better than these approaches.
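The four rule selection methods are named but not defined in the abstract. The sketch below assumes common definitions: maximum likelihood as rule confidence (the ML estimate of P(y | items)), Laplace accuracy (n_c + 1)/(n + k) with k classes, and the Pearson χ² statistic of the 2×2 table of antecedent versus class. "Scoring" is omitted because its definition is not given here, and `classify` assumes a simple best-single-rule strategy. All function names and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# Items and records follow the abstract: (attribute-name, attribute-value) pairs plus a class label.
Item = Tuple[str, str]
Record = Tuple[FrozenSet[Item], str]


@dataclass(frozen=True)
class Rule:
    """A classification rule `antecedent -> label`."""
    antecedent: FrozenSet[Item]
    label: str


def contingency(rule: Rule, data: List[Record]) -> Tuple[int, int, int, int]:
    """Counts (A&y, A&~y, ~A&y, ~A&~y) for the rule's antecedent A and class y."""
    a = b = c = d = 0
    for items, y in data:
        covered = rule.antecedent <= items
        matches = y == rule.label
        if covered and matches:
            a += 1
        elif covered:
            b += 1
        elif matches:
            c += 1
        else:
            d += 1
    return a, b, c, d


def max_likelihood(rule: Rule, data: List[Record], n_classes: int) -> float:
    """Confidence a / (a + b); `n_classes` is unused but keeps a uniform signature."""
    a, b, _, _ = contingency(rule, data)
    return a / (a + b) if a + b else 0.0


def laplace(rule: Rule, data: List[Record], n_classes: int) -> float:
    """Laplace accuracy (a + 1) / (a + b + k), with k the number of classes."""
    a, b, _, _ = contingency(rule, data)
    return (a + 1) / (a + b + n_classes)


def chi_square(rule: Rule, data: List[Record], n_classes: int) -> float:
    """Pearson chi-square of the 2x2 table A vs. y, without continuity correction."""
    a, b, c, d = contingency(rule, data)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


Measure = Callable[[Rule, List[Record], int], float]


def classify(items: FrozenSet[Item], rules: List[Rule], data: List[Record],
             measure: Measure, n_classes: int, default: str) -> str:
    """Hypothetical selection: predict with the best-scoring rule that covers `items`."""
    covering = [r for r in rules if r.antecedent <= items]
    if not covering:
        return default
    return max(covering, key=lambda r: measure(r, data, n_classes)).label


if __name__ == "__main__":
    data: List[Record] = [
        (frozenset({("outlook", "sunny"), ("wind", "weak")}), "yes"),
        (frozenset({("outlook", "sunny"), ("wind", "strong")}), "no"),
        (frozenset({("outlook", "rain"), ("wind", "weak")}), "yes"),
        (frozenset({("outlook", "rain"), ("wind", "strong")}), "no"),
        (frozenset({("outlook", "overcast"), ("wind", "weak")}), "yes"),
    ]
    r1 = Rule(frozenset({("wind", "weak")}), "yes")
    r2 = Rule(frozenset({("outlook", "sunny")}), "no")
    for m in (max_likelihood, laplace, chi_square):
        print(m.__name__, m(r1, data, 2), m(r2, data, 2))
    query = frozenset({("outlook", "sunny"), ("wind", "weak")})
    print(classify(query, [r1, r2], data, chi_square, n_classes=2, default="no"))  # yes
```

Both rules cover the query record here; ranking them by a different measure can change which one fires when rules disagree, which is why the choice of rule selection method affects the final classifier.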