Improving Associative Classification Performance by Reducing the Influence of the Rule Dependency Problem

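Associative classification (AC) methods differ in how they rank rules, but essentially all of them sort rules by confidence from high to low, so classification also applies the high-confidence rules first. Some association rules, however, suffer from the rule dependency problem: the order in which these rules are executed affects their confidence with respect to the data that has not yet been classified. Because the rule dependency problem changes rule confidence, existing AC methods do not in fact classify the not-yet-classified data strictly in descending order of confidence, which in turn affects the final classification result. No existing AC method takes the rule dependency problem into account when classifying. The problem arises mainly because training documents may generate rules for different classes, and at classification time a document may be assigned to different classes by different rules, so which rule executes first does change the classification result. For an AC with n rules, however, there are n! possible execution orders, so solving the rule dependency problem (finding the optimal rule execution order) is very time-consuming. This study therefore investigates the rule dependency problem in associative classifiers and proposes several polynomial-time ordering algorithms for setting the execution order of the rules in a classifier, in order to reduce the influence of the rule dependency problem on the classification result and thereby improve the classification accuracy of associative classifiers.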
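The effect described above can be illustrated with a small sketch. It is not the algorithm proposed in this study, whose details the abstract does not give. It assumes that a rule, once executed, classifies every document it covers and removes it from the pool, and it uses a simple greedy re-ranking (always pick the rule with the highest confidence on the documents still unclassified) as one possible polynomial-time ordering. The names `Rule`, `confidence`, `greedy_rerank`, and the toy data are hypothetical.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    name: str
    items: frozenset  # antecedent terms that must all appear in a document
    label: str        # class predicted by the rule

def covers(rule, terms):
    """A rule covers a document if all of its antecedent terms occur in it."""
    return rule.items <= terms

def confidence(rule, docs):
    """Share of covered documents whose label matches the rule (0.0 if none)."""
    labels = [label for terms, label in docs if covers(rule, terms)]
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == rule.label) / len(labels)

def greedy_rerank(rules, docs):
    """Assumed greedy ordering: repeatedly pick the rule with the highest
    confidence on the still-unclassified documents, then drop what it covers.
    Needs O(n^2) confidence evaluations for n rules instead of n! orderings."""
    remaining_rules, remaining_docs, order = list(rules), list(docs), []
    while remaining_rules:
        best = max(remaining_rules, key=lambda r: confidence(r, remaining_docs))
        order.append(best)
        remaining_rules.remove(best)
        remaining_docs = [d for d in remaining_docs if not covers(best, d[0])]
    return order

# Hypothetical toy training set: (terms, class label).
DOCS = [
    (frozenset("ab"), "Y"), (frozenset("ab"), "Y"), (frozenset("ab"), "Y"),
    (frozenset("b"), "X"),
    (frozenset("c"), "X"), (frozenset("c"), "X"), (frozenset("c"), "Y"),
]
R1 = Rule("R1", frozenset("a"), "Y")
R2 = Rule("R2", frozenset("b"), "Y")
R3 = Rule("R3", frozenset("c"), "X")
RULES = [R1, R2, R3]

# Conventional ranking: confidence computed once on the full training set.
static_order = sorted(RULES, key=lambda r: confidence(r, DOCS), reverse=True)
# Re-ranking that recomputes confidence after each rule is executed.
greedy_order = greedy_rerank(RULES, DOCS)

if __name__ == "__main__":
    print([r.name for r in static_order])  # ['R1', 'R2', 'R3']
    print([r.name for r in greedy_order])  # ['R1', 'R3', 'R2']
```

In this toy set, R2 has confidence 0.75 on the full data, but once R1 has classified the documents it covers, the only document R2 still covers belongs to the other class, so its confidence on the remaining data drops to 0 and R3 should run before it.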

Keywords

Associative classification; text classification; rule dependency

Parallel Abstract

Because dependence among rules can change their confidence, the remaining rules are not executed in the order of their actual confidence on the unclassified data, which directly affects the classification accuracy of an associative classifier. Resolving the rule dependency problem exactly is a combinatorial problem: N class association rules (CARs) have N! possible execution orders, so finding the optimal execution order is very time-consuming. Therefore, instead of searching for the optimal order, we propose different polynomial-time algorithms that re-rank the execution order of CARs by rule priority to reduce the influence of the rule dependency problem. Consequently, the performance (classification accuracy and recall) of the associative classification algorithm can be improved.

Parallel Keywords

Associative Classification; Text Classification; Rule Dependency

