Improving Document Classification Accuracy by Applying Multi-level Class Priority and Multiple Classifiers

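Associative classification (AC) usually orders its rules by fixed criteria. Rules, however, are subject to rule dependency: even when rules have the same confidence, support, and length, the order in which they are executed still affects the classification result. This thesis focuses on the rule ranking problem. In addition to adopting the Lazy principle as the general ranking rule for classifying documents at the 100% confidence level, it removes the documents that have already been classified and recalculates confidence to re-rank the rules, and it applies the concept of multi-level class priority, in order to examine the effect on classification performance. The lowest per-class accuracy obtained from an initial classification using TFIDF weights and a Bayesian classifier is set as a single static threshold. Documents that AC cannot classify are classified by the Bayesian classifier, which addresses the loss of accuracy caused by the default class of associative classifiers.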
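The re-ranking idea described above (pick the best rule, remove the documents it covers, recompute confidence on what remains) can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Rule` class, the `stats` and `order_rules` helpers, the toy documents, and the choice to prefer shorter rules on ties are all assumptions made for illustration, and the sketch does not model the 100% confidence level restriction or multi-level class priority.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    terms: FrozenSet[str]  # antecedent: terms that must all appear in a document
    label: str             # consequent: predicted class


Doc = Tuple[FrozenSet[str], str]  # (terms in the document, true class)


def stats(rule: Rule, docs: List[Doc]) -> Tuple[float, int]:
    """Return (confidence, number of correctly covered documents) on docs."""
    covered = [label for terms, label in docs if rule.terms <= terms]
    if not covered:
        return 0.0, 0
    correct = sum(1 for label in covered if label == rule.label)
    return correct / len(covered), correct


def rank_key(rule: Rule, docs: List[Doc]) -> Tuple[float, int, int]:
    confidence, support = stats(rule, docs)
    # Assumption: higher confidence, then higher support, then shorter rule.
    return (-confidence, -support, len(rule.terms))


def order_rules(rules: List[Rule], docs: List[Doc]) -> List[Rule]:
    """Greedily order rules, re-ranking after each rule removes its documents."""
    remaining = list(docs)
    pending = list(rules)
    ordered: List[Rule] = []
    while pending and remaining:
        pending.sort(key=lambda r: rank_key(r, remaining))
        best = pending.pop(0)
        if stats(best, remaining)[1] == 0:
            break  # no pending rule classifies any remaining document correctly
        ordered.append(best)
        # Remove every document the chosen rule covers, then re-rank.
        remaining = [d for d in remaining if not best.terms <= d[0]]
    return ordered


docs: List[Doc] = [
    (frozenset({"a", "b"}), "X"),
    (frozenset({"a"}), "X"),
    (frozenset({"b", "c"}), "Y"),
    (frozenset({"c"}), "Y"),
    (frozenset({"b"}), "Y"),
]
r1 = Rule(frozenset({"a"}), "X")
r2 = Rule(frozenset({"b"}), "Y")
r3 = Rule(frozenset({"c"}), "Y")
ordered = order_rules([r1, r2, r3], docs)
```

In this toy data, the rule `b -> Y` starts with a confidence of 2/3 because one document containing `b` belongs to class `X`. Once `a -> X` has claimed that document, the confidence of `b -> Y` on the remaining documents rises to 1.0, which is the kind of rule dependency a static ordering would miss.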

Keywords

Associative classification; rule ranking; rule dependency; multi-level class priority

Parallel Abstract

Associative classification (AC) [1][2] normally orders its rules by prescribed criteria. Because of rule dependency, however, rules with identical confidence, support, and length can still produce different classification results depending on the order in which they are executed. This thesis focuses on the rule ranking problem. It adopts the Lazy [3] method as the general ranking principle for classifying documents at the 100% confidence level, removes the documents that have already been classified and recalculates confidence to re-rank the rules, and applies a multi-level class priority concept, in order to examine how these choices affect classification performance. The lowest per-class accuracy obtained from a preliminary classification using TFIDF [4] weighting and the Naïve Bayes [5] classifier is set as a single static threshold. Documents that the associative classifier cannot classify are classified by the Naïve Bayes classifier, which addresses the loss of classification accuracy caused by the default class of associative classifiers.
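The fallback step, in which a second classifier replaces the default class, can be sketched as follows. This is only an illustration under stated assumptions: the function names `train_nb`, `predict_nb`, and `classify`, the toy training data, and the use of a plain multinomial Naïve Bayes with add-one smoothing are hypothetical, and the sketch does not model the TFIDF weighting or the static threshold mentioned in the abstract.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial Naive Bayes model from (list_of_terms, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for terms, label in docs:
        term_counts[label].update(terms)
        vocab.update(terms)
    return class_counts, term_counts, vocab


def predict_nb(model, terms):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_terms = sum(term_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for term in terms:
            if term in vocab:  # terms never seen in training are ignored
                score += math.log(
                    (term_counts[label][term] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def classify(terms, ordered_rules, model):
    """Apply the first matching rule; fall back to Naive Bayes if none fires."""
    term_set = set(terms)
    for antecedent, label in ordered_rules:
        if antecedent <= term_set:
            return label, "rule"
    # No rule matched: use the second classifier instead of a default class.
    return predict_nb(model, terms), "bayes"


train = [
    (["goal", "match"], "sport"),
    (["match", "team"], "sport"),
    (["stock", "market"], "finance"),
    (["market", "bank"], "finance"),
]
rules = [(frozenset({"goal"}), "sport"), (frozenset({"stock"}), "finance")]
model = train_nb(train)

by_rule = classify(["goal", "team"], rules, model)
by_fallback = classify(["bank", "market"], rules, model)
```

Here `by_rule` is decided by the first rule, while `by_fallback` matches no rule and is labeled by the Naïve Bayes model. With a fixed default class, every unmatched document would receive the same label regardless of its content.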

Parallel Keywords

Associative Classification; Ranking; Rule Dependency; Multi-level Class Priority

References

[1] F. Thabtah, “A review of associative classification mining,” Knowl. Eng. Rev., vol. 22, 2007, pp. 37-65.

[6] G. Salton and C. Buckley, Term Weighting Approaches in Automatic Text Retrieval, Cornell University, 1987.

[11] E. Baralis and P. Garza, “A Lazy Approach to Pruning Classification Rules,” Dec. 2002.

[12] W. Li, J. Han, and J. Pei, “CMAR: accurate and efficient classification based on multiple class-association rules,” Proceedings IEEE International Conference on Data Mining (ICDM 2001), 2001, pp. 369-376.

[15] J. Chen, Z. Zhang, Q. Li, and X. Li, “A Pattern-Based Voting Approach for Concept Discovery on the Web,” Web Technologies Research and Development - APWeb 2005, vol. 3399, 2005.
