A Chi-Square-Based Method for Multi-Label Text Categorization

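This study proposes a method based on an inverse chi-square classifier. The method comprises a procedure that selects feature terms for each category and a term-category weighted matrix, from which the feature weights of a test document with respect to each category are looked up. The inverse chi-square classifier then computes an indicator value for the document in each category, and these values serve as the basis for classification. Three thresholds, DF (Document Frequency), CC (Correlation Coefficient), and ICF (Inverted Conformity Frequency), are used to select different feature terms for different categories. Experiments on the 10 categories with the most documents in the Reuters-21578 collection show that the method reaches a Precision, Recall, and F1-measure of about 87%, 98%, and 92%, respectively, which is comparable to the performance of BoosTexter, a well-known method in multi-label classification research.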
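The abstract does not give the classifier's formula, so the following is only a minimal sketch of how an inverse chi-square indicator value per category could be computed. It assumes the Fisher-style combination commonly used in inverse chi-square text filters, assumes the term-category weights have already been rescaled into the open interval (0, 1), and uses a hypothetical decision threshold of 0.5. All function names, the example weights, and the threshold are illustrative assumptions, not the authors' implementation.

```python
import math


def chi2_survival_even_dof(x2: float, dof: int) -> float:
    """P(X >= x2) for a chi-square variable X with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed form for even dof: exp(-m) * sum_{i=0}^{dof/2 - 1} m^i / i!
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def category_indicator(weights: list[float]) -> float:
    """Combine per-term weights (assumed to lie in (0, 1)) into one indicator in [0, 1].

    Values near 1 suggest the document belongs to the category,
    values near 0 suggest it does not, and 0.5 is neutral.
    """
    n = len(weights)
    if n == 0:
        return 0.5  # no evidence either way (assumption)
    eps = 1e-6  # keep log() finite for weights at the boundaries
    clipped = [min(max(w, eps), 1.0 - eps) for w in weights]
    stat_for = -2.0 * sum(math.log(1.0 - w) for w in clipped)
    stat_against = -2.0 * sum(math.log(w) for w in clipped)
    belongs = 1.0 - chi2_survival_even_dof(stat_for, 2 * n)
    not_belongs = 1.0 - chi2_survival_even_dof(stat_against, 2 * n)
    return 0.5 * (1.0 + (belongs - not_belongs))


def assign_labels(doc_weights: dict[str, list[float]], threshold: float = 0.5) -> list[str]:
    """Multi-label decision: keep every category whose indicator exceeds the threshold."""
    return sorted(
        category
        for category, weights in doc_weights.items()
        if category_indicator(weights) > threshold
    )


# Hypothetical weights of one test document's feature terms for three categories.
example = {
    "earn": [0.9, 0.95, 0.8],
    "acq": [0.7, 0.85],
    "grain": [0.1, 0.05, 0.2],
}
labels = assign_labels(example)
print(labels)
```

Because each category is scored independently, a document can receive zero, one, or several labels, which is what makes the scheme multi-label rather than single-label.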

Keywords

Multi-label Text Categorization; Correlation Coefficient; Weighted Matrix; Inverse Chi-square Classifier; Conformity

Parallel Abstract

This study presents a method for multi-label text categorization based on a term-category weighted matrix. The method uses an inverse chi-square classifier to calculate an indicator value for each category under consideration, based on the test document's feature weights, which are represented by correlation coefficients. We use three thresholds, DF (Document Frequency), CC (Correlation Coefficient), and ICF (Inverted Conformity Frequency), to extract the relevant terms of each category. Finally, we conduct experiments on the top 10 categories of Reuters-21578. The experimental results show that Precision, Recall, and F1-measure reach about 87%, 98%, and 92%, respectively. The method is thus comparable to BoosTexter, a well-known multi-label method.
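As a quick consistency check on the reported figures, the F1-measure is the harmonic mean of Precision and Recall. The short sketch below recomputes it from the rounded values in the abstract; the function name is illustrative, and the inputs are the rounded percentages, so the result can only be expected to agree with the reported value approximately.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Rounded Precision and Recall as reported in the abstract.
f1 = f1_measure(0.87, 0.98)
print(f"F1 = {f1:.2%}")
```

The recomputed value rounds to 92%, in line with the F1-measure stated in the abstract.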

Parallel Keywords

Multi-label Text Categorization; Correlation Coefficient; Weighted Matrix; Inverse Chi-square Classifier; Conformity

Cited By

許巧靜 (2011). The Effect of Category-Related Terms on Search Engine Result Rankings [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2011.00190
