SVM-Based Hierarchical Multi-Class Document Classification

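This study uses the category hierarchy of enterprise documents to build a hierarchical classification model composed of multiple multi-class classifiers, so that documents are classified step by step from the top of the category hierarchy downward. The multi-class method pairs SVM classifiers with the one-against-one approach. For these classifiers, feature terms are selected using two thresholds, DF (Document Frequency) and CC (Correlated Coefficient). We tested the model on two datasets of different character: a set of enterprise technical documents and a set of mainland China news articles. The experimental results show that the hierarchical classifier performs well on both datasets and takes less classification time than a non-hierarchical approach.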
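The two-threshold feature selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the CC formula, so the code assumes the common correlation-coefficient form (the signed square root of the chi-square statistic), and the function name `select_features`, the threshold values, and the toy documents are all hypothetical.

```python
import math

def select_features(docs, labels, target, min_df=2, min_cc=0.5):
    """Keep terms that pass a DF threshold and a CC threshold for one class.

    docs   : list of token sets, one per document
    labels : list of class labels, aligned with docs
    target : the class the features are selected for
    Thresholds are illustrative assumptions, not values from the paper.
    """
    n = len(docs)
    vocab = set().union(*docs)
    selected = {}
    for term in vocab:
        # 2x2 contingency counts between "term present" and "doc in target class"
        a = sum(1 for d, y in zip(docs, labels) if term in d and y == target)
        b = sum(1 for d, y in zip(docs, labels) if term in d and y != target)
        c = sum(1 for d, y in zip(docs, labels) if term not in d and y == target)
        d_neg = n - a - b - c
        # DF threshold: drop terms that occur in too few documents
        if a + b < min_df:
            continue
        # Assumed CC form: sqrt(N) * (AD - BC) / sqrt((A+C)(B+D)(A+B)(C+D))
        denom = math.sqrt((a + c) * (b + d_neg) * (a + b) * (c + d_neg))
        cc = math.sqrt(n) * (a * d_neg - b * c) / denom if denom else 0.0
        # CC threshold: keep only terms positively correlated with the class
        if cc >= min_cc:
            selected[term] = cc
    return selected

# Toy corpus (hypothetical data)
docs = [{"wafer", "etch"}, {"wafer", "pump"}, {"market", "stock"}, {"market", "bank"}]
labels = ["tech", "tech", "news", "news"]
selected = select_features(docs, labels, target="tech")
print(selected)
```

In this toy corpus, "etch" is dropped by the DF threshold because it appears in only one document, and "market" is dropped by the CC threshold because it correlates negatively with the target class.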

Keywords

Hierarchical classification; multi-class classification; SVM; feature selection

Parallel Abstract

This study presents a hierarchical multi-class text classification framework that exploits the category hierarchy of enterprise documents. The multi-class classifiers are built from Support Vector Machines using a one-against-one approach. The features used by each classifier are selected using thresholds on DF (Document Frequency) and CC (Correlated Coefficient). We conducted experiments on two different datasets: one contains enterprise documents from a local IC equipment manufacturer, and the other contains mainland China news. The experimental results show that the proposed method performed well on both datasets and ran faster than a non-hierarchical approach.
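The top-down scheme can be illustrated with a short sketch: each internal node of the category hierarchy runs one multi-class decision by one-against-one voting, and the winning child becomes the next node. This is a sketch under stated assumptions rather than the authors' code. In particular, `keyword_decide` is a hypothetical keyword-overlap stand-in for a trained binary SVM, and the hierarchy, category names, and keyword sets are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def one_against_one(classes, decide, doc):
    """Run one binary decision per class pair and return the class with most votes.

    decide(a, b, doc) must return either a or b. Ties go to the class
    listed first, a simplifying assumption.
    """
    votes = Counter(decide(a, b, doc) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])

def classify_top_down(hierarchy, decide, doc, root="root"):
    """Descend from the root, running one multi-class vote per internal node."""
    node, path = root, []
    while node in hierarchy:  # leaf categories have no entry in the hierarchy
        children = hierarchy[node]
        if len(children) == 1:
            node = children[0]
        else:
            node = one_against_one(children, decide, doc)
        path.append(node)
    return path

# Hypothetical two-level hierarchy and keyword sets
hierarchy = {
    "root": ["equipment", "process"],
    "equipment": ["pump", "chamber", "robot"],
    "process": ["etch", "deposition"],
}
KEYWORDS = {
    "equipment": {"pump", "chamber", "robot", "valve"},
    "process": {"etch", "deposition", "recipe"},
    "pump": {"pump", "vacuum"},
    "chamber": {"chamber", "liner"},
    "robot": {"robot", "arm"},
    "etch": {"etch", "plasma"},
    "deposition": {"deposition", "film"},
}

def keyword_decide(a, b, doc):
    # Stand-in for a trained binary SVM: pick the class with more keyword overlap
    return a if len(doc & KEYWORDS[a]) >= len(doc & KEYWORDS[b]) else b

path = classify_top_down(hierarchy, keyword_decide, {"vacuum", "pump", "valve", "leak"})
print(path)
```

In this toy hierarchy, a flat one-against-one classifier over the five leaf categories would evaluate 10 class pairs per document, while the path through "equipment" evaluates 1 + 3 = 4. That difference is one plausible source of the time saving the abstract reports.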

Parallel Keywords

Multi-class Classification; Hierarchical Classification; Support Vector Machines; Feature Selection; Text Categorization


Cited By

許巧靜 (2011). The Influence of Category-Related Terms on Search Engine Result Rankings [Master's thesis, Yuan Ze University]. Airiti Library. https://doi.org/10.6838/YZU.2011.00190
