
行職業描述自動分類之研究

Automatic Classification of Profession Descriptions

Advisor: 葉慶隆

Abstract

Classification is the act of assigning a category to a document based on its content. Traditionally, this work has been done manually by document specialists, which consumes considerable resources. Automatic document classification assigns a category to a document automatically, according to its content or subject. Document classification involves two main steps: feature selection and relevance function selection. We propose two techniques to improve classification accuracy: hierarchical comparison against an XML lexicon, and storing keywords in an XML structure. Experimental results show that hierarchical comparison against an XML lexicon achieves higher accuracy than linear comparison against a database lexicon.
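
The abstract contrasts hierarchical comparison against an XML lexicon with linear comparison against a flat lexicon, but this record does not give the lexicon format or the relevance function. The following is only a minimal sketch of the general idea under stated assumptions: the lexicon layout, the category names, the keywords, the keyword-count scoring, and all function names are hypothetical and are not taken from the thesis. Tokenization here is a simple English split; Chinese descriptions would need a word segmenter instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexicon: <category> elements nest, each with its own <keyword> children.
LEXICON_XML = """
<lexicon>
  <category name="Information Technology">
    <keyword>software</keyword>
    <category name="Software Engineer">
      <keyword>programming</keyword>
      <keyword>debugging</keyword>
    </category>
    <category name="Network Administrator">
      <keyword>router</keyword>
      <keyword>firewall</keyword>
    </category>
  </category>
  <category name="Education">
    <keyword>school</keyword>
    <category name="Teacher">
      <keyword>teaching</keyword>
      <keyword>students</keyword>
    </category>
  </category>
</lexicon>
"""

ROOT = ET.fromstring(LEXICON_XML)


def tokenize(text):
    """Lowercase and split on non-letters (a stand-in for real word segmentation)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(category, tokens):
    """Assumed relevance function: how many of the category's own keywords occur."""
    return sum(1 for kw in category.findall("keyword") if kw.text in tokens)


def classify_hierarchical(root, tokens):
    """Walk down the tree, following the best-scoring child at each level."""
    path, node = [], root
    while True:
        children = node.findall("category")
        if not children:
            return path
        best = max(children, key=lambda c: score(c, tokens))
        if score(best, tokens) == 0:
            return path  # no child matches; stop at the current level
        path.append(best.get("name"))
        node = best


def classify_linear(root, tokens):
    """Flat scan over every category, ignoring nesting; None if nothing matches."""
    best = max(root.iter("category"), key=lambda c: score(c, tokens))
    return best.get("name") if score(best, tokens) > 0 else None
```

In this sketch the hierarchical walk compares a description only against sibling categories at each level and returns the full path from broad to specific category, whereas the linear scan scores every category in one pass. Whether this difference accounts for the accuracy gain reported in the abstract is not stated in this record.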

