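In recent years, owing to the rapid development of computer and information technology, the electronic documents of enterprises, organizations, and institutions have grown at a geometric rate. How to use automatic document classification to help them manage these electronic documents, and thereby improve the effectiveness of enterprise knowledge management, has therefore become an important topic in knowledge management research. Because the content of electronic documents is complex and diverse, classifying documents by manual judgment is neither cost-effective nor fast, and it is also difficult to apply the classification criteria consistently. To address this, this study proposes an automatic document categorization method based on principal component analysis (PCA). The method first extracts the keywords and their frequencies from documents of known categories and merges them into the union of keywords over all documents in all categories. It then uses the term frequencies over this union to infer classification keywords, that is, to identify keywords that are representative of the categories. Next, using these representative keywords, it extracts the keyword frequencies of each known category's document group and of the target document, computes a membership value between each document category and the target document, and determines the category of the target document from these membership values. Finally, this study builds an automatic knowledge-document classification system and evaluates the effectiveness and feasibility of the model and technique with a case study.

In summary, the goal of this study is to improve the accuracy and efficiency of automatic document classification, helping enterprises, organizations, and institutions manage their knowledge documents more effectively and make better use of their knowledge. For information seekers, the approach also helps them find the documents they need quickly and conveniently within the vast body of online information and documents, saving the considerable time otherwise spent filtering and screening information.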
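The keyword-union and PCA steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names and toy documents are hypothetical, and the rule for turning PCA output into "representative keywords" is an assumption: the abstract does not say how PCA results are used, so this sketch sums the absolute loadings of each keyword over the first few principal components and keeps the top-scoring terms. It also assumes documents are already tokenized into keyword lists and uses NumPy's SVD to obtain the principal axes.

```python
from collections import Counter

import numpy as np


def keyword_union(docs):
    """Union of keywords over all known-category documents, sorted for a stable column order."""
    vocab = set()
    for tokens in docs:
        vocab.update(tokens)
    return sorted(vocab)


def term_frequency_matrix(docs, vocab):
    """Rows = documents, columns = keywords in `vocab`, entries = raw keyword frequencies."""
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, tokens in enumerate(docs):
        for w, c in Counter(tokens).items():
            if w in index:
                X[i, index[w]] = c
    return X


def representative_keywords(docs, n_components=2, top_k=3):
    """Pick keywords with the largest absolute loadings on the leading principal components.

    ASSUMPTION: this scoring rule is illustrative only; the source does not
    specify how PCA results are turned into representative keywords.
    """
    vocab = keyword_union(docs)
    X = term_frequency_matrix(docs, vocab)
    Xc = X - X.mean(axis=0)  # center each keyword column before PCA
    # Rows of Vt are the principal axes (loadings over keywords), ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    scores = np.abs(Vt[:k]).sum(axis=0)
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [vocab[j] for j in order]


# Hypothetical tokenized documents from two known categories.
docs = [
    ["pca", "pca", "matrix", "report"],
    ["pca", "matrix", "matrix", "report"],
    ["invoice", "invoice", "contract", "report"],
    ["invoice", "contract", "contract", "report"],
]
print(representative_keywords(docs, n_components=2, top_k=4))
```

In this toy data, "report" occurs equally often in every document, so its centered column is all zeros and it receives no loading on any principal component; it is therefore never selected. This illustrates one intuitive reason a variance-based method such as PCA can filter out terms that carry no category information.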
Owing to the rapid growth of information technology, the number of digital documents on the Internet and within organizations has increased significantly. To help enterprises manage their digital documents and domain knowledge more effectively, automatic document classification has become a key issue in enterprise knowledge management. Given the complexity of the many types of digital documents, this paper uses principal component analysis (PCA) to develop an algorithm for automatic document classification. Based on PCA, representative keywords for distinct document categories are obtained; the category of a target document is then determined from the frequencies of these representative keywords in the target document. In addition to the classification algorithm, a Web-based document classification system is developed, and a demonstration case is used to verify the performance of the proposed approach. This research aims to improve the accuracy and efficiency of enterprise document classification and to enable a self-service knowledge management mechanism in organizations.
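The final step, determining a target document's category from the frequencies of representative keywords, might look roughly like the sketch below. The abstract does not define the membership value, so this sketch assumes cosine similarity between each category's pooled keyword-frequency vector and the target document's vector, and assigns the category with the highest value. The function names, keyword list, category names, and sample documents are all hypothetical; in a full pipeline the keyword list would come from the PCA-based selection step.

```python
import math
from collections import Counter


def frequency_vector(tokens, keywords):
    """Frequencies of each representative keyword in a token list (other tokens are ignored)."""
    counts = Counter(tokens)
    return [counts[k] for k in keywords]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def classify(target_tokens, category_docs, keywords):
    """Return (best category, membership value per category).

    ASSUMPTION: membership value = cosine similarity between the pooled
    keyword frequencies of a category's documents and those of the target.
    """
    target_vec = frequency_vector(target_tokens, keywords)
    memberships = {}
    for category, docs in category_docs.items():
        pooled = [t for doc in docs for t in doc]  # merge the category's document group
        memberships[category] = cosine(frequency_vector(pooled, keywords), target_vec)
    best = max(memberships, key=memberships.get)
    return best, memberships


# Hypothetical representative keywords and known-category documents.
keywords = ["contract", "invoice", "matrix", "pca"]
category_docs = {
    "finance": [["invoice", "invoice", "contract"], ["invoice", "contract", "contract"]],
    "analytics": [["pca", "pca", "matrix"], ["pca", "matrix", "matrix"]],
}
target = ["pca", "matrix", "report", "pca"]
best, memberships = classify(target, category_docs, keywords)
print(best, memberships)
```

Here the target's keyword vector is [0, 0, 1, 2]. The "analytics" group pools to [0, 0, 3, 3], giving a membership of 9 / sqrt(90), about 0.949. The "finance" group shares no representative keywords with the target and scores 0.0. Cosine similarity is used only because it ignores document length; any other membership function could be substituted.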