An Automatic Document Classification Model Based on Principal Component Analysis

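In recent years, the rapid development of computer and information technology has caused the number of electronic documents held by enterprises and institutions to grow exponentially. Using automatic document classification to help enterprises and institutions manage these documents, and thereby improve the effectiveness of enterprise knowledge management, has therefore become an important topic in knowledge management research. Because electronic documents are complex and diverse in content, assigning categories by human judgment is uneconomical and slow, and it is also difficult to apply the categorization criteria consistently. To address this, this study proposes an automatic document categorization method based on principal component analysis (PCA). The method first extracts the keywords and their frequencies from documents of known categories and merges them into a single union of keywords covering every document in every category. It then uses the term frequencies over this union to infer which keywords should serve as the basis for classification, that is, the keywords that are representative of the categories. Next, using these representative keywords, it extracts keyword frequencies for each known-category document group and for the target document, computes a membership value between each category and the target document, and assigns the target document to a category according to these membership values. Finally, the study builds an automatic knowledge-document classification system and uses a case study to evaluate the effectiveness and feasibility of the model and technique. In summary, the goal of this study is to improve the accuracy and efficiency of automatic document classification, helping enterprises and institutions manage their knowledge documents more effectively and thereby increasing the utilization of enterprise knowledge. For information seekers, the approach can also help them find the documents they need quickly and conveniently among large volumes of online information and documents, saving much of the time otherwise spent filtering and screening information.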
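
The abstract describes the first half of the pipeline only in outline: build the union of keywords over all known-category documents, then apply PCA to the keyword frequencies to find keywords that are representative of the categories. The Python sketch below illustrates one way this could work; it is not the authors' implementation. The regular-expression tokenizer, the functions `term_matrix`, `principal_components` and `representative_keywords`, the use of power iteration with deflation to compute the components, the rule of ranking keywords by eigenvalue-weighted squared loadings, and the toy corpus are all assumptions made for illustration.

```python
import math
import random
import re
from collections import Counter


def keyword_frequencies(text):
    # Hypothetical extractor: lowercase Latin word tokens. The paper's own
    # keyword extraction (likely Chinese word segmentation) is not described.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def term_matrix(docs):
    """docs: list of (category, text). Returns the keyword union and a doc-by-term frequency matrix."""
    freqs = [keyword_frequencies(text) for _, text in docs]
    vocab = sorted(set().union(*freqs))  # union of keywords over all categories
    rows = [[f[w] for w in vocab] for f in freqs]
    return vocab, rows


def covariance(rows):
    # Sample covariance of the term-frequency columns (terms are the variables).
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def principal_components(cov, k, iters=1000, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                return comps
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append((lam, v))
        # Deflate so the next pass converges to the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def representative_keywords(docs, n_components=2, top_m=2):
    # Assumed selection rule: score each keyword by its share of the variance
    # on the retained components (eigenvalue * squared loading), keep top_m.
    vocab, rows = term_matrix(docs)
    comps = principal_components(covariance(rows), n_components)
    scores = [sum(lam * v[j] ** 2 for lam, v in comps) for j in range(len(vocab))]
    ranked = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)
    return [vocab[j] for j in ranked[:top_m]]


docs = [
    ("sports", "baseball team game score baseball"),
    ("sports", "game team baseball pitcher"),
    ("finance", "stock market price investor stock"),
    ("finance", "market stock bank price"),
]
print(representative_keywords(docs))
```

On this toy corpus the first component separates the sports documents from the finance documents, and `baseball` and `stock` carry most of the variance along it, so they are returned as the two representative keywords. Power iteration is used only to keep the sketch dependency-free; a library eigensolver such as `numpy.linalg.eigh` would normally be preferred.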

Keywords

Principal component analysis; document classification; keyword extraction; knowledge management

Parallel Abstract

Owing to the rapid growth of information technology, the number of digital documents on the Internet and within organizations has increased significantly. To help enterprises manage their digital documents and domain knowledge more effectively, automatic document classification has become a key issue in enterprise knowledge management. Given the complexity of the many types of digital documents, this paper uses principal component analysis (PCA) to develop an algorithm for automatic document classification. PCA is used to obtain representative keywords for each document category, and the category of a target document is then determined from the frequencies of these representative keywords in the target document. In addition to the classification algorithm, a Web-based document classification system is developed, and a demonstration case is used to verify the performance of the proposed approach. The aim of this research is to improve the accuracy and efficiency of enterprise document classification and to enable a self-service knowledge management mechanism in organizations.
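
The final step, scoring a target document against each category using only the representative keywords, can be sketched as follows under stated assumptions. The abstract does not give the formula for the membership value, so this sketch assumes one: the cosine similarity between the target's representative-keyword frequency vector and the summed frequency vector of each category's documents. The helper names (`keyword_vector`, `membership_values`, `classify`), the tokenizer, the representative keyword list and the toy documents are all hypothetical.

```python
import math
import re
from collections import Counter


def keyword_vector(text, rep_keywords):
    # Hypothetical tokenizer; counts only the representative keywords.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in rep_keywords]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def membership_values(docs, target, rep_keywords):
    """Assumed membership: cosine between the target vector and each category's summed vector."""
    totals = {}
    for category, text in docs:
        acc = totals.setdefault(category, [0] * len(rep_keywords))
        for j, count in enumerate(keyword_vector(text, rep_keywords)):
            acc[j] += count
    t = keyword_vector(target, rep_keywords)
    return {category: cosine(vec, t) for category, vec in totals.items()}


def classify(docs, target, rep_keywords):
    # Assign the category with the largest membership value.
    values = membership_values(docs, target, rep_keywords)
    return max(values, key=values.get), values


docs = [
    ("sports", "baseball team game score baseball"),
    ("sports", "game team baseball pitcher"),
    ("finance", "stock market price investor stock"),
    ("finance", "market stock bank price"),
]
rep = ["baseball", "game", "stock", "market"]
label, values = classify(docs, "the baseball game went to extra innings", rep)
print(label)  # sports
```

If a target contains none of the representative keywords, every membership value is 0 and `max` simply returns the first category, so a practical system would need an explicit rule for that case.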
