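Applying Data Mining Classification Techniques to Document Recommendation:
A Case Study of a Master's and Doctoral Thesis System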

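Abstract

As the number of websites grows rapidly, the volume of data on the Internet grows sharply as well. Helping users find the information they need more quickly, or presenting such large volumes of data in a form that is easy to read, has become an important problem. This study applies web mining techniques to propose a simple and practical user-oriented document clustering method that needs only the keywords users entered and their browsing records, both taken from the website's log files. Clustering data in this way saves the labor and time otherwise spent maintaining lexicon files, segmenting text into words, and compiling part-of-speech statistics. The document clusters produced by the proposed method also reflect users' interests and preferences more directly. In addition, each resulting cluster can be described by some of the keywords that users actually entered. Analysis and observation further show that the clusters produced by this method have a certain degree of describability (accuracy).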

Keywords

cluster analysis; document clustering; web mining; data mining

Parallel Abstract

As the number of websites grows, the amount of data on the network grows dramatically as well. Helping users quickly find the information they need, and presenting such large amounts of data in a form that is easy to read, has become an important issue. Using web mining technology, this study presents a simple and practical method for user-oriented document clustering. To cluster documents, it needs only the keywords users entered and their browsing history, both taken from the site's log files. Clustering the data in this way saves the manpower and time otherwise spent maintaining thesaurus files, segmenting text, and tallying parts of speech. The resulting document clusters also reflect users' interests and preferences more directly. In addition, the clusters can be described by the keywords that users entered. Analysis and observation show that the clusters produced in this way reach a certain degree of describability (accuracy).

Parallel Keywords

cluster analysis; document clustering; web mining; data mining
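
The abstract states that clustering needs only the search keywords and browsing records found in the site log, but it does not spell out the procedure. The following is therefore only a minimal sketch under stated assumptions, not the method of the thesis: the log is assumed to have been reduced to hypothetical `(keyword, document_id)` pairs, and two documents are assumed to belong to the same cluster when users reached them through a shared keyword. The function name `cluster_documents` and the sample data are illustrative.

```python
from collections import defaultdict


def cluster_documents(log_entries):
    """Group documents that users reached through shared search keywords.

    log_entries: iterable of (keyword, document_id) pairs. This format is
    an assumption; a real site log would need parsing first.
    Returns a list of clusters, each a dict with "docs" and "keywords" sets.
    """
    # Collect, for every document, the keywords that led users to it.
    doc_keywords = defaultdict(set)
    for keyword, doc_id in log_entries:
        doc_keywords[doc_id].add(keyword)

    clusters = []
    for doc_id, keywords in doc_keywords.items():
        merged = {"docs": {doc_id}, "keywords": set(keywords)}
        remaining = []
        for cluster in clusters:
            if cluster["keywords"] & keywords:
                # Shared keyword: fold the existing cluster into the new one.
                merged["docs"] |= cluster["docs"]
                merged["keywords"] |= cluster["keywords"]
            else:
                remaining.append(cluster)
        remaining.append(merged)
        clusters = remaining
    return clusters


# Hypothetical log extract: (keyword the user searched, thesis they opened).
sample_log = [
    ("data mining", "T1"),
    ("data mining", "T2"),
    ("clustering", "T2"),
    ("clustering", "T3"),
    ("tourism", "T4"),
]

for cluster in cluster_documents(sample_log):
    print(sorted(cluster["docs"]), "described by", sorted(cluster["keywords"]))
```

Each cluster carries the keywords that produced it, which mirrors the abstract's point that clusters can be described by the terms users actually entered. Merging on any single shared keyword is the simplest possible rule; a real system would likely weight keywords by frequency or require stronger overlap.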

