Mining Research Trends by Establishing Discriminating Words for Literature: A Case Study of Database-Related Literature

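An effective knowledge management mechanism is essential in the current age of information explosion. This study targets database-related literature: it computes and statistically analyzes word frequencies within categorized groups of papers, takes the words whose frequencies differ most between two categories as the main discriminating words, and uses them to identify how research trends change across categories of papers defined by time period and by topic. Based on the differences in these discriminating words, a corresponding document classifier can be built: each discriminating word scores a paper according to how often it appears in that paper, and the total score over all discriminating words determines the category to which the paper belongs. This classification mechanism not only reveals the main research directions and the newest research topics, but also helps in categorizing papers on related issues. In addition, because judging a research trend from a single discriminating word is difficult and imprecise, this paper proposes applying association rules to the discriminating words mined from the categorized papers; finding combinations of highly associated discriminating words clarifies the topics they represent in the field, making trend mining with discriminating words more precise.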
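The frequency-gap scoring described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the use of relative frequency, the signed gap as the per-word weight, the zero decision threshold, and the toy documents are all hypothetical.

```python
from collections import Counter

def relative_freq(docs):
    """Relative frequency of each word over a list of tokenised documents."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def discriminating_words(docs_a, docs_b, top_k=3):
    """Keep the top_k words with the largest frequency gap between A and B.

    The returned weight is signed: positive favours A, negative favours B.
    (Assumed weighting; the abstract does not specify one.)
    """
    fa, fb = relative_freq(docs_a), relative_freq(docs_b)
    gaps = {w: fa.get(w, 0.0) - fb.get(w, 0.0) for w in set(fa) | set(fb)}
    ranked = sorted(gaps, key=lambda w: abs(gaps[w]), reverse=True)[:top_k]
    return {w: gaps[w] for w in ranked}

def classify(doc, weights):
    """Score a document as the sum of weight * in-document count."""
    counts = Counter(doc)
    score = sum(weight * counts[w] for w, weight in weights.items())
    return ("A" if score > 0 else "B"), score

# Hypothetical toy corpora: A = older papers, B = newer papers.
older = [["relational", "query", "index"], ["relational", "transaction", "query"]]
newer = [["xml", "stream", "query"], ["xml", "mining", "stream"]]

weights = discriminating_words(older, newer, top_k=3)
label_new, _ = classify(["xml", "stream", "index"], weights)
label_old, _ = classify(["relational", "query"], weights)
```

With this toy data the three selected words are "relational", "xml", and "stream", and a paper mentioning "xml" and "stream" receives a negative total and is assigned to the newer category.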

Keywords

Document classification; knowledge management; data mining

Parallel abstract

An effective knowledge management mechanism is very important in the age of information explosion. Our research targets the abstracts of database-related literature. By scanning the literature, we find the discriminating words whose frequencies differ most between two categories of papers. We can apply these words to text categorization, according to their frequencies in an abstract. These discriminating words can be used to trace changes in research trends and to determine whether a given paper discusses a new topic in this domain. In addition, identifying a research trend from a single discriminating word is difficult and imprecise, so we use association rules to mine the connections among these discriminating words. Observing word combinations shows more clearly and accurately which topic the discriminating words represent.
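The association-rule step can be sketched in the same spirit. This is a simplified illustration restricted to word pairs, using the standard support and confidence measures; the function name, the thresholds, and the sample abstracts are assumptions, and the abstract does not state which rule-mining algorithm or parameters were actually used.

```python
from itertools import combinations

def pair_rules(docs, words, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for word pairs.

    Each document is treated as a transaction containing only the
    discriminating words it mentions.
    """
    word_set = set(words)
    transactions = [set(doc) & word_set for doc in docs]
    n = len(transactions)
    if n == 0:
        return []
    rules = []
    for a, b in combinations(sorted(word_set), 2):
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        if both == 0 or support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            count_x = sum(1 for t in transactions if x in t)
            confidence = both / count_x
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical tokenised abstracts and discriminating words.
abstracts = [
    ["xml", "stream", "query"],
    ["xml", "stream"],
    ["xml", "mining"],
    ["stream", "xml", "index"],
]
rules = pair_rules(abstracts, ["xml", "stream", "mining"])
```

Here the only rule that passes both thresholds is "stream" implies "xml": the pair co-occurs in three of four abstracts, and every abstract containing "stream" also contains "xml". Reading such pairs together gives a more concrete topic than either word alone.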

Parallel keywords

text categorization; knowledge management; data mining

