Abstract


This study proposes a new method for document clustering. Clustering is a powerful data mining technique for discovering topics in a collection of documents. A good document clustering has high similarity among documents within the same cluster and low similarity between documents in different clusters. The cosine function is commonly used to measure the similarity of two documents. When clusters are not well separated, however, partitioning them on pairwise similarity alone is not good enough: some documents in different clusters may still be similar to each other, so the cosine function alone is not an efficient criterion. To address this problem, a similarity measure based on the concepts of neighbors and links is used. This study proposes an efficient similarity measure with more accurate weighting for use in the bisecting k-means algorithm. The method is evaluated on a data set of documents, and its efficiency is compared with that of the cosine similarity criterion and traditional methods. Experimental results show a marked improvement in efficiency when the proposed criterion is applied.
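
The abstract does not spell out the proposed weighting, so the following is only a minimal sketch of the general idea of combining cosine similarity with neighbors and links. It assumes the common definitions in which two documents are neighbors when their cosine similarity reaches a threshold, and the link between two documents is the number of neighbors they share. The names `theta`, `alpha`, and `blended_similarity`, and the linear blend itself, are illustrative assumptions and not the weighting proposed in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity of two term-weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def neighbor_matrix(docs, theta):
    """adj[i][j] is 1 when documents i and j are neighbors, i.e. their
    cosine similarity is at least theta (assumed threshold parameter)."""
    n = len(docs)
    return [[1 if cosine(docs[i], docs[j]) >= theta else 0
             for j in range(n)]
            for i in range(n)]

def link(adj, i, j):
    """Number of neighbors shared by documents i and j."""
    return sum(a * b for a, b in zip(adj[i], adj[j]))

def blended_similarity(docs, adj, i, j, alpha):
    """Hypothetical blend: alpha weights the link count (normalized by the
    number of documents), and 1 - alpha weights the plain cosine value."""
    n = len(docs)
    return alpha * link(adj, i, j) / n + (1 - alpha) * cosine(docs[i], docs[j])

# Three toy documents over a two-term vocabulary.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = neighbor_matrix(docs, theta=0.8)
```

In this toy data the first two documents are neighbors of each other and share two neighbors (themselves), while the third has no neighbor but itself, so the blended score separates the pairs more sharply than a single pairwise comparison would. A bisecting k-means variant could use such a score in place of the plain cosine value when assigning documents to the two halves of a split.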