DK-means: A New Highly Stable Clustering Technique for Data Mining in Databases

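With advances in information technology, the volume of data stored in databases keeps growing. Data mining techniques help uncover useful information hidden in data and are widely applied across many fields; data clustering in particular is one of the most commonly used modes of data analysis and plays an important role in many application areas. Data clustering is the process of partitioning data into groups such that data within the same group are highly similar, while data in different groups are dissimilar. Dissimilarity is usually assessed with a distance measure computed from the attribute values that describe the objects. Many data clustering algorithms have been developed in recent years. Among them, K-means is fast, easy to implement, and finds a locally optimal clustering. Its main drawback, however, is that it has difficulty recognizing clusters of arbitrary shape. This study proposes a modified K-means algorithm, based on the concept of distance, that makes clustering results more stable. Simulation results show that the proposed DK-means clustering method produces accurate results.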

Keywords

Data mining; data clustering; K-means

Parallel Abstract

With the rapid progress of information technology, ever larger amounts of data are produced and stored in databases. Data mining helps to extract useful information from these data and is widely used in many areas; data clustering, in particular, is one of the most frequently used modes of analysis and plays an important role in various fields. Data clustering is the process of grouping data into clusters such that the data in each cluster share a high degree of similarity while being very dissimilar to data in other clusters. Dissimilarity is evaluated from the attribute values that describe the objects, usually by means of a distance measure. Many data clustering algorithms have been developed in recent years. K-means is fast, easy to implement, and converges to a locally optimal clustering. However, its crucial shortcoming is the difficulty of recognizing clusters of arbitrary shape. This paper presents a modified K-means algorithm, based on the concept of distance, that improves the stability of data clustering results. Simulation results show that the proposed DK-means yields accurate clustering results.
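For readers unfamiliar with the baseline, the distance-based grouping described above can be sketched as follows. This is a minimal illustration of standard K-means (Lloyd's iteration) with Euclidean distance as the dissimilarity measure; it is not the DK-means modification, whose details the abstract does not give. The function names `euclidean` and `kmeans`, the sample points, and the choice of passing the initial centroids in explicitly are all assumptions made for this example.

```python
import math

def euclidean(p, q):
    """Euclidean distance, used here as the dissimilarity between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, max_iter=100):
    """Standard K-means (Lloyd's iteration); NOT the paper's DK-means.

    points:    list of equal-length numeric tuples
    centroids: initial cluster centers (assumed to be supplied by the caller)
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's center in place
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical sample data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The result of this procedure depends on the initial centroids, so different starting points can converge to different local optima. That sensitivity is the kind of instability a more stable clustering method aims to reduce.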

Parallel Keywords

Data Mining; Data Clustering; K-Means
