未知群數之可重新分配的分裂導向模糊分群

集群分析(Clustering Analysis)為一種資料分析方法，主要是要將特徵相似的資料聚集到同一群，將資料群體分門別類，協助我們對資料做進一步應用。此方法十分實用，所以被許多學者廣泛應用到各領域。模糊集群演算法模糊C平均值(fuzzy c-means, FCM)演算法在分群問題上有很好的成效，但需事先知道群數，無法自動產生最佳群數。而階層式集群方法(hierarchical clustering )，可事先不知道群數進行分群，但是所執行的時間複雜度高，因此有學者[M. Steinbach, G. Karypis, V. Kumar, 2000]提出二分集群法(Bisecting k-means clustering) ，在分群過程中，不僅可解決群數問題，並且速度較快，但依其二分的特性，一開始若分錯接下來的步驟就很難再修補。因此，本研究目的在為不知資料集群數目的情形之下，在分群過程中，自動決定最適當的群數，並能產生好的集群結果。利用分裂式階層集群方法(divisive hierarchical clustering)以及二分法的特性，與模糊集群的模糊C平均值(Fuzzy c-Means，FCM)演算法結合之後，再利用群集聚合(Cluster merging)方式提高分群品質。本研究提出三種可重新分配的分裂導向模糊分群演算法，分別為「模糊二分法」、「聚合模糊二分法」、「模糊分裂法」，而最後經由實驗結果顯示，本研究所提出的方法，使得未知群數的資料分群獲得與已知群數相近的群數與正確率。

關鍵字

未知群數；集群分析；階層式集群；模糊集群；二分集群法；群集聚合

並列摘要

Cluster analysis is one kind of data mining. Its main feature is “A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters” [J. Han and M. Kamber, 2001]. Clustering is a useful method to discover the hidden information, so it is applied in many fields, such as DNA analysis, Business Intelligence (BI), Computing Intelligence, etc. Fuzzy c-means (FCM) algorithm has been good performance, but it still has some drawbacks. For example, it must get the predefined number of clusters in advance as traditional k-means. However in the real world application, the cluster number usually can’t be known beforehand. Hierarchical clustering doesn’t need to know the number of clusters in advance, but its time complexity is very high. Researchers [M. Steinbach, G. Karypis, V. Kumar, 2000] offer the Bisecting k-means clustering to resolve the problem on cluster number and receive lower time complexity than original hierarchical clustering. But this method still suffers from the fact that once a step is done, it can never be undone which causes a big problem that some data been partitioned into wrong clusters in the previous stage won’t be relocated in following stages. Therefore, the purpose of this research is to automatically select the optimal number of clusters and achieve high accuracy. This research integrates the property of bisecting and divisive hierarchical clustering with Fuzzy c-means algorithm, and finally merges the high similar clusters, which maybe wrong partitioned into different clusters, to improve the clustering quality. We present three enhanced relocatable split-oriented fuzzy clustering algorithms called Bisect FCM, Merging Bisect FCM and Split FCM. The experimental results show that our approaches are not only efficient in convergence speed but also effective in clustering quality. They can obtain the correct or near the cluster number and achieve even higher accuracy than the FCM with cluster number known beforehand.

並列關鍵字

Cluster analysis ； Unknown cluster number ； Hierarchical clustering ； Fuzzy c-means clustering ； Bisecting k-means ； Cluster merging

參考文獻

[19] 潘麒全，「可修正的二分群集法」，中原大學資訊管理所碩士論文，2003年6月。

[2] Frigui, H., and Krishnapuram, R., “Clustering by Competitive Agglomeration”, Pattern Recognition, Vol.30, No.7, 1997, pp.1109-1119.

[3] Frigui, H., and Krishnapuram, R., “A Robust Competitive Clustering Algorithm with Applications Computer Vision”, IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol.21, No.5,1999, pp.450-465.

[4] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001

[7] Kaymak, U., and Setnes, M., “Fuzzy cluster with Volume Prototypes and Adaptive Cluster Merging”, Proceedings of 2002 IEEE International conference on Fuzzy Systems, Vol.10, 2002, pp.705-712.

被引用紀錄

陳志豐（2008）。基於高頻項目集結合近似樣式匹配之文件分群〔碩士論文，大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-0607200917245203

陳寶燦（2010）。應用分群技術於同義書目之過濾與最佳化〔碩士論文，大同大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0081-3001201315105100

國際替代計量

未知群數之可重新分配的分裂導向模糊分群

未授權

主題瀏覽