Fuzzy Clustering Algorithms Based on Mixture Distribution Models

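Model-based Gaussian and non-Gaussian clustering, proposed by Banfield and Raftery in 1993, has since been widely studied and applied in many fields. Fuzzy partition extends hard partition and is a more robust approach to clustering, so in this dissertation we construct a fuzzy model-based clustering framework built on fuzzy partition. We first consider the eigenvalue decomposition of the covariance matrix in the fuzzy classification maximum likelihood model, and then use the Bayesian information criterion to select the best model together with the best number of clusters. The proposed framework clusters robustly: it is robust with respect to the number of clusters, and it can detect clusters of differing volumes. We apply the proposed algorithm to numerical and real data sets, and the comparisons confirm its effectiveness and its advantages over other methods. We then address the robustness problem that outliers cause for clustering. The Gaussian distribution is known not to be robust for data containing outliers, whereas the t-distribution handles outliers more robustly. We therefore further incorporate multivariate t-distributions into the proposed fuzzy model-based clustering framework, and show on numerical and real data that the resulting method is robust and performs better than other clustering methods.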
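For orientation, the two ingredients named above can be written out in their standard textbook forms. This is a sketch under assumptions: the abstract does not state the formulas, so the symbols below ($\lambda_k$, $D_k$, $A_k$, $m$, $n$, $\hat{L}$) follow the usual Banfield and Raftery parameterization and the usual definition of the Bayesian information criterion, not necessarily the exact notation or sign convention of the dissertation.

```latex
% Eigenvalue decomposition of the covariance matrix of cluster k
% (standard Banfield--Raftery parameterization; notation assumed):
%   \lambda_k : scalar controlling the volume of the cluster
%   D_k       : orthogonal matrix of eigenvectors (orientation)
%   A_k       : diagonal matrix with \det(A_k) = 1 (shape)
\Sigma_k = \lambda_k \, D_k A_k D_k^{\top}

% Bayesian information criterion for a model with m free parameters
% fitted to n observations, where \hat{L} is the maximized likelihood.
% Under this sign convention a larger value indicates a better model:
\mathrm{BIC} = 2 \log \hat{L} - m \log n
```

Constraining $\lambda_k$, $D_k$, or $A_k$ to be equal or free across clusters yields a family of candidate models, and the criterion is then compared across both the model family and the number of clusters.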

Keywords

outlier; Gaussian distribution; t-distribution; robust; fuzzy; clustering

Parallel Abstract

Since Banfield and Raftery (1993) proposed model-based Gaussian and non-Gaussian clustering, model-based clustering has been widely studied and applied in various areas. Because fuzzy partition extends hard partition and offers a more robust way of clustering, we construct a fuzzy model-based clustering framework via fuzzy partition in this dissertation. We first consider the eigenvalue decomposition of the covariance matrix in a fuzzy classification maximum likelihood model. We then use the Bayesian information criterion for model selection, choosing the best model together with an optimal number of clusters. The proposed fuzzy model-based clustering framework exhibits robust clustering characteristics: it is robust to the number of clusters and to cluster volumes, in that it can detect clusters of different volumes. Numerical examples and real data applications, with comparisons, demonstrate the effectiveness and superiority of the proposed model. We next address the robustness problem caused by outliers. The Gaussian distribution is known not to be robust to outliers, and t-distributions are in general more robust to outliers than Gaussian distributions. We therefore further consider the proposed fuzzy model-based clustering framework with multivariate t-distributions. Numerical and experimental examples are used for comparison, and the results demonstrate the robustness of the framework with multivariate t-distributions.
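The claim that t-distributions are more robust to outliers than Gaussian distributions can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it is a univariate, single-cluster simplification that uses the standard EM weight for a t model, (nu + 1) / (nu + squared standardized distance). The function names, the fixed degrees of freedom `nu = 3`, and the toy data are all assumptions made for illustration.

```python
def t_weights(xs, mu, sigma2, nu):
    """Standard EM weights for a univariate t model.

    Points far from mu (relative to sigma2) receive small weights.
    """
    return [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in xs]


def robust_location(xs, nu=3.0, iters=200):
    """Iteratively reweighted location and scale under a t model, nu fixed."""
    n = len(xs)
    mu = sum(xs) / n                                # start from the sample mean
    sigma2 = sum((x - mu) ** 2 for x in xs) / n     # and the sample variance
    for _ in range(iters):
        w = t_weights(xs, mu, sigma2, nu)
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        sigma2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n
    return mu, sigma2


# Hypothetical data: five points near 10 and one gross outlier at 50.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
gaussian_mean = sum(data) / len(data)    # Gaussian ML estimate, pulled toward 50
t_mean, t_scale = robust_location(data)  # t-based estimate, stays near 10
print(gaussian_mean, t_mean)
```

The outlier ends up with a weight close to zero, so it barely influences the t-based estimate. In a multivariate mixture the same weight takes the form (nu + d) / (nu + Mahalanobis distance squared), with d the dimension, and is applied within each cluster.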

Parallel Keywords

model-based clustering; t-distribution; Gaussian distribution; robust; fuzzy; clustering
