Clustering Aggregation for Numerical Data

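This thesis studies clustering aggregation for numerical data. We extend the iterative pairwise consensus (IPC) method and propose three methods for aggregating clustering results on numerical data. IPC is an Expectation-Maximization-like algorithm that requires the similarity between every pair of points, namely the number of input clusterings in which the two points are assigned to the same cluster. In our first method, we apply principal component analysis (PCA) to the points assigned to the same cluster in each input clustering, so that the similarity between points within each input clustering is computed more precisely; we then define the overall similarity between two points as the sum of their similarities over the input clusterings. In other words, pairs assigned to the same cluster are checked once more to confirm that they are similar. In our second method, we likewise apply PCA to the points assigned to the same cluster in each input clustering, but we instead compute the similarity between points that are not assigned to the same cluster; we then define the overall similarity between two points as the number of input clusterings in which they share a cluster plus the sum of their similarities in the clusterings where they do not. In other words, pairs that were not assigned to the same cluster are given a second chance. In our third method, we extend the first method: the similarity matrices it computes serve as input to affinity aggregation for spectral clustering, which computes a weight for each input clustering. Finally, the weighted similarities from the first method serve as the basis for aggregation.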
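As a rough illustration of the pairwise similarity that IPC starts from, the sketch below measures how often two points fall in the same cluster across several input clusterings. The function name `co_association` and the toy label lists are assumptions made for this example and do not come from the thesis; the PCA-based refinements of the proposed methods are not shown.

```python
import numpy as np

def co_association(partitions):
    """Fraction of input clusterings in which each pair of points shares a cluster.

    `partitions` holds one label sequence per input clustering, all of the
    same length (hypothetical input layout chosen for this sketch).
    """
    labels = np.asarray(partitions)  # shape: (n_partitions, n_points)
    # same[k, i, j] is True when clustering k puts points i and j together
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

# Three toy clusterings of four points
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
S = co_association(partitions)
```

Multiplying `S` by the number of input clusterings gives the raw count rather than the fraction. Label values need not agree across clusterings, because only equality within one clustering is compared.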

Keywords

clustering aggregation; numerical

Parallel Abstract

We propose three clustering aggregation methods for numerical data, all based on the iterative pairwise consensus (IPC) method. IPC is an Expectation-Maximization-like (EM-like) method that maximizes the sum of similarity measures between pairs of data points. In IPC, the similarity between a pair of data points is defined as the average number of partitions in which the two points are assigned to the same cluster. Our first method, similarity-confirming clustering aggregation (SCfCA), applies PCA to the data points that are assigned to the same cluster in each partition and uses the result to refine the similarity measure of each such pair. SCfCA defines the overall similarity of a pair as its average similarity over the partitions. Our second method, similarity-compensation clustering aggregation (SCpCA), also applies PCA to the data points that are assigned to the same cluster in each partition, but additionally evaluates the similarity of pairs that are assigned to different clusters. SCpCA defines the overall similarity of a pair as the sum of two terms: the average number of partitions in which the two points are assigned to the same cluster, and their average similarity over the partitions in which they are assigned to different clusters. Our third method, weighted similarity-confirming clustering aggregation (WSCfCA), takes the per-partition similarity matrices produced by SCfCA as input and uses affinity aggregation for spectral clustering (AASC) to obtain a weight for each partition. The weighted similarity matrix is then used to aggregate the clustering results.
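The final step of the weighted variant can be pictured as a weighted combination of per-partition similarity matrices. The following is a minimal sketch under stated assumptions: the weights are simply given as input (the AASC weight estimation itself is not implemented here), they are normalized to sum to one, and the function name `weighted_similarity` and the toy matrices are hypothetical.

```python
import numpy as np

def weighted_similarity(similarities, weights):
    """Combine per-partition similarity matrices into one weighted matrix.

    `similarities` has shape (n_partitions, n_points, n_points).
    `weights` has one non-negative value per partition; in the thesis these
    come from AASC, whereas here they are supplied by hand (assumption).
    """
    sims = np.asarray(similarities, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalization is an assumption of this sketch
    # Contract the partition axis: sum over k of w[k] * sims[k]
    return np.tensordot(w, sims, axes=1)

# Two toy similarity matrices for two points
A = [[1.0, 0.8], [0.8, 1.0]]
B = [[1.0, 0.4], [0.4, 1.0]]
W = weighted_similarity([A, B], weights=[3.0, 1.0])
```

With weights 3 and 1, the first partition contributes three quarters of the combined similarity, so partitions judged more reliable dominate the aggregated matrix.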

Parallel Keywords

clustering aggregation; numerical; evidence accumulation
