Extensions of the Rand and Jaccard Indices and Their Application to the Evaluation of Cluster Ensembles

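In this thesis, we study similarity measures in three parts. The first part covers the fuzzy generalization of similarity measures and a comparison of fuzzy generalizations of the Rand index. To extend the Rand index and other indices from crisp partitions to fuzzy partitions, we propose using a graph and relation matrices to convert a membership matrix into a sign relation matrix. We find that the trace of the product of two sign relation matrices yields similarity measures such as the Rand index and the Jaccard index. Compared with earlier fuzzy generalizations of the Rand index, the main difference is that our method has the following important property: for any two fuzzy partition matrices, equality of the two matrices is a necessary and sufficient condition for their fuzzy Rand index to equal 1. This property allows our fuzzy generalization of the Rand index (FGRI), and the fuzzy generalizations of other related indices, not only to compute the similarity between a fuzzy partition and a crisp reference partition, but also to identify the similarity between a fuzzy partition and a fuzzy reference partition, and even to compare the similarities of different data sets to the same fuzzy reference partition. The second part broadens the scope of FGRI to handle the following cases: the similarity between a cluster ensemble and a crisp partition, between two cluster ensembles, between a fuzzy cluster ensemble and a crisp partition, between a fuzzy cluster ensemble and a cluster ensemble, between a fuzzy cluster ensemble and a fuzzy partition, and between two fuzzy cluster ensembles. The third part defines a compromised similarity index to improve on the relatively optimistic Rand index and the relatively conservative Jaccard index. We also propose a way to choose the weight parameter. Furthermore, we carry this idea into the fuzzy setting (that is, we define a fuzzy compromised similarity index) to compute the similarity between fuzzy partitions and crisp reference partitions and between fuzzy partitions and fuzzy reference partitions. The results show that the proposed indices are both flexible and reasonable in practical applications. Finally, to illustrate the key properties, rationality, and practicality of the proposed methods, numerical comparisons and experimental results are discussed in each part.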

Keywords

crisp partition; fuzzy partition; Rand index; Jaccard index; cluster ensemble; similarity; compromised weight

Parallel Abstract

This thesis studies similarity measures in three parts. The first part presents a fuzzy generalization of similarity measures and compares it with existing fuzzy extensions of the Rand index. To generalize the Rand index and related indices from crisp partitions to fuzzy partitions, we propose a graph and relation matrices that convert a membership matrix into a sign relation matrix. We show that the trace of the product of two sign relation matrices gives similarity measures such as the Rand index and the Jaccard index. Compared with previous fuzzy generalizations of the Rand index, the distinguishing property of our method is that, for any two fuzzy partition matrices, the two matrices being equal is a necessary and sufficient condition for the fuzzy Rand index to equal 1. This property enables our fuzzy generalization of the Rand index (FGRI) and of other related indices not only to determine the similarity between fuzzy partitions and crisp reference partitions, but also to identify the similarity between fuzzy partitions and fuzzy reference partitions. The method can even be used to compare the similarities between various data sets and the same fuzzy reference partition. The second part uses FGRI to broaden the scope of the Rand index so that it can handle the similarity between a cluster ensemble and a crisp partition, between two cluster ensembles, between a fuzzy cluster ensemble and a crisp partition, between a fuzzy cluster ensemble and a cluster ensemble, between a fuzzy cluster ensemble and a fuzzy partition, and between two fuzzy cluster ensembles. The third part defines a compromised similarity index to improve on the relatively optimistic Rand index and the relatively conservative Jaccard index. We also describe how to select the weight parameter. Furthermore, we extend this concept to the fuzzy setting (that is, we also define a fuzzy compromised similarity index) so that it can measure similarities between fuzzy partitions and crisp reference partitions and between fuzzy partitions and fuzzy reference partitions. The results show that the proposed indices are flexible and reasonable, and that they can be adapted to the demands of practical studies. Finally, numerical comparisons and experimental results are presented in every part to clarify the key properties, rationality, and practicality of the proposed methods.
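As a rough illustration of the trace-based idea for the crisp case only, the following Python sketch computes the Rand index from the trace of a product of two sign matrices and the Jaccard index by pair counting. The abstract does not give the thesis's definitions, so everything here is an assumption made for illustration: the ±1 encoding with a zero diagonal in `sign_relation`, the normalization in `rand_index`, and in particular the convex combination in the hypothetical `blended_index`, which only gestures at how a weight might trade off the two indices and is not the compromised similarity index defined in the thesis.

```python
from itertools import combinations


def sign_relation(labels):
    """Assumed encoding: +1 if two objects share a cluster, -1 otherwise, 0 on the diagonal."""
    n = len(labels)
    return [[0 if i == j else (1 if labels[i] == labels[j] else -1)
             for j in range(n)] for i in range(n)]


def trace_of_product(s1, s2):
    """trace(S1 @ S2) computed directly, without forming the product matrix."""
    n = len(s1)
    return sum(s1[i][j] * s2[j][i] for i in range(n) for j in range(n))


def rand_index(a, b):
    """Rand index of two crisp labelings (n >= 2) via the trace of the sign-matrix product.

    Each agreeing ordered pair contributes +1 to the trace and each disagreeing
    pair contributes -1, so trace / (n(n-1)) = 2 * RI - 1.
    """
    n = len(a)
    t = trace_of_product(sign_relation(a), sign_relation(b))
    return 0.5 * (1 + t / (n * (n - 1)))


def jaccard_index(a, b):
    """Jaccard index over object pairs: pairs joined in both / pairs joined in either."""
    both = either = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


def blended_index(a, b, w):
    """Hypothetical convex blend of the two indices (an assumption, not the thesis's formula)."""
    return w * rand_index(a, b) + (1 - w) * jaccard_index(a, b)
```

For example, `rand_index([0, 0, 1, 1], [0, 0, 1, 2])` counts five agreeing pairs out of six, while `jaccard_index` on the same input ignores the four pairs separated in both partitions, which is why it comes out lower.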

Parallel Keywords

crisp partition; fuzzy partition; Rand index; Jaccard index; cluster ensemble; similarity; compromised weight

