
A Modified K-means Algorithm for Distance Matrix Based Clustering

Advisor: 阮議聰

Abstract


In this thesis we propose a Modified K-means method for clustering data given as a pairwise distance matrix. Common clustering algorithms generally fall into two categories, partitioning and non-partitioning (hierarchical). We take K-means, the best-known partitioning algorithm, modify it so that it can work on distance-matrix data, and compare the two versions. In terms of data handling, the modified algorithm can cluster distance-matrix-based data; in terms of efficiency, it not only needs fewer iterations to converge than K-means, but also stores the costs computed in the first iteration to reduce the amount of computation required in each subsequent iteration, so its running time approaches that of K-means and can even surpass it. Algorithms that can analyze many kinds of data, such as the agglomerative method among the non-partitioning approaches, do handle diverse data types, but they do not show representative points of the clusters, objects cannot be reassigned once they have been allocated, and their time complexity makes them impractical. We therefore extend the proposed method so that the resulting clusters can also be presented hierarchically, letting users observe the differences among objects after partitioning and making the results more useful. We observe that the dendrogram produced by a hierarchical method shows the merging differences of all objects, but when the data set is large we do not need to know the differences among all objects before processing; it is enough to obtain the differences among the clusters we want to separate. For this reason, we first partition the data with the partitioning method and then apply a hierarchical method to the partitioning result to compute the differences between clusters. Finally, we apply both methods to a real symmetric distance matrix obtained from comparing C programs.
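The abstract does not spell out the modification itself, so the following Python sketch only illustrates one plausible reading: a K-means-style loop that works directly on a distance matrix by using a medoid-like representative (the member with the smallest total distance to the rest of its cluster) instead of a coordinate centroid. The function name distance_matrix_kmeans and all parameter choices are illustrative assumptions, not the thesis's implementation.

import random

def distance_matrix_kmeans(D, k, max_iter=100, seed=0):
    """K-means-style clustering driven only by a pairwise distance matrix D.

    Each cluster is represented by a medoid: the member whose total distance
    to the other members is smallest. Illustrative sketch only.
    """
    n = len(D)
    rng = random.Random(seed)
    reps = rng.sample(range(n), k)               # initial representative objects
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: attach every object to its nearest representative.
        labels = [min(range(k), key=lambda c: D[i][reps[c]]) for i in range(n)]
        # Update step: the medoid of each cluster becomes its new representative.
        new_reps = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:                      # empty cluster: keep old representative
                new_reps.append(reps[c])
                continue
            new_reps.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new_reps == reps:                     # representatives stable, so converged
            break
        reps = new_reps
    return labels, reps

Here D can be any symmetric n-by-n table of distances, for example one obtained from pairwise comparison of C programs as in the thesis.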

Abstract (English)


In this paper we propose a Modified K-means method for distance-matrix-based clustering. Clustering methods are usually classified into two categories, partitioning and hierarchical. A well-known partitioning method is the K-means algorithm. K-means cannot handle distance-matrix-based data sets, so we modify it for such data and compare the difference between the two methods. Our method converges in far fewer iterations than K-means, and we improve its time complexity in each iteration, so its efficiency is close to that of K-means and can even be better. Although hierarchical methods can handle distance-matrix-based data, they do not show the features of the clusters and their time complexity is too high. We extend our method to display the result as a hierarchical diagram so that users can observe the differences among the clusters. We then propose another method that combines partitioning with hierarchical clustering: first, the data are clustered with the partitioning method; second, the result of the first phase is processed with a hierarchical method. Finally, we apply both methods to a distance matrix that represents the differences among C program source codes.
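As a companion to the two-phase idea described above (partition first, then derive a hierarchy over the resulting clusters rather than over all objects), the sketch below computes an average-linkage merge sequence between the clusters produced by the partitioning phase. It is an assumption-laden illustration: the helper name cluster_linkage and the choice of average linkage are assumptions, not necessarily what the thesis uses.

def cluster_linkage(D, labels, k):
    """Average-linkage merge sequence over the k clusters from the partitioning
    phase, so the hierarchy is built over k clusters rather than all n objects.
    Illustrative sketch only.
    """
    groups = {c: [i for i, l in enumerate(labels) if l == c] for c in range(k)}

    def avg(a, b):
        # Mean pairwise distance between the original members of clusters a and b.
        return sum(D[i][j] for i in groups[a] for j in groups[b]) / (
            len(groups[a]) * len(groups[b]))

    merges, active = [], [c for c in groups if groups[c]]
    while len(active) > 1:
        a, b = min(((x, y) for i, x in enumerate(active) for y in active[i + 1:]),
                   key=lambda p: avg(*p))
        merges.append((a, b, avg(a, b)))         # which clusters merge, and how far apart
        groups[a] = groups[a] + groups[b]        # fold cluster b into cluster a
        active.remove(b)
    return merges

Because only k clusters enter the hierarchical phase, the dendrogram is built over k items instead of all n objects, which matches the efficiency argument made in the abstract. Used after the first sketch, for example: labels, reps = distance_matrix_kmeans(D, k) followed by merges = cluster_linkage(D, labels, k).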

Keywords

clustering, distance matrix, K-means


Cited by


Wang, J. J. (2008). Analyzing ENG signals of the rat primary motor area using independent component analysis and subtractive clustering [master's thesis, National Taipei University of Technology]. Airiti Library. https://doi.org/10.6841/NTUT.2008.00312
曾鈺棠 (2015). Using K-means to improve the decision results of the consensus model in a context-aware framework [master's thesis, Chung Yuan Christian University]. Airiti Library. https://doi.org/10.6840/cycu201500693
羅益祥 (2009). A transformed fuzzy C-means algorithm applied to the clustering of skin disease and iris data [master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916272271
