An Efficient Hierarchical Clustering Algorithm Based on Histogram Method (植基於直方圖方法之有效率階層式分群法)

Advisor: 謝淑玲

Abstract


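With the continuing development of information technology, vast amounts of data have accumulated in everyday life. Extracting interesting information from these data is the main task of data mining. Clustering is one of the most important data mining techniques, and many researchers have long studied it and developed new clustering algorithms.

This thesis proposes a new clustering algorithm, named the Hierarchical Thresholding Clustering Algorithm (HTCA). It builds a histogram of the distribution of data point values and uses Otsu's method, an image binarization technique, to find the best split point, so that the whole data set is progressively divided into several clusters. It is therefore a hierarchical clustering method. The order of the splits also yields a binary splitting tree that can support subsequent data analysis.

According to the experimental results, HTCA achieves higher accuracy than certain well-known algorithms on some of the experimental data sets. In addition, when applied to large data sets, HTCA saves considerable computation time and so improves the efficiency of clustering.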
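The splitting idea described in this abstract (histogram, Otsu split point, recursive division into a binary tree) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how multi-dimensional data, the number of bins, or the stopping rule are handled, so the sketch assumes 1-D data, a fixed bin count, and a depth limit. The names `otsu_threshold`, `split`, `leaves`, `bins`, `depth`, and `min_size` are all hypothetical.

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Return the cut value that maximizes between-class variance
    over a 1-D histogram (Otsu's method). Assumes bins >= 2."""
    counts, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts)                 # points in bins 0..k
    w1 = counts.sum() - w0                 # points in bins k+1..end
    s0 = np.cumsum(counts * centers)       # weighted sum of bins 0..k
    mu0 = s0 / np.maximum(w0, 1)           # mean of the left class
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)  # mean of the right class
    between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
    k = int(np.argmax(between[:-1]))       # skip the last bin: right side would be empty
    return edges[k + 1]

def split(values, depth, bins=64, min_size=2):
    """Recursively split 1-D data at Otsu thresholds.
    Assumed stopping rule: depth limit or clusters smaller than min_size."""
    values = np.asarray(values, dtype=float)
    if depth == 0 or len(values) < 2 * min_size:
        return {"points": values}
    t = otsu_threshold(values, bins)
    left, right = values[values < t], values[values >= t]
    if len(left) < min_size or len(right) < min_size:
        return {"points": values}          # no useful split found
    return {
        "threshold": t,
        "left": split(left, depth - 1, bins, min_size),
        "right": split(right, depth - 1, bins, min_size),
    }

def leaves(node):
    """Collect the clusters (leaf nodes) of the splitting tree, left to right."""
    if "points" in node:
        return [node["points"]]
    return leaves(node["left"]) + leaves(node["right"])
```

Calling `split(data, depth=2)` on a 1-D array returns a nested dictionary that plays the role of the binary splitting tree, and `leaves(tree)` lists the resulting clusters. A depth limit is only one possible stopping rule; a target number of clusters would be an equally plausible choice.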

Parallel Abstract


Clustering is an important method in data mining. With the development of information technology, large amounts of data have accumulated in daily life, and finding interesting information in these data is the main task of data mining. Many scholars have researched and developed new clustering algorithms. In this paper, a new clustering algorithm is proposed. It is a hierarchical clustering algorithm, named HTCA, that is based on the histogram method and Otsu bi-level thresholding. The order of the splits is used to build a binary splitting tree that facilitates subsequent data analysis. In the experiments, HTCA achieved higher accuracy than some popular clustering methods, such as k-means, and it saved considerable computing time on some large data sets, such as the Abalone data set. In some cases, HTCA improved the efficiency of clustering.
