具「由下而上」樹狀結構的重疊影像分群演算法

方格基底分群演算法的最大優點在於運算速度快，卻往往受限於方格大小及方格密度的參數設定。在本篇論文中，我們提出的演算法，不僅可以較傳統的方格分群演算法提供更容易尋找到正確分群結果所需要的方格大小及方格密度參數範圍，並且繼承方格基底分群演算法運算速度快的優點。我們所提出的重疊影像分群演算法，係利用座標平移所產生的兩組方格系統來避免因方格邊界所造成錯誤分群的可能，並且增加分群的正確率。並且定義三種層級的方格類型(significant, semi-significant, non-significant)透過「由下而上」的樹狀結構的四層節點與運算(leaf node level-significant cells, connection level-pre-clustering, combination level- semi-significant cells, root-set of clusters),以由下而上階層式的運算，將方格系統中的重要方格進行連結與組合，產生最佳的分群結果。

關鍵字

方格基底分群演算法；方格系統；重要方格；半重要方格；重疊影像；下而上樹

並列摘要

The grid-based clustering algorithm is an efficient clustering algorithm, but its effect is seriously influenced by the size of the predefined grids and the threshold of the significant cells. The data space will be partitioned into a finite number of cells to form a grid system and then performs all clustering operations on this obtained grid system. A critical problem needed to be noticed here, no one knows the distribution of the clustering data in each dimension. So, a method to reduce the influence of the size of predefined cells and the density threshold of significant cells is a must. To cluster efficiently and simultaneously, to reduce the influences of the size of the cells and inherits the advantage with the low time complexity, we propose “A Crossover-Imaged Clustering Algorithm with Bottom-up Tree Architecture” in this dissertation. The data space is separating into cells which are organized the first grid system, and the data space is transformed into the new grid system. The chosen density threshold and some predefined cells are utilized to identify the significant cells and semi-significant cells. With our clustering algorithms, the size of significant cell and its density threshold are easy to select to group the significant cells into ideal clusters. Next, the data cells are shifted to define the new grid system. Then the semi-significant cells are selected to group the significant cells and other semi-significant cells to be clusters. Then, we build the bottom-up tree architecture with four levels, root (set of clusters), combination (semi-significant cells), connection (pre-clusters), leaf nodes (significant cells), and define the semi-significant cells to group the clusters. Finally we will verify by experiment that the results of our proposed algorithms can reduce the influence of the size of predefined cells and the density threshold of significant cells. And the proposed algorithms still inherit the advantage with the low time complexity and require at most one single scan through the data and one cell clustering.

並列關鍵字

grid-based clustering algorithms ； grid system ； significant cell ； semi-significant cell,Crossover-Image ； Bottom-up Tree

參考文獻

[ 1] M. Ester, H. Kriegel, J. Sander, and X. Xu. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”, In Proc. of 2nd Int. Conf. on KDD, 1996, pages 226-231.

[ 2] A. Hinneburg and D. A. Keim,. “An Efficient Approach to Clustering in Large Multimedia Databases with Noise”, In Knowledge Discovery and Data Mining, 1998, pages 58-65.

[ 3] ANKERST M. etc. “OPTICS: Ordering Points to Identify the Clustering Structure.” In Proc. ACM SIGMOD Int. Conf. on MOD, 1999, pages 49-60.

[ 4] Wang, Yang, R. Muntz, Wei Wang and Jiong Yang and Richard R. Muntz “STING: A Statistical Information Grid Approach to Spatial Data Mining”, In Proc. of 23rd Int. Conf. on VLDB, 1997, pages 186-195.

[ 5] G. Sheikholeslami, S. Chatterjee, and A. Zhang. “WaveCluster: a wavelet-based clustering approach for spatial data in very large databases”, In VLDB Journal: Very Large Data Bases, 2000, pages 289-304.

國際替代計量

具「由下而上」樹狀結構的重疊影像分群演算法

全文下載

主題瀏覽