A New High-Efficiency Grid-Based Clustering Algorithm Based on an Index Value Oriented Scheme

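With the rapid development of information technology, data volumes are growing at an ever-increasing rate. Extracting important rules and information from such large volumes of data is therefore an important problem, and data mining is one of the key techniques for uncovering the useful information contained in a dataset. An algorithm that scales to large databases would thus be a valuable technique. The new algorithm proposed in this thesis, IVOS, is a new technique built on a grid-based architecture. To avoid the repeated grid searches of traditional grid-based algorithms, it uses merging and spreading schemes that differ from those of traditional grid-based algorithms and introduces the concept of an index value to improve clustering efficiency. The improved procedure consists of four main parts: (1) the upper grid is an invalid grid, (2) the upper grid is a valid grid, (3) index values are traced back to boundary values, and (4) multiple clusters are merged. The experimental results show that IVOS is at least 1.5 times faster than the other methods in terms of time cost, while both its clustering accuracy and its noise-filtering rate are above 99%.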
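The abstract gives only the outline of the procedure, so the following is a minimal sketch of generic grid-based clustering rather than the IVOS procedure itself. It assumes 2-D points, a fixed cell size, a density threshold for deciding whether a cell is valid, and 4-neighbour connectivity. The names `grid_cluster`, `cell_size`, and `min_pts` are hypothetical, and the union-find structure is a stand-in used here to illustrate merging clusters during a single scan; none of these come from the thesis.

```python
from collections import defaultdict


def grid_cluster(points, cell_size, min_pts):
    """Generic grid-based clustering sketch (NOT the IVOS algorithm).

    Returns one label per point; -1 marks points in invalid (sparse) cells.
    """
    # Step 1: count how many points fall into each grid cell.
    counts = defaultdict(int)
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1

    # Step 2: a cell is "valid" if it holds at least min_pts points (assumed rule).
    valid = {cell for cell, n in counts.items() if n >= min_pts}

    # Union-find over valid cells, used to merge clusters that meet.
    parent = {}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Step 3: scan valid cells row by row. Each cell looks only at its
    # already-scanned neighbours (left, and same column in the previous row),
    # so every adjacency is examined exactly once.
    for cell in sorted(valid, key=lambda c: (c[1], c[0])):
        parent[cell] = cell
        cx, cy = cell
        for neighbour in ((cx - 1, cy), (cx, cy - 1)):
            if neighbour in parent:
                union(neighbour, cell)  # may merge two existing clusters

    # Step 4: give each point the label of its cell's cluster.
    cluster_ids = {}
    labels = []
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell in valid:
            root = find(cell)
            labels.append(cluster_ids.setdefault(root, len(cluster_ids)))
        else:
            labels.append(-1)  # treated as noise
    return labels
```

For example, `grid_cluster(points, cell_size=1.0, min_pts=2)` labels every point whose cell contains fewer than two points as noise and groups the remaining points by connected valid cells.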

Keywords

grid-based clustering; data mining; data clustering

Parallel Abstract

Data mining is one of the most significant techniques for extracting useful information from datasets, and improving its efficiency and performance has become a challenging research issue. An algorithm that can be applied to big data would therefore be a valuable technique. This paper proposes the Index Value Oriented Scheme (IVOS), an algorithm based on grid clustering. The algorithm applies merging and spreading methods that differ from those of traditional grid algorithms, together with a search approach that reduces repeated searching, in order to improve clustering efficiency. The main improvements cover four parts: (1) the top grids are invalid; (2) the top grids are valid; (3) the index values are traced back to boundary values; (4) multiple clusters are merged. According to the simulation results, the proposed IVOS is faster than the other algorithms compared, namely CLIQUE, ANGEL, GCCR, and TING. Moreover, the proposed algorithm achieves a clustering correctness rate and a noise filtering rate of at least 99%.

Parallel Keywords

data clustering; data mining; grid-based clustering

