
以索引值導向為基礎具高效率的新網格群集演算法

An Index Value Oriented Scheme on Efficient Grid-based Clustering Algorithm

Advisor: 蔡正發

Abstract


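With the rapid development of information technology, data volumes are growing at an ever-increasing rate. Extracting important rules and information from large amounts of data has therefore become a significant problem, and data mining is one of the key techniques for uncovering the useful information contained in a dataset. An algorithm that scales to large databases is therefore of considerable value. The new algorithm proposed in this thesis, IVOS, is built on a grid-based architecture. To avoid the repeated grid searches performed by traditional grid-based algorithms, it uses merging and spreading methods that differ from the traditional ones and introduces the concept of index values to improve clustering efficiency. The improved procedure can be divided into four parts: (1) the upper grid is an invalid grid; (2) the upper grid is a valid grid; (3) index values are mapped back to boundary values; (4) multiple clusters are merged. The experimental results show that IVOS is at least 1.5 times faster than the other methods in time cost, and that its clustering correctness rate and noise filtering rate are both above 99%.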

Parallel Abstract


Data mining is one of the most significant techniques for extracting useful information from datasets, and improving its efficiency and performance remains a challenging research issue. An algorithm that can be applied to big data is therefore a valuable technique. This thesis proposes the Index Value Oriented Scheme (IVOS), an algorithm based on grid clustering. The algorithm applies merging and spreading methods that differ from those of traditional grid algorithms, together with a search approach that reduces repetition, in order to improve clustering efficiency. The main improvements cover four cases: (1) the top grids are invalid; (2) the top grids are valid; (3) the index values are mapped back to boundary values; (4) multiple clusters are merged. According to the simulation results, the proposed IVOS is faster than the other algorithms compared, including CLIQUE, ANGEL, GCCR and TING. Moreover, the proposed algorithm achieves a clustering correctness rate and a noise filtering rate of at least 99%.
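
For readers unfamiliar with grid clustering, the following is a minimal sketch of a generic grid-based baseline, not the IVOS procedure itself, whose index-value steps the abstract does not specify. The function name `grid_cluster`, the parameters `cell_size` and `min_pts`, the 2-D restriction, and the 8-neighbourhood merging rule are all assumptions made for illustration. Each valid cell re-examines its neighbours during the merge, which is the kind of repeated grid search the abstract says IVOS is designed to avoid.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_pts):
    """Generic grid-based clustering of 2-D points (illustrative, not IVOS).

    Returns one label per point; -1 marks points treated as noise.
    """
    # 1. Bin every point into the grid cell that contains it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # 2. A cell is "valid" when it holds at least min_pts points.
    valid = {c for c, members in cells.items() if len(members) >= min_pts}

    # 3. Merge touching valid cells into clusters with a breadth-first search.
    cell_label = {}
    next_label = 0
    for start in sorted(valid):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in valid and nb not in cell_label:
                        cell_label[nb] = next_label
                        queue.append(nb)
        next_label += 1

    # 4. Points in valid cells inherit the cell's label; the rest stay noise.
    labels = [-1] * len(points)
    for cell, label in cell_label.items():
        for idx in cells[cell]:
            labels[idx] = label
    return labels


# Hypothetical data: two adjacent dense cells, one distant dense cell, one outlier.
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
          (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),
          (5.1, 5.2), (5.3, 5.1), (5.2, 5.4),
          (9.5, 0.5)]
labels = grid_cluster(points, cell_size=1.0, min_pts=3)
```

In this sketch the two adjacent dense cells merge into one cluster, the distant dense cell forms a second cluster, and the isolated point is left as noise.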

References


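[2] 林英盛, “一個建立於網格式具高效能及高效率的群聚演算法” (A grid-based clustering algorithm with high effectiveness and efficiency), Master's thesis, Graduate Institute of Information Management, National Pingtung University of Science and Technology, 2012.
[3] 張志豪, “一個使用空間交會凝聚技術之有效率的網格式分群演算法” (An efficient grid-based clustering algorithm using a spatial intersection agglomeration technique), Master's thesis, Graduate Institute of Information Management, National Pingtung University of Science and Technology, 2012.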
[5] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., “Automatic subspace clustering of high dimensional data for data mining applications,” Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 94-105, 1998.
[8] Karypis, G., Han, E.H., Kumar, V., “Chameleon: Hierarchical clustering using dynamic modeling,” IEEE Computer, vol. 32, no. 8, pp. 68-75, 1999.
[11] Tsai, C.F., Yen, C.C., “ANGEL: A new effective and efficient hybrid clustering technique for large databases,” in Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007, LNCS (LNAI), vol. 4426, pp. 817-824, Springer, Heidelberg, 2007.
