A Dynamic Grid-Based Clustering Algorithm on Data Streams

Clustering Evolving Data Stream Based on Dynamic Grids

Advisor: 李素瑛

Abstract


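In recent years, advances in information science and related hardware have made data streams a common data type. How to cluster an unbounded, evolving data stream and extract meaningful features from it has attracted considerable attention. Although a good deal of research has been published on this topic, most methods require suitable parameter settings at the outset. Unlike ordinary static data, however, the characteristics and cluster structure of a data stream are dynamic and unstable, so choosing initial parameters is difficult. Processing a data stream is a continuous procedure that may call for different parameter settings at different times, and fixed-parameter methods often fail to reflect and handle changes in the data characteristics correctly. This thesis proposes a novel algorithm, DGBC (Dynamic Grid-Based Clustering), for clustering data streams. During processing, the method automatically adjusts the parameters it needs to match the latest data and cluster characteristics. Experiments on both synthetic and real data show that DGBC not only runs faster but also produces higher-quality clusters, and that it is less sensitive to its initial parameters.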

Keywords

clustering, data stream

English Abstract


Clustering multi-dimensional data streams is a difficult and important problem. The goal is to cluster the objects within the stream continuously, in order to discover and monitor evolving, up-to-date events. Density-grid-based clustering algorithms are fast, can discover arbitrarily shaped clusters, and can deal with noise. However, grid-based algorithms are easily influenced by the sizes and borders of the grids. We propose a Dynamic Grid-Based Clustering algorithm for high-dimensional data streams. When new data arrive, the grid structure is dynamically updated. The dynamic grid structure adjusts its range and boundary on each dimension over time to produce effective clustering results with low memory usage. We used both synthetic and real data sets for experiments, and the experimental results show that our proposed algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.
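
To make the general idea of density-grid stream clustering concrete, the following is a minimal Python sketch. It is not the DGBC algorithm from the thesis: it assumes a fixed cell width, an exponential decay factor, and a density threshold, all of which are illustrative choices, and it does not adjust grid ranges or boundaries per dimension as the thesis describes. The class name `SparseDensityGrid` and its parameters are hypothetical. The grid is stored sparsely, so cells are created only where data arrive, and clusters are formed by joining dense cells that are adjacent along one dimension.

```python
from collections import deque


class SparseDensityGrid:
    """Toy density grid over a stream (illustrative; not the thesis's DGBC)."""

    def __init__(self, cell_width=1.0, decay=0.98, dense_threshold=2.0):
        self.cell_width = cell_width            # assumed fixed cell size
        self.decay = decay                      # per-arrival decay factor
        self.dense_threshold = dense_threshold  # minimum density of a dense cell
        self.density = {}                       # cell index -> (density, last update time)
        self.now = 0                            # logical clock: one tick per point

    def _cell(self, point):
        # Map a point to the integer index of the cell that contains it.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point):
        # Decay the cell's stored density up to the current time, then add 1.
        self.now += 1
        key = self._cell(point)
        value, last = self.density.get(key, (0.0, self.now))
        value *= self.decay ** (self.now - last)
        self.density[key] = (value + 1.0, self.now)

    def _current(self, key):
        # Density of a cell decayed to the current time.
        value, last = self.density[key]
        return value * self.decay ** (self.now - last)

    def clusters(self):
        # Group dense cells that share a face, using breadth-first search.
        dense = {k for k in self.density
                 if self._current(k) >= self.dense_threshold}
        seen, result = set(), []
        for start in sorted(dense):
            if start in seen:
                continue
            seen.add(start)
            queue, group = deque([start]), []
            while queue:
                cell = queue.popleft()
                group.append(cell)
                for d in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(sorted(group))
        return result
```

Because old densities decay with every arrival, cells that stop receiving points eventually fall below the threshold and drop out of their clusters, which is one simple way a grid can track an evolving stream. The fixed `cell_width` is exactly the kind of initial parameter whose sensitivity the thesis aims to reduce.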

English Keywords

stream clustering density grid
