A Concept Drift Detection Method Based on Clustering and Statistical Testing

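Stream data mining is one of the data mining approaches commonly used in today's applications. However, the nature of real-world streaming data, concept drift in particular, makes it challenging. To handle concept drift when data labels are unavailable, a drift detection method is necessary. In this paper, we propose a drift detection method based on statistical testing, in which a clustering algorithm groups the data as a preprocessing step and principal component analysis (PCA) performs feature extraction to reduce the data dimensionality and shorten execution time. Experimental results on synthetic and real-world streaming datasets show that clustering preprocessing improves the performance of drift detection and feature extraction, thereby improving detection performance and shortening execution time.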

Keywords

concept drift; drift detection

Parallel Abstract

Stream data mining is a common data mining approach in today's real-world applications. However, it is challenging because of the nature of real-world data streams, especially concept drift. To handle concept drift when data labels are unavailable, a drift detection method is necessary. In this paper, we propose a drift detection method based on a statistical test, with clustering as preprocessing, and we reduce execution time by using principal component analysis (PCA) for feature extraction. Experimental results on synthetic and real-world streaming data show that clustering preprocessing improves drift detection performance, while feature extraction trades an insignificant loss in detection performance for a large speedup in execution time.
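The abstract names three ingredients (clustering as preprocessing, PCA for feature extraction, and a statistical test) but does not specify how they are combined. The sketch below is one plausible arrangement, not the authors' method, and every choice in it is an assumption: it projects both windows onto the first principal direction of the reference window, clusters the reference scores with a one-dimensional k-means, and runs a two-sample Kolmogorov-Smirnov test per cluster with a Bonferroni-corrected asymptotic threshold. All function names (`first_component`, `kmeans_1d`, `detect_drift`, and so on) and parameters (`k`, `alpha`) are hypothetical.

```python
import math


def first_component(rows, iters=50):
    """Mean and leading principal direction, found by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        scores = [sum(x * w for x, w in zip(r, v)) for r in centered]
        # X^T (X v) is proportional to (covariance matrix) @ v.
        w = [sum(s * r[j] for s, r in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, v):
    """Reduce each row to one feature: its score on the principal direction."""
    return [sum((x - m) * w for x, m, w in zip(r, mean, v)) for r in rows]


def group(values, centers):
    """Split values into one list per center, by nearest center."""
    groups = [[] for _ in centers]
    for x in values:
        nearest = min(range(len(centers)), key=lambda c: abs(x - centers[c]))
        groups[nearest].append(x)
    return groups


def kmeans_1d(values, k, iters=20):
    """Plain Lloyd iterations with deterministic quantile initialisation."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = group(values, centers)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def detect_drift(reference, window, k=2, alpha=0.05):
    """Return True if any cluster's score distribution differs significantly."""
    mean, v = first_component(reference)
    centers = kmeans_1d(project(reference, mean, v), k)
    ref_groups = group(project(reference, mean, v), centers)
    win_groups = group(project(window, mean, v), centers)
    # Asymptotic KS critical coefficient, Bonferroni-corrected for k tests.
    coeff = math.sqrt(-0.5 * math.log(alpha / (2 * k)))
    for ref, win in zip(ref_groups, win_groups):
        n, m = len(ref), len(win)
        if n == 0 or m == 0:
            continue  # Assumption: skip clusters with no data to compare.
        if ks_statistic(ref, win) > coeff * math.sqrt((n + m) / (n * m)):
            return True
    return False
```

Calling `detect_drift(reference, window)` with two lists of equal-length numeric rows returns a boolean. No labels are used anywhere, which matches the unsupervised setting the abstract describes. Keeping a single principal component is the most aggressive form of dimensionality reduction and is chosen here only to keep the sketch short.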

Parallel Keywords

concept drift; stream data mining; drift detection; unsupervised

