Data Mining of Frequent Subspaces

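As both data dimensionality and data volume grow, clustering methods that operate on all data dimensions still leave considerable room for improvement. Subspace clustering has therefore received increasing attention in recent years. In this thesis, we propose a novel subspace mining method that can view all frequent subspaces as a whole at the same time. The proposed method consists of three steps. First, we project all data points onto two-dimensional spaces and generate a set of frequent subspaces. Next, we join these frequent subspaces to form larger frequent subspaces. Finally, we apply a greedy algorithm to summarize the results, selecting the significant subspaces from all the frequent subspaces generated. Experimental results show that the proposed method outperforms FIRES and DUSC in both quality and coverage.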
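The abstract does not say how a projection is judged "frequent" or exactly how subspaces are joined. As a minimal sketch, assuming a CLIQUE-style grid in the spirit of Agrawal et al. [3] (every coordinate normalized to [0, 1], each dimension cut into `xi` equal-width intervals) and calling a subspace frequent when at least one of its grid cells holds `min_pts` or more points, steps one and two could look like the Python below. The functions `dense_units`, `frequent_pairs`, and `join_subspaces` and the parameters `xi` and `min_pts` are hypothetical names, not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, xi, min_pts):
    """Grid cells of subspace `dims` that hold at least `min_pts` points.

    Assumes every coordinate is normalized to [0, 1]; each dimension is cut
    into `xi` equal-width intervals (a value of 1.0 falls into the last one).
    """
    counts = Counter(
        tuple(min(int(p[d] * xi), xi - 1) for d in dims) for p in points
    )
    return {cell for cell, c in counts.items() if c >= min_pts}


def frequent_pairs(points, xi, min_pts):
    """Step 1 (sketch): project the data onto every pair of dimensions."""
    n_dims = len(points[0])
    result = {}
    for dims in combinations(range(n_dims), 2):
        units = dense_units(points, dims, xi, min_pts)
        if units:  # assumed rule: frequent = at least one dense cell
            result[dims] = units
    return result


def join_subspaces(points, frequent, xi, min_pts):
    """Step 2 (sketch): Apriori-style join of k-dim into (k+1)-dim subspaces."""
    all_frequent = dict(frequent)
    level = frequent
    while level:
        next_level = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:  # join only subspaces sharing a (k-1)-prefix
                continue
            cand = a + (b[-1],)  # stays sorted because a < b
            # Prune: every k-dim projection of the candidate must be frequent.
            if any(s not in level for s in combinations(cand, len(cand) - 1)):
                continue
            units = dense_units(points, cand, xi, min_pts)
            if units:
                next_level[cand] = units
        all_frequent.update(next_level)
        level = next_level
    return all_frequent


if __name__ == "__main__":
    # Toy data: points 0-2 cluster in dimensions 0, 1, 2; dimension 3 is noise.
    data = [
        (0.10, 0.10, 0.10, 0.05),
        (0.12, 0.15, 0.11, 0.40),
        (0.14, 0.13, 0.12, 0.80),
        (0.90, 0.50, 0.30, 0.95),
    ]
    pairs = frequent_pairs(data, xi=4, min_pts=3)
    print(sorted(join_subspaces(data, pairs, xi=4, min_pts=3)))
    # prints [(0, 1), (0, 1, 2), (0, 2), (1, 2)]
```

Under this assumed definition, a cell that is dense in a (k+1)-dimensional subspace projects onto a cell holding at least as many points in each of its k-dimensional subspaces, so frequency is downward closed; that is what makes the pruning step in `join_subspaces` safe.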

Keywords

data mining; subspace mining; subspace clustering; frequent subspace; greedy algorithm; quality; coverage

English Abstract

As both the number of dimensions and the volume of data increase, clustering methods that operate in the full feature space become less effective at clustering the data in databases. Subspace clustering has therefore attracted increasing attention in recent years. In this thesis, we propose a novel subspace mining method that considers all frequent subspaces simultaneously when selecting the significant ones. The proposed method consists of three phases. First, we project all data points onto each pair of dimensions and generate frequent two-dimensional subspaces. Second, we join frequent subspaces to form larger ones. Finally, we apply a greedy algorithm to summarize the frequent subspaces found and select the significant subspaces. Experimental results show that the proposed method outperforms FIRES and DUSC in terms of quality and coverage.
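The summarization phase is described only as a greedy algorithm. One plausible reading, assumed here and not confirmed by the thesis, is greedy maximum coverage: each frequent subspace covers the points that fall in its dense cells, and the algorithm repeatedly keeps the subspace covering the most not-yet-covered points, stopping after `k` picks or when no candidate adds anything. The function `greedy_summarize`, its `covers` input, and the tie-breaking rule are hypothetical choices for illustration.

```python
def greedy_summarize(covers, k):
    """Greedy maximum-coverage selection (an assumed stand-in for step 3).

    covers: dict mapping a subspace (sorted tuple of dimension indices) to
            the set of point ids lying in its dense cells.
    k:      maximum number of subspaces to keep.
    Returns the chosen subspaces in pick order and the set of covered ids.
    """
    selected, covered = [], set()
    remaining = dict(covers)  # shallow copy so the caller's dict is untouched
    while remaining and len(selected) < k:
        # Pick the largest marginal gain; on ties prefer more dimensions,
        # then the lexicographically first subspace (max keeps the first hit).
        best = max(sorted(remaining),
                   key=lambda s: (len(remaining[s] - covered), len(s)))
        gain = remaining.pop(best) - covered
        if not gain:  # no remaining candidate adds new points
            break
        selected.append(best)
        covered |= gain
    return selected, covered


if __name__ == "__main__":
    covers = {
        (0, 1): {0, 1, 2},
        (0, 1, 2): {0, 1, 2},
        (0, 3): {2, 3},
        (2, 3): {3, 4},
    }
    print(greedy_summarize(covers, k=2))
    # prints ([(0, 1, 2), (2, 3)], {0, 1, 2, 3, 4})
```

Under this assumed objective, picking the best `k` subspaces exactly is an instance of maximum coverage, which is NP-hard, while the greedy rule is guaranteed to cover at least a (1 - 1/e) fraction of what the optimal choice covers.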

English Keywords

data mining; subspace mining; subspace clustering; frequent subspace; greedy algorithm; quality; coverage

References

[2] C. C. Aggarwal and P. S. Yu, Finding generalized projected clusters in high-dimensional spaces, In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000, pp. 70-81.

[3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, Automatic subspace clustering of high dimensional data, In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1998, pp. 94-105.

[4] I. Assent, R. Krieger, E. Müller, and T. Seidl, DUSC: Dimensionality unbiased subspace clustering, In Proceedings of the Seventh IEEE International Conference on Data Mining, 2007, pp. 409-414.

[5] C. Baumgartner, C. Plant, K. Kailing, H.-P. Kriegel, and P. Kröger, Subspace selection for clustering high-dimensional data, In Proceedings of the Fourth IEEE International Conference on Data Mining, 2004, pp. 11-18.

[14] M. Glomba and U. Markowska-Kaczmar, IBUSCA: A grid-based bottom-up subspace clustering algorithm, In Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, 2006, pp. 671-676.
