As both the number of dimensions and the volume of data grow, clustering methods that operate on the full feature space still leave considerable room for improvement. Subspace clustering has therefore attracted increasing attention. In this thesis, we propose a novel subspace mining method that considers all frequent subspaces simultaneously when selecting the significant ones. The proposed method consists of three phases. First, we project all data points onto every pair of dimensions (that is, onto each two-dimensional subspace) and generate frequent subspaces. Second, we join frequent subspaces to form higher-dimensional frequent subspaces. Finally, we apply a greedy algorithm to summarize all the frequent subspaces found and select the significant ones. Experimental results show that the proposed method outperforms FIRES and DUSC in terms of both quality and coverage.
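The three phases can be illustrated with a minimal Python sketch. The abstract does not define when a subspace counts as "frequent" or how significance is scored, so this sketch makes several assumptions that are not from the thesis: each dimension is cut into `bins` equal-width intervals, a subspace is frequent if some grid cell in it holds at least `min_support` points (a CLIQUE-style density criterion), the join is Apriori-style (two k-dimensional subspaces sharing their first k-1 dimensions combine into a (k+1)-dimensional candidate), and the greedy step scores each subspace by the number of newly covered points times its dimensionality. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cell_ids(data, dims, bins, lo, hi):
    """Assumption: map each point to its equal-width grid cell in subspace `dims`."""
    ids = []
    for p in data:
        cell = []
        for d in dims:
            width = (hi[d] - lo[d]) / bins or 1.0  # guard against constant dimensions
            cell.append(min(int((p[d] - lo[d]) / width), bins - 1))
        ids.append(tuple(cell))
    return ids


def dense_points(data, dims, bins, lo, hi, min_support):
    """Indices of points lying in cells that hold at least `min_support` points."""
    ids = cell_ids(data, dims, bins, lo, hi)
    counts = Counter(ids)
    return frozenset(i for i, c in enumerate(ids) if counts[c] >= min_support)


def mine_frequent_subspaces(data, bins=4, min_support=3):
    n_dims = len(data[0])
    lo = [min(p[d] for p in data) for d in range(n_dims)]
    hi = [max(p[d] for p in data) for d in range(n_dims)]
    # Phase 1: project all points onto every pair of dimensions.
    level = {}
    for dims in combinations(range(n_dims), 2):
        pts = dense_points(data, dims, bins, lo, hi, min_support)
        if pts:
            level[dims] = pts
    frequent = dict(level)
    # Phase 2 (Apriori-style join, an assumption): grow subspaces one dimension at a time.
    while level:
        nxt = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue
            cand = a + (b[-1],)
            # Prune: a dense (k+1)-dim cell projects onto dense k-dim cells.
            if any(sub not in level for sub in combinations(cand, len(a))):
                continue
            pts = dense_points(data, cand, bins, lo, hi, min_support)
            if pts:
                nxt[cand] = pts
        frequent.update(nxt)
        level = nxt
    return frequent


def greedy_summary(frequent, max_subspaces=3):
    """Phase 3: greedily pick subspaces; the scoring rule here is hypothetical."""
    covered, chosen = set(), []
    candidates = dict(frequent)
    while candidates and len(chosen) < max_subspaces:
        # Score = newly covered points weighted by dimensionality (assumption).
        best = max(candidates, key=lambda s: len(candidates[s] - covered) * len(s))
        if not candidates[best] - covered:
            break  # no remaining subspace covers any new point
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, covered


# Toy data: points 0-4 cluster in dimensions 0, 1, 2; dimension 3 is noise.
DATA = [
    [0.10, 0.10, 0.10, 0.00], [0.12, 0.11, 0.13, 0.30], [0.11, 0.12, 0.12, 0.55],
    [0.13, 0.10, 0.11, 0.80], [0.10, 0.13, 0.10, 1.00], [0.90, 0.40, 0.60, 0.10],
    [0.60, 0.90, 0.35, 0.40], [0.40, 0.65, 0.90, 0.70], [1.00, 0.50, 0.80, 0.95],
    [0.70, 1.00, 1.00, 0.20],
]

if __name__ == "__main__":
    freq = mine_frequent_subspaces(DATA)
    print(sorted(freq))             # frequent subspaces found in phases 1 and 2
    print(greedy_summary(freq)[0])  # subspaces selected in phase 3
```

On this toy data the sketch finds the 2-D subspaces (0, 1), (0, 2), (1, 2), joins them into (0, 1, 2), and the greedy step keeps only (0, 1, 2), since it covers the same five points in more dimensions. Weighting by dimensionality is one simple way to keep the greedy step from always preferring 2-D subspaces, which cover at least as many points as their supersets under this density criterion.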