Mining Frequent Subspaces (頻繁子空間之資料探勘)

Advisor: 李瑞庭

Abstract


As both the dimensionality and the volume of data grow, clustering methods that operate on the full feature space leave considerable room for improvement, and subspace clustering has therefore attracted increasing attention. In this thesis, we propose a novel subspace mining method that considers all frequent subspaces simultaneously when selecting the significant ones. The proposed method consists of three phases. First, we project all data points onto each pair of dimensions and generate frequent subspaces. Second, we join frequent subspaces to form larger ones. Finally, we adopt a greedy algorithm to summarize the frequent subspaces found and select the significant subspaces. The experimental results show that the proposed method outperforms the FIRES and DUSC methods in terms of quality and coverage.
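The three phases can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not say how frequency is measured, how subspaces are joined, or how the greedy step scores candidates. The sketch assumes coordinates in [0, 1) discretized into an equal-width grid, a hypothetical `min_support` count threshold, a join that merges two subspaces differing in exactly one dimension, and a greedy score equal to the number of newly covered points (ties broken in favor of higher dimensionality). All function and parameter names are illustrative.

```python
from itertools import combinations


def frequent_2d_subspaces(points, bins, min_support):
    """Phase 1 (assumed form): project points onto each pair of dimensions.

    A subspace is a frozenset of (dimension, bin) pairs. It is treated as
    frequent when at least min_support points fall inside it.
    """
    n_dims = len(points[0])
    # Map every coordinate (assumed to lie in [0, 1)) to a grid bin index.
    cells = [[min(int(x * bins), bins - 1) for x in p] for p in points]
    result = {}
    for d1, d2 in combinations(range(n_dims), 2):
        groups = {}
        for idx, cell in enumerate(cells):
            key = frozenset({(d1, cell[d1]), (d2, cell[d2])})
            groups.setdefault(key, set()).add(idx)
        for key, members in groups.items():
            if len(members) >= min_support:
                result[key] = members
    return result


def join_subspaces(frequent, min_support):
    """Phase 2 (assumed form): join frequent subspaces into larger ones."""
    all_found = dict(frequent)
    frontier = dict(frequent)
    while frontier:
        next_level = {}
        for (a, pts_a), (b, pts_b) in combinations(frontier.items(), 2):
            merged = a | b
            # Join only subspaces that differ in exactly one (dim, bin) pair.
            if len(merged) != len(a) + 1:
                continue
            # Reject joins that assign two different bins to one dimension.
            if len({dim for dim, _ in merged}) != len(merged):
                continue
            # Points inside the merged subspace lie inside both parents.
            members = pts_a & pts_b
            if len(members) >= min_support:
                next_level[merged] = members
        all_found.update(next_level)
        frontier = next_level
    return all_found


def greedy_summary(subspaces, k):
    """Phase 3 (assumed form): pick up to k subspaces by greedy coverage."""
    covered, chosen = set(), []
    remaining = dict(subspaces)
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda s: (len(remaining[s] - covered), len(s)),
        )
        if not remaining[best] - covered:
            break  # no candidate covers any new point
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

A call such as `greedy_summary(join_subspaces(frequent_2d_subspaces(points, bins, min_support), min_support), k)` chains the three phases. The greedy step here is the standard set-cover heuristic, chosen only because it is a common way to trade off the number of selected subspaces against coverage.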

References


[2] C. C. Aggarwal and P. S. Yu, Finding generalized projected clusters in high-dimensional spaces, In Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000, pp. 70-81.
[3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, Automatic subspace clustering of high dimensional data, In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1998, pp. 94-105.
[4] I. Assent, R. Krieger, E. Müller, and T. Seidl, DUSC: Dimensionality unbiased subspace clustering, In Proceedings of the Seventh IEEE International Conference on Data Mining, 2007, pp. 409-414.
[5] C. Baumgartner, C. Plant, K. Kailing, H.-P. Kriegel, and P. Kröger, Subspace selection for clustering high-dimensional data, In Proceedings of the Fourth IEEE International Conference on Data Mining, 2004, pp. 11-18.
[14] M. Glomba and U. Markowska-Kaczmar, IBUSCA: A grid-based bottom-up subspace clustering algorithm, In Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, 2006, pp. 671-676.

Further Reading


  • Lee, A. J. T., Lin, M. C., Wang, Y. R., & Chen, K. T. (2010). Mining significant subspaces [重要子空間之資料探勘]. 資訊管理學報 (Journal of Information Management), 17(), 27-49. https://doi.org/10.6382/JIM.201012.0027
  • Cheng, C. W. (2009). A frequent-subspace classifier [頻繁子空間之分類器] [master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2009.01592
  • 顏秀珍, 李御璽, 鄭力瑋, 林俊達, & 陳煜堃 (2018). Mining frequent sequential patterns in data streams [在資料串流中探勘頻繁序列型樣]. In National Dong Hwa University (Ed.), NCS 2017 National Computer Symposium (pp. 54-59). National Dong Hwa University. https://doi.org/10.29428/9789860544169.201801.0011
  • Tsao, W. K. (2010). Mining spatio-temporal patterns in point-set databases [點集合資料庫之時空樣式探勘] [doctoral dissertation, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2010.10731
  • Wang, L. (2010). Frequent itemsets mining on uncertain databases [master's thesis, The University of Hong Kong]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0029-1812201200018835
