透過您的圖書館登入
IP:3.141.27.244
  • 學位論文

基於多個相似度矩陣之資料分群演算法

A Data Clustering Algorithm Based on Multiple Pairwise Similarity Matrices

指導教授 : 林守德

摘要


本論文之目標在於解決資料分群 (data clustering) 問題,為資料探勘領域中的經典問題之一。不同於以往使用特徵矩陣傳統解法,本論文考慮每筆資料兩兩之間的相似程度的資料矩陣作為分群之依據。為了使其更加一般化,亦納入了考慮多個相似度矩陣之情形。 對於此問題,本論文首先對於目前各種可行的解法做出統整與分析,並提出了一個基於非負矩陣分解的新模型——MDSC,來解決目標問題。所提出的模型之最主要的概念在於,藉由給定的多個相似度矩陣,使用低秩逼近產生合理的分群勢能矩陣。相較於其他矩陣分解類型的模型,本論文所提出的 MDSC 使用了對稱三項分解以學習每個維度所帶有的資訊。其中,矩陣分解的隱性矩陣為所有維度共用,並且被用來代表最後所求的分群勢能矩陣;對稱三項分解之中央矩陣則被用來作為將分群勢能矩陣調整於各個維度的分佈之用途。 對於所提出的模型 MDSC,本論文亦提供了一個最佳化架構以學習最佳的模型參數。實驗結果顯示本論文所提出的方法於公開之應用資料上的分群表現較其他已知的方法更佳。另外,本論文也分析了所提出之模型於實際應用時的收斂性以及在稀疏矩陣上的表現。

並列摘要


This thesis aims at solving the data clustering problem given only pairwise similarity information, different from many conventional clustering methods that assume the availability of instance features. To make it more general, we consider the existence of multiple similarity matrices as the input. We propose a non-negative matrix factorization model, MDSC, to handle this task. The main concept is to learn a low-rank representation for the partition potentials among all the similarity matrices. Different from other matrix factorization models, MDSC is formulated as a series of non-negative symmetric tri-factorizations to approximate the similarity matrix of each dimension. In particular, the latent matrices are regarded as the clustering potential matrix and shared among all the dimensions. To fit every similarity matrix using the same potential matrix, the middle matrices are introduced to adjust the bases and to absorb noises. An optimization framework is established to learn the model parameters. The experimental results show that the proposed model outperforms the existing approaches on real-world datasets in terms of three common clustering performance metrics. We also show that our model converges fast and can reach similar quality of results using less than 10% of the data.

並列關鍵字

Data Clustering Machine Learning Data Mining

參考文獻


[3] J.-P. Brunet, P. Tamayo, T. R. Golub, and J. P. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the national academy of sciences, 101(12):4164–4169, 2004.
[4] D. Cai, X. He, J. Han, and H.-J. Zhang. Orthogonal laplacianfaces for face recognition. IEEE Transactions on Image Processing, 15(11):3608–3614, 2006.
[5] N. F. Chikhi. Multi-view clustering via spectral partitioning and local refinement. Information Processing & Management, 52(4):618–627, 2016.
[6] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 126–135. ACM, 2006.
[7] C. H. Ding, X. He, and H. D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In SDM, volume 5, pages 606–610. SIAM, 2005.

延伸閱讀