應用多核學習理論於資料分群

資料分群是一個非常重要的科學問題，也十分廣泛的被應用在許多方面，以往多假設資料間只有單一的相似度量，但是在許多問題中，我們會有多個相似度量，這些度量反映資料間的不同面向，有些比較有效，有些則反而有害，不幸的是我們無法事先知道哪些度量是有效的。本論文將多核學習理論應用於資料分群問題上，可以自動將多個相似度量融合成單一度量，降低無效甚至有害度量的影響，從而得到優於任何一個單一度量所能得到的分類結果。其中模糊聚類演算法是一種流行的軟資料分群方法，其效力主要限於球形集群。透過核函數技術的幫助，核函數模糊聚類演算法試圖藉由經過非線性關係的投影，將資料轉換到適合的空間上來解決此問題。如果有效地組合或選擇核函數是核函數模糊聚類演算法成功的關鍵。不幸的是，對於大多數的問題，它不容易找到合適的組合。我們提出一個多核函數模糊聚類演算法（MKFC），它將模糊聚類演算法延伸與多核函數學習設置結合。通過結合多個核函數，並自動調整核函數的比重，多核函數模糊聚類演算法（MKFC）能避免選到無效或無關的核函數，這對於如何選擇核函數是相當重要的。此外，我們將呈現多核函數聚類演算法（MKKM）是多核函數模糊聚類演算法（MKFC）的一種特殊情況。在合成和現實世界資料上的實驗，證明多核函數模糊聚類演算法（MKFC）對於資料分群的確是有效的。譜聚類演算法是一種硬資料分群方法，它利用光譜圖的結構將資料分成數個有意義的群組。譜聚類演算法已經成為一種有效地做資料分群的方法。傳統的譜聚類演算法只使用一種相似度量，然而在許多問題中，我們可能有很多相似度量可用。不幸的是我們不知該用什麼方法將多個相似度量結合成一個，胡亂地組合這些不同的相似度量可能無法得到良好的資料分群結果。我們提出三種將譜聚類演算法延伸與多個相似度量結合的方法，每一種都能找到合適的比重來結合不同的相似度量，避免選到無效或無關的相似度量。實驗結果證明在使用多個相似度量的情況下，這三種方法都能有效地將資料分群。

關鍵字

資料分群；軟聚類演算法；模糊聚類演算法；硬聚類演算法；譜聚類演算法；多核函數學習

並列摘要

Data clustering is a very important scienti c issue and widely used in many aspects. In the past, we usually assumed that only a single a nity between data. However, many real-world clustering applications, in which there are multiple potentially useful a nities. These a nities have di erent characteristics, some are e ective and some are harmful. Unfortunately, we can not know in advance which metric are more suitable for a particular task. In this thesis, we extend the multiple kernel learning paradigm to data clustering. By automatically fusing multiple a nities into the single one and reduce invalid even harmful a nities, we can get better results than that clustering methods using single-a nity. While fuzzy c-means is a popular soft clustering method, its e ectiveness is largely limited to spherical clusters. By applying kernel tricks, the kernel fuzzy c-means algorithm attempts to address this problem by mapping data with nonlinear relationships to appropriate feature spaces. Kernel combination, or selection, is crucial for e ective kernel clustering. Unfortunately, for most applications, it is not easy to nd the right combination. We propose a multiple kernel fuzzy c-means (MKFC) algorithm which extends the fuzzy c-means algorithm with a multiple kernel learning setting. By incorporating multiple kernels and automatically adjusting the kernel weights, MKFC is more immune to ine ective kernels and irrelevant features. This makes the choice of kernels less crucial. In addition, we show multiple kernel k-means (MKKM) to be a special case of MKFC. Experiments on both synthetic and real-world data demonstrate the e ectiveness of the proposed MKFC algorithm. Spectral clustering is a hard clustering method makes use of spectral-graph structure of an a nity matrix to partition data into disjoint meaningful groups. Because of its elegance, e ciency, and good performance, spectral clustering has become one of the most popular clustering methods. Traditional spectral clustering assumes a single a nity matrix. However, in many applications, there could be multiple potentially useful features and thereby multiple a nity matrices. To apply spectral clustering to these cases, a possible way is to aggregate the a nity matrices into a single one. Unfortunately, a nity measures constructed from di erent features could have di erent characteristics. Careless aggregation might contribute to even worse clustering performance. We propose three methods which extend spectral clustering to a setting with multiple a nities available. They are the tensor based spectral clustering algorithm (TBSC), the multiple-a nity spectral clustering (MASC) algorithm, and the a nity aggregation spectral clustering (AASC) algorithm. Each of them strives for an optimal combination of a nity matrices so that it is more immune to ine ective a nities and irrelevant features. This reduces the importance of the construction of similarity or distance-metric measures for clustering. Experiments show that these three algorithms e ective in simultaneous clustering and feature fusion, thus enhancing the performance of spectral clustering by employing multiple affinities.

並列關鍵字

Clustering ； soft clustering ； fuzzy c-means ； hard clustering ； spectral clustering ； multiple kernel learning

參考文獻

[12] J.-H. Chiang and P.-Y. Hao. A new kernel-based fuzzy clustering approach:

recognition in personal photo albums. In Proceedings of the IEEE International

Conference on Computer Vision and Pattern Recognition (CVPR), pages 1{7,

[2] A. Antoniou and W.-S. Lu. Practical Optimization: Algorithms and Electrical

[3] F. R. Bach and M. I. Jordan. Learning spectral clustering. In Advances in

國際替代計量

應用多核學習理論於資料分群

主題瀏覽