A Study of Feature Extraction Using Principal Component Analysis and Support Vector Machines

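Research has shown that the characteristics of the data themselves directly affect classification capability. We therefore designed a data-analysis approach that makes the best use of features: necessary features are added to vague or ambiguous data to improve pattern-recognition applications and to ensure class separability. This thesis applies principal component analysis (PCA) to feature extraction, and we propose two algorithms, LPCSVM and FCLSVM. In the LPCSVM algorithm, the external class labels are treated as useful feature information and appended to the original data to form a new, augmented data set; PCA then extracts features from this augmented data for the support vector machine (SVM). In the FCLSVM algorithm, we apply the same class-label idea and take the first principal component as a representative index of the augmented data set. The representative first principal component can then be computed from a mathematical formula, and the same transformation can be applied to any validation or test data before classification. Experimental results show that using the representative-index data reduces the classification error, which confirms that the representative index provides additional valuable information for feature extraction.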
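For reference, the augmentation and first-principal-component steps can be written out as standard PCA applied to the augmented data. This is a sketch under assumptions the abstract does not state: the notation (inputs x_i in R^d, labels y_i, augmented rows z_i, mean mu, covariance C, leading eigenvector v_1), the n - 1 normalisation of C, and the reading of the "representative index" as the first-PC score t_i are all illustrative choices, not the thesis's own formulas.

```latex
\begin{aligned}
\mathbf{z}_i &= \begin{bmatrix} \mathbf{x}_i \\ y_i \end{bmatrix} \in \mathbb{R}^{d+1}, \qquad i = 1, \dots, n \\
\boldsymbol{\mu} &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{z}_i \\
\mathbf{C} &= \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{z}_i - \boldsymbol{\mu})(\mathbf{z}_i - \boldsymbol{\mu})^{\top} \\
\mathbf{C}\,\mathbf{v}_1 &= \lambda_1 \mathbf{v}_1, \qquad \lambda_1 = \lambda_{\max}(\mathbf{C}), \qquad \lVert \mathbf{v}_1 \rVert = 1 \\
t_i &= \mathbf{v}_1^{\top} (\mathbf{z}_i - \boldsymbol{\mu})
\end{aligned}
```

Keeping the leading k eigenvectors v_1 through v_k instead of v_1 alone would give the k-dimensional PCA features that, in the LPCSVM description, are passed on to the SVM.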

Keywords

Principal component analysis; support vector machines

Parallel Abstract

Several studies have reported that the characteristics of a data set directly affect the capability of a classifier. We therefore propose a feature-optimization approach that adds necessary features where the available knowledge is vague or insufficient, so as to guarantee class separability in pattern-recognition applications. We show that the class labels, an already available resource, can be combined with the feature-extraction concepts of principal component analysis (PCA) to address this feature-optimization problem. To this end, we propose two algorithms, LPCSVM and FCLSVM, which supply a sufficient number of features to compensate for the missing information. In LPCSVM, the class labels of the outputs are first treated as useful feature information and appended to the original inputs to form a new, augmented data set; PCA is then applied to the augmented data to extract features for support vector machine (SVM) classification. In FCLSVM, we further introduce the concept of an equivalent class label, which treats the first principal component of the augmented data set as a representative label. The representative index can then be expressed as a mathematical function in the form of the first principal component, so that any validation or test set can be subjected to the same transformation before it is passed to the classifier. Experiments on several existing data sets show that using the augmented data reduces the estimated classification error. This suggests that class labels can serve as additional useful information for feature extraction.
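The following Python sketch illustrates the label-augmentation and first-principal-component steps described above, using only the standard library. It is not the thesis's implementation: the helper names (`augment`, `fit_representative_index`, `representative_index`), the use of power iteration, the n - 1 covariance normalisation and the toy data are assumptions made for illustration, and the SVM stage is omitted. The abstract does not say how the label column is handled for unlabeled test rows, so the sketch only applies the fitted function t(z) = v1 . (z - mu) to augmented rows.

```python
import math
import random


def augment(X, y):
    """Append each sample's class label to its input vector as an extra feature column."""
    return [[float(v) for v in row] + [float(label)] for row, label in zip(X, y)]


def column_means(Z):
    """Mean of each column of the row-major matrix Z."""
    n = len(Z)
    return [sum(row[j] for row in Z) / n for j in range(len(Z[0]))]


def covariance(Z, mu):
    """Sample covariance matrix of the rows of Z (normalised by n - 1, an assumption)."""
    n, d = len(Z), len(mu)
    C = [[0.0] * d for _ in range(d)]
    for row in Z:
        c = [row[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                C[a][b] += c[a] * c[b]
    return [[value / (n - 1) for value in r] for r in C]


def first_principal_component(C, iters=500, seed=0):
    """Unit-length leading eigenvector of the symmetric PSD matrix C, via power iteration."""
    rng = random.Random(seed)
    d = len(C)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("covariance matrix is zero; no principal direction")
        v = [x / norm for x in w]
    # An eigenvector's sign is arbitrary; make the largest-magnitude entry positive.
    k = max(range(d), key=lambda i: abs(v[i]))
    return v if v[k] >= 0 else [-x for x in v]


def fit_representative_index(X, y):
    """Hypothetical helper: fit (mu, v1) on the augmented training data [X | y]."""
    Z = augment(X, y)
    mu = column_means(Z)
    return mu, first_principal_component(covariance(Z, mu))


def representative_index(z, mu, v1):
    """Hypothetical helper: first-PC score t(z) = v1 . (z - mu) of one augmented row z."""
    return sum(vi * (zi - mi) for vi, zi, mi in zip(v1, z, mu))


if __name__ == "__main__":
    # Toy two-class data (made up for illustration), labels coded as -1 / +1.
    X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
    y = [-1, -1, 1, 1]
    mu, v1 = fit_representative_index(X, y)
    for z in augment(X, y):
        print(z, round(representative_index(z, mu, v1), 3))
```

Because the fitted pair (mu, v1) fully determines the transformation, it can be stored after training and reapplied unchanged to any other row with the same columns, which is the property the abstract relies on when it applies the same transformation to validation and test sets.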

Parallel Keywords

support vector machines; principal component analysis
