
Unsupervised Feature Selection: Minimize Information Redundancy of Features

Advisor: 林守德

Abstract


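In this thesis, we propose an unsupervised feature selection method that removes redundant features from the data. The main contributions are twofold. First, we use eigen-decomposition (特徵分解) to rank the equations of hyperplanes according to how well each describes a near-linear dependency in the data. Using Gaussian elimination, we then progressively remove the features that can be replaced by other features. Second, we prove that this method is close to performing principal component analysis (PCA) on the data and then removing the features that carry large weights on the unimportant principal components; unlike that procedure, however, our method also takes into account how each removal affects the remaining features. Experiments show that, on an artificial dataset with known feature dependencies, our method removes the features that depend on other features, and that on real-world data it achieves better results than the other methods compared.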
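The PCA view of the method can be made concrete with a small sketch. Near-linear dependencies appear as principal components with very small eigenvalues, and the features loading heavily on them are candidates for removal. The code below is a minimal loading-based backward elimination on a correlation matrix under that assumption; it is not the thesis's own algorithm. The names `jacobi_eigh` and `drop_redundant_by_pca`, the stopping threshold `min_eig`, and the use of cyclic Jacobi rotations for the eigen-decomposition (chosen only to keep the example free of third-party dependencies) are all hypothetical. Recomputing the decomposition after every removal mirrors the abstract's point about accounting for the effect of each removal.

```python
import math


def jacobi_eigh(a, max_sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v); column i of v is the unit eigenvector for eigenvalue i.
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a float copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def drop_redundant_by_pca(corr, min_eig=0.05):
    """Backward elimination on a correlation matrix (hypothetical helper).

    The eigenvector of the smallest eigenvalue defines the hyperplane v . z ~ 0
    that best describes a near-linear dependency. While that eigenvalue is below
    `min_eig`, drop the feature with the largest |loading| on it, then recompute
    on the remaining features so each removal's effect is taken into account.
    """
    cols = list(range(len(corr)))
    removed = []
    while len(cols) > 1:
        sub = [[corr[i][j] for j in cols] for i in cols]
        vals, vecs = jacobi_eigh(sub)
        i_min = min(range(len(vals)), key=lambda i: vals[i])
        if vals[i_min] >= min_eig:
            break  # no near-linear dependency left
        worst = max(range(len(cols)), key=lambda k: abs(vecs[k][i_min]))
        removed.append(cols.pop(worst))
    return cols, removed


# Toy correlation matrix: feature 2 is nearly (z0 + z1) / sqrt(2), feature 3 is independent.
a = 0.7
corr = [[1, 0, a, 0], [0, 1, a, 0], [a, a, 1, 0], [0, 0, 0, 1]]
kept, removed = drop_redundant_by_pca(corr)
print(kept, removed)  # [0, 1, 3] [2]
```

In the toy matrix, the smallest eigenvalue is 1 - 0.7 * sqrt(2), or about 0.01. Its eigenvector is roughly (0.5, 0.5, -0.71, 0), so feature 2 has the largest loading and is dropped. The three remaining features are uncorrelated, every eigenvalue is 1, and the elimination stops.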

Parallel Abstract (English)


In this thesis, we propose an unsupervised feature selection method to remove redundant features from a dataset. The major contributions are twofold. First, we propose an eigen-decomposition method that ranks the hyperplanes (which describe the relations between features) by how closely each captures a near-linear dependency, and then design an efficient Gaussian-elimination method that removes, one at a time, the feature best represented by the remaining features. Second, we prove that our method is similar to removing the features that contribute most to the PCA components with the smallest eigenvalues, except that it takes into account the effect of each feature removal. We perform experiments on an artificial dataset that we constructed ourselves and on two real-world datasets with different characteristics. The experiments show that our method removes the dependent features almost perfectly without losing any independent dimension in the artificial set, and that it outperforms two other competitive algorithms on the real-world datasets.
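The abstract describes removing, one at a time, the feature best represented by the rest. One standard way to score "best represented," assumed here rather than taken from the thesis, is the R^2 of regressing each standardized feature on all the others. For a correlation matrix R this equals 1 - 1/(R^-1)[j][j], the complement of the reciprocal variance inflation factor. The minimal sketch below obtains R^-1 by Gauss-Jordan elimination and drops features greedily. The names `invert` and `drop_best_represented` and the cutoff `max_r2` are hypothetical, and this is not the thesis's efficient elimination scheme.

```python
def invert(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment [M | I] and reduce the left half to the identity.
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [x / piv for x in aug[col]]
        for r in range(n):  # clear this column in every other row
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def drop_best_represented(corr, max_r2=0.9):
    """Greedy removal of the feature best represented by the rest (hypothetical helper).

    For standardized features, the R^2 of regressing feature j on all the others
    is 1 - 1 / (corr^-1)[j][j]. Repeatedly drop the feature with the largest R^2,
    recomputing after each removal, until every remaining R^2 is below `max_r2`.
    """
    cols = list(range(len(corr)))
    removed = []
    while len(cols) > 1:
        inv = invert([[corr[i][j] for j in cols] for i in cols])
        r2 = [1.0 - 1.0 / inv[k][k] for k in range(len(cols))]
        best = max(range(len(cols)), key=lambda k: r2[k])
        if r2[best] < max_r2:
            break
        removed.append(cols.pop(best))
    return cols, removed


# Toy correlation matrix: feature 2 is nearly (z0 + z1) / sqrt(2), feature 3 is independent.
a = 0.7
corr = [[1, 0, a, 0], [0, 1, a, 0], [a, a, 1, 0], [0, 0, 0, 1]]
kept, removed = drop_best_represented(corr)
print(kept, removed)  # [0, 1, 3] [2]
```

Before any removal, feature 2 has R^2 = 0.98, and features 0 and 1 each have about 0.96. All three exceed the cutoff, so removing every feature above it in a single pass would also discard features 0 and 1. Recomputing after dropping feature 2 gives the remaining features an R^2 of 0. This is why the one-by-one removal matters.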

