IG-GA: A Hybrid Filter/Wrapper Method for Feature Selection of Microarray Data

Abstract


Gene expression profiles have great potential as a medical diagnostic tool because they represent the state of a cell at the molecular level. The training data sets available for classifying cancer types generally contain few samples relative to the number of genes involved. This limitation poses an insurmountable problem for some classification methodologies. Feature selection, which can be viewed as a global combinatorial optimization problem in machine learning, reduces the number of features, removes irrelevant, noisy, and redundant data, and still yields acceptable classification accuracy. Selecting relevant genes from microarray data nevertheless remains a formidable challenge because of the high dimensionality of the features, the multiple classes involved, and the usually small sample size. To overcome this difficulty, a good method for selecting the genes relevant to sample classification is needed, both to improve prediction accuracy and to avoid the incomprehensibility that comes with investigating a large number of genes. In this paper, we propose a hybrid approach that combines a filter method (information gain, IG) with a wrapper method (genetic algorithm, GA) for feature selection in microarray data sets. IG first selects a subset of important features (genes) from all features in the gene expression data, and the GA then performs the actual feature selection. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) serves as the evaluator of IG-GA. The proposed method was applied to eleven classification problems taken from the literature and compared with other feature selection methods. Experimental results show that our method effectively reduces the number of genes and either achieves higher classification accuracy or uses fewer features than the other methods.
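The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything specific here is an assumption made for illustration: the median split used to discretize expression values before computing information gain, K = 1 for the nearest-neighbor evaluator, tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter names and defaults (`top_k`, `pop_size`, `generations`, `p_mut`). The toy data is synthetic and is not taken from the paper.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """IG of one feature. Assumption: values are binarized at the median."""
    threshold = sorted(column)[len(column) // 2]
    low = [y for x, y in zip(column, labels) if x < threshold]
    high = [y for x, y in zip(column, labels) if x >= threshold]
    gain = entropy(labels)
    for part in (low, high):
        if part:
            gain -= len(part) / len(labels) * entropy(part)
    return gain

def loocv_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN (assumed K = 1) on the given feature indices."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)

def ig_ga_select(X, y, top_k=6, pop_size=12, generations=8, p_mut=0.05, seed=0):
    """Filter with IG, then search feature subsets with a simple GA.

    Assumes the data has at least two features so that crossover has a cut point.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    # Filter stage: keep the top_k features by information gain.
    gains = [information_gain([row[f] for row in X], y) for f in range(n_features)]
    candidates = sorted(range(n_features), key=lambda f: -gains[f])[:top_k]

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return loocv_1nn_accuracy(X, y, decode(mask))

    # Wrapper stage: each chromosome is a bit mask over the candidate features.
    population = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]

        def pick():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return max(a, b, key=lambda s: s[0])[1]

        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, len(candidates))           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return decode(best), fitness(best)

# Synthetic toy data: feature 0 separates the two classes, the rest is noise.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[10.0 * label + data_rng.uniform(-0.1, 0.1)]
     + [data_rng.random() for _ in range(29)] for label in y]

selected, accuracy = ig_ga_select(X, y)
print("selected features:", selected, "LOOCV accuracy:", accuracy)
```

In this sketch the filter stage shrinks the search space from all features to `top_k` candidates, so the GA only has to explore bit masks of length `top_k`, and each mask is scored by the LOOCV accuracy of the nearest-neighbor classifier. A fuller version might also penalize larger subsets in the fitness function, which the abstract's emphasis on using fewer features suggests but does not specify.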
