A Study on Applying Information Gain, Simplified Swarm Optimization, and a Wheel-Based Search Strategy to Gene Selection

Information gain and wheel based simplified swarm optimization for gene selection from gene expression data

Advisor: 葉維彰 (Wei-Chang Yeh)

Abstract


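Feature selection has been widely used in recent years. It is a dimensionality-reduction tool for classification problems whose aim is to identify the most discriminative features in the data while improving classification accuracy. Feature selection reduces noise and computational cost, and the benefit is especially pronounced when the dataset is large, so it is well suited to high-dimensional, noisy real-world data. In medicine, for example, it can be used to screen for important genes that may lead to cancer and to improve the cancer identification rate; in this setting, feature selection is called gene selection. Gene selection can help doctors detect and treat cancer early, which improves the cure rate. This study builds an effective gene selection model using ten cancer gene datasets. The model combines information gain, simplified swarm optimization, and a wheel-based search strategy into a complete gene selection method. First, information gain is used to eliminate redundant genes. Next, simplified swarm optimization, a soft computing method, is applied together with the wheel-based search strategy to the remaining genes to find the small number of genes that are truly discriminative. While the algorithm selects genes, a support vector machine with leave-one-out cross-validation is used to compute accuracy. To verify the performance of the algorithm, we compare and discuss it against methods proposed in earlier literature. The results show that the proposed gene selection model, which pairs information gain with simplified swarm optimization and the wheel-based search strategy, achieves higher accuracy while selecting fewer genes.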
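The information gain filtering step described above can be illustrated with a minimal sketch. It assumes gene expression values have already been discretized into categories, and the names `information_gain`, `filter_genes`, and the `threshold` parameter are hypothetical; the abstract does not state the discretization or the cut-off the thesis actually uses.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def filter_genes(columns, labels, threshold=0.0):
    """Indices of genes whose information gain exceeds the (assumed) threshold."""
    return [j for j, col in enumerate(columns)
            if information_gain(col, labels) > threshold]


# Toy data (made up for illustration): 3 discretized genes, 4 samples.
labels = ["tumor", "tumor", "normal", "normal"]
genes = [
    ["hi", "hi", "lo", "lo"],  # separates the two classes perfectly
    ["hi", "lo", "hi", "lo"],  # carries no class information
    ["hi", "hi", "hi", "lo"],  # partially informative
]
kept = filter_genes(genes, labels, threshold=0.0)
print(kept)
```

Genes that survive this filter would then be passed to the swarm-based search, which keeps the search space small for the more expensive wrapper step.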

Parallel Abstract


Feature selection has recently become an important issue in data mining. Its objective is to find the most discriminative features in datasets with an enormous number of features and thereby improve classification accuracy. Feature selection reduces noise and saves researchers considerable time and cost, especially when the volume of data is large. It is widely applicable to high-dimensional real-world problems such as cancer research in the medical field. When feature selection is used in cancer research to find cancer-related genes, it is called “gene selection”. With gene selection, doctors can find the symptoms or signs of cancer at an early stage and improve the survival rate. In this thesis, we develop an effective gene selection model for ten benchmark gene expression datasets. We propose information gain and wheel-based simplified swarm optimization (IG-WSSO) to solve the problem. First, we use information gain (IG) to remove irrelevant genes. Then, we apply simplified swarm optimization with the wheel-based search strategy (WSSO) for gene selection. A support vector machine (SVM) with leave-one-out cross-validation (LOOCV) is used to evaluate accuracy. We compare our algorithm, IG-WSSO, with previous research on ten benchmark gene expression datasets, which can be downloaded from http://www.gems-system.org/. The results show that IG-WSSO achieves higher classification accuracy while selecting fewer genes.
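The wrapper stage can be sketched as follows, under several loudly stated assumptions. The update rule is the standard simplified swarm optimization step (keep the current bit, copy the personal best, copy the global best, or re-draw at random); the wheel-based search strategy is not described in the abstract and is not implemented here. A 1-nearest-neighbour classifier stands in for the SVM so the example needs only the standard library, and the thresholds `cw`, `cp`, `cg`, the swarm size, and the toy data are all hypothetical.

```python
import random


def loocv_accuracy(samples, labels, mask):
    """Leave-one-out accuracy of a 1-NN stand-in classifier (the thesis uses
    an SVM), restricted to the genes where mask[j] == 1."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i, held_out in enumerate(samples):
        best_dist, best_label = None, None
        for k, other in enumerate(samples):
            if k == i:
                continue  # the held-out sample is never its own neighbour
            dist = sum((held_out[j] - other[j]) ** 2 for j in idx)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[k]
        correct += best_label == labels[i]
    return correct / len(samples)


def sso_update(x, pbest, gbest, cw, cp, cg, rng):
    """Standard SSO step applied bit by bit, with cw <= cp <= cg <= 1."""
    new = []
    for j in range(len(x)):
        rho = rng.random()
        if rho < cw:
            new.append(x[j])               # keep the current value
        elif rho < cp:
            new.append(pbest[j])           # copy the personal best
        elif rho < cg:
            new.append(gbest[j])           # copy the global best
        else:
            new.append(rng.randint(0, 1))  # random re-draw
    return new


def sso_select(samples, labels, n_particles=10, n_iter=30,
               cw=0.1, cp=0.4, cg=0.9, seed=0):
    """Search for a gene mask with high LOOCV accuracy and few genes."""
    rng = random.Random(seed)
    n_genes = len(samples[0])

    def fitness(mask):
        # Higher accuracy first; ties are broken in favour of fewer genes.
        return (loocv_accuracy(samples, labels, mask), -sum(mask))

    swarm = [[rng.randint(0, 1) for _ in range(n_genes)]
             for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(n_iter):
        for i in range(n_particles):
            swarm[i] = sso_update(swarm[i], pbest[i], gbest, cw, cp, cg, rng)
            if fitness(swarm[i]) > fitness(pbest[i]):
                pbest[i] = swarm[i][:]
                if fitness(pbest[i]) > fitness(gbest):
                    gbest = pbest[i][:]
    return gbest, fitness(gbest)[0]


# Toy data (made up): gene 0 separates the classes, genes 1-3 are noise.
samples = [
    [0.1, 5.0, 1.0, 3.0],
    [0.2, 1.0, 4.0, 0.5],
    [0.0, 3.0, 2.5, 2.0],
    [0.9, 4.8, 1.2, 2.9],
    [1.0, 1.1, 3.9, 0.4],
    [0.8, 3.1, 2.4, 2.1],
]
labels = [0, 0, 0, 1, 1, 1]
mask, acc = sso_select(samples, labels)
print(mask, acc)
```

Ranking candidates by accuracy first and gene count second is one simple way to express the stated goal of higher accuracy with fewer genes; the thesis may weight the two objectives differently.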
