Applying Support Vector Machines to the Classification of Cancer Microarray Data

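Microarrays are an important tool in modern gene analysis and can help distinguish among many types of cancer. In this work we classify cancer microarray data, using feature selection methods from computer science to reduce the data and the support vector machine (SVM), a machine learning method, to make predictions. We apply both tools to three cancer microarray data sets: leukemia, lung cancer, and prostate cancer. The feature selection methods fall into two families: Euclidean distance feature selection, which is a distance measure, and Pearson correlation coefficient feature selection, which is a dependence measure. We train the SVM with different numbers of features and with three different kernel functions. The results show that distance-based feature selection suits the SVM classifier, and that the linear kernel is the better kernel for the three problems studied. Across the three data sets, reducing the original 7,129 or more features to between 15 and 100 features still gives equal or better prediction results.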
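The two feature selection families named above can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so this assumes one common formulation: each gene's expression vector is compared against an "ideal" class-label vector, scored either by the absolute Pearson correlation or by the Euclidean distance after min-max scaling. The function names (`pearson`, `euclidean`, `min_max_scale`, `rank_genes`) and the toy data are hypothetical and are not taken from the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant vector carries no class information
    return cov / (norm_x * norm_y)


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def min_max_scale(x):
    """Rescale values to [0, 1] so they are comparable with 0/1 labels."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0] * len(x)
    return [(v - lo) / (hi - lo) for v in x]


def rank_genes(expression, labels, method="pcc", k=2):
    """Return the indices of the k best-scoring genes.

    expression: one list of per-sample values for each gene.
    labels: 0/1 class label for each sample (the assumed ideal vector).
    method: "pcc" (Pearson correlation) or "ed" (Euclidean distance).
    """
    ideal = [float(label) for label in labels]
    inverse = [1.0 - v for v in ideal]  # marker for the opposite class
    scored = []
    for index, gene in enumerate(expression):
        if method == "pcc":
            score = abs(pearson(gene, ideal))  # larger is better
        elif method == "ed":
            scaled = min_max_scale(gene)
            distance = min(euclidean(scaled, ideal), euclidean(scaled, inverse))
            score = -distance  # smaller distance is better
        else:
            raise ValueError("method must be 'pcc' or 'ed'")
        scored.append((score, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


# Toy data: 4 genes measured on 6 samples (3 per class).
LABELS = [1, 1, 1, 0, 0, 0]
EXPRESSION = [
    [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],  # flat across classes
    [9.0, 8.5, 9.2, 1.0, 1.3, 0.8],  # high in class 1
    [0.5, 0.7, 0.4, 7.9, 8.1, 8.4],  # high in class 0
    [3.0, 6.0, 2.0, 5.0, 1.0, 4.0],  # noisy
]

if __name__ == "__main__":
    print(rank_genes(EXPRESSION, LABELS, method="pcc", k=2))
    print(rank_genes(EXPRESSION, LABELS, method="ed", k=2))
```

On this toy data both scores pick the two genes that separate the classes. In the setting described here, `k` would range from 15 to 100 and the selected genes would then be passed to the SVM.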

Keywords

Cancer classification; microarray; support vector machine; feature selection; Pearson correlation coefficient

Parallel Abstract

Microarrays are an important tool in gene analysis research. They can help identify genes that might cause various cancers. In this thesis, we use feature selection methods and the support vector machine (SVM) to search for disease-causing genes in microarray data for three different cancers. The feature selection methods are based on Euclidean distance (ED) and the Pearson correlation coefficient (PCC). We selected three widely referenced microarray data sets for classification: the AML and ALL data set, the lung cancer data set, and the prostate cancer data set. We investigated how prediction results change when the SVM is trained with different numbers of features and different kinds of kernels. The results show that the linear kernel is the most suitable kernel for these problems. In addition, equal or higher accuracy can be achieved with only 15 to 100 features selected from the 7,129 or more features in the original data sets.
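To make the kernel comparison concrete, the sketch below defines three kernel functions and builds a Gram matrix from them. The abstract names only the linear kernel; the polynomial and radial basis function (RBF) kernels are assumed here because they are the usual alternatives, and the parameter names (`degree`, `gamma`, `coef0`) and the helper `gram_matrix` are illustrative rather than taken from the thesis.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree (assumed form)."""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2) (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(samples, kernel):
    """Pairwise kernel values between all samples, as a list of rows."""
    return [[kernel(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    # Three samples described by two selected features each (toy values).
    samples = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
    for name, kernel in [
        ("linear", linear_kernel),
        ("polynomial", polynomial_kernel),
        ("rbf", rbf_kernel),
    ]:
        print(name, gram_matrix(samples, kernel))
```

An SVM sees the training data only through such a Gram matrix, so swapping the kernel changes the decision boundary without changing the selected features. With very few samples and many features, as in microarray data, a linear boundary is often enough, which is consistent with the result reported here.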

Parallel Keywords

Cancer Classification; Microarray; Support Vector Machine; Feature Selection; Pearson Correlation Coefficient
