
A Study on Gene Selection for Cancer Classification Using Radial Basis Function Network (運用徑向基函數類神經網路在癌症基因選擇之研究)

Advisor: Cheng-Yan Kao (高成炎)

Abstract


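Human experts hope to use microarray data to determine whether a patient has cancer, or to identify genes associated with cancer. However, microarray data contain a very large number of features (genes); humans, for example, have more than twenty thousand known genes. This makes it hard for human experts to uncover the underlying rules that determine cancer, and it is also a serious challenge for machine learning tools. We therefore need a method that ranks genes by importance so that the decisive ones can be selected. Human experts can then spend less effort studying the selected genes, and machine learning tools can classify cancer more accurately.

In this thesis, we examine how feature selection methods affect cancer classification with gene microarray data, focusing on the radial basis function (RBF) network. Our experiments show that an RBF network that needs no parameter tuning achieves accuracy close to that of a support vector machine (SVM) with optimized parameters, and is much faster than an SVM that requires such tuning. With feature selection, the RBF network gains more accuracy than the SVM does.

In studying feature selection, we also found that noisy genes affect the RBF network more than the SVM. We therefore propose a new feature selection method, QuickRBF recursive feature elimination (QuickRBF-RFE). Our experiments show that QuickRBF-RFE improves cancer classification accuracy about as much as SVM recursive feature elimination (SVM-RFE). In the biological literature we also found that genes selected by QuickRBF-RFE, such as Bcl-xl in lymphoma and CXCL10 in prostate cancer, are indeed related to cancer, and these genes are hard for statistical feature selection methods and SVM-RFE to select. We also discuss why different feature selection methods select different genes. We hope this research offers another possibility for cancer research.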

Parallel Abstract (English)


Human experts hope to use microarray data to determine whether a patient has cancer and to identify genes associated with cancer. However, microarray data have many features (genes); for example, humans have more than twenty thousand genes. Discovering patterns in microarray data is therefore difficult for humans and also a problem for machine learning methods. We need to rank the importance of the genes in microarray data in order to select the informative ones. Doing so can help human experts investigate which genes lead to cancer and can help machine learning methods classify cancer more accurately.

In this thesis, we study the impact of feature selection methods on cancer classifiers trained on DNA microarray data sets, with a focus on the radial basis function network (RBF network). Our experiments show that the RBF network achieves accuracy similar to that of an optimized support vector machine (SVM) in much less computing time. With feature selection, the RBF network gains more in cancer classification accuracy than the SVM does.

During our research on feature selection, we observed that noisy genes affect the RBF network more than the SVM. We therefore propose a feature selection method, QuickRBF-RFE. QuickRBF can rank the importance of genes by itself, and a subset of discriminative genes can then be selected with the recursive feature elimination algorithm. Our experimental results show that QuickRBF-RFE performs similarly to SVM-RFE in cancer classification. Moreover, some of the top genes identified by QuickRBF-RFE, such as Bcl-xl in lymphoma and CXCL10 in prostate cancer, have been shown in the biological literature to be associated with cancer, and these genes are difficult to identify with statistical feature selection methods and SVM-RFE. We also discuss why different feature selection methods select different genes for cancer classification. We hope our research can open a new direction in cancer research.
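The recursive feature elimination idea mentioned in the abstract can be illustrated with a small sketch. This is not the QuickRBF-RFE implementation: the abstract does not say how QuickRBF ranks genes, so the sketch assumes a stand-in criterion, namely a leave-one-out score from a plain Gaussian-kernel classifier. The names `loo_score`, `rfe`, `gamma`, `X_toy`, and `y_toy` are hypothetical, and the toy data are made up for illustration only.

```python
import math
from typing import List, Sequence


def loo_score(X: Sequence[Sequence[float]], y: Sequence[int],
              features: Sequence[int], gamma: float = 1.0) -> float:
    """Mean leave-one-out share of Gaussian-kernel mass that falls on the
    true class, computed on the given feature indices only.
    ASSUMPTION: this is a stand-in ranking criterion, not QuickRBF's."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        same = other = 0.0
        for j, (xj, yj) in enumerate(zip(X, y)):
            if i == j:
                continue  # leave the evaluated sample out
            d2 = sum((xi[f] - xj[f]) ** 2 for f in features)
            k = math.exp(-gamma * d2)
            if yj == yi:
                same += k
            else:
                other += k
        mass = same + other
        total += same / mass if mass > 0.0 else 0.0
    return total / len(X)


def rfe(X: Sequence[Sequence[float]], y: Sequence[int],
        n_keep: int, gamma: float = 1.0) -> List[int]:
    """Recursive feature elimination: repeatedly drop the feature whose
    removal leaves the highest score, until n_keep features remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        best_feature, best_score = features[0], -1.0
        for f in features:
            rest = [g for g in features if g != f]
            score = loo_score(X, y, rest, gamma)
            if score > best_score:
                best_feature, best_score = f, score
        features.remove(best_feature)
    return features


# Made-up toy data: column 0 separates the two classes, columns 1-2 are noise.
X_toy = [
    [0.0, 1.0, 3.0],
    [0.2, 3.0, 1.0],
    [0.1, 2.0, 2.5],
    [5.0, 1.2, 2.9],
    [5.2, 2.8, 1.1],
    [5.1, 2.1, 2.4],
]
y_toy = [0, 0, 0, 1, 1, 1]

selected = rfe(X_toy, y_toy, n_keep=1)
print("kept feature indices:", selected)
```

Dropping one feature per round costs one scoring pass per remaining feature, which is slow with thousands of genes; practical RFE variants often remove a batch of low-ranked features per round instead.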

