
Gene Regulatory Network Inference and Complex Phenotype Prediction from Genetical Genomics Data

Advisor: 趙坤茂 (Kun-Mao Chao)

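Abstract


Gene expression and genotype data have grown exponentially in recent years. To make broad use of such data across studies, integrated analysis of genetical genomics data and the identification of significantly expressed genes have become the current trend. This study proposes a simple workflow for genetical genomics data analysis: it first integrates gene expression and genotype data, then performs feature selection with the Random Forest algorithm. The workflow is applied to two tasks: gene regulatory network inference and complex phenotype prediction. Fifteen gene regulatory networks, each containing one thousand genes, are inferred, and the results are evaluated by the area under the receiver operating characteristic curve and the area under the precision-recall curve. For complex phenotype prediction, the workflow is used to predict disease resistance in soybean, and the predictions are evaluated with Spearman's rank correlation coefficient. Experimental results show that the workflow outperforms other methods on both simulated and real genetical genomics data. Integrating gene expression and genotype data is a key step in analyzing genetical genomics data. In addition, the Random Forest algorithm is an ideal method for identifying significantly expressed genes.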

Parallel Abstract


The amount of gene expression profiling and genotype data has grown exponentially. To apply these data in extensive studies, integrated analysis of genetical genomics data has become a trend, and identifying the genes relevant to a specific response is therefore an essential issue. We propose a simple workflow for genetical genomics data analysis that integrates genotype and gene expression data and then applies Random Forest feature selection. The proposed workflow is used in two applications: gene regulatory network inference and complex phenotype prediction. In the first, fifteen different gene networks, each composed of one thousand genes, are reconstructed, and inference performance is measured by the area under the Receiver Operating Characteristic curve and the area under the Precision-Recall curve. In the second, the disease susceptibility of soybean plants is predicted, and Spearman's rank correlation coefficient is used to evaluate the predictions. Results show that our method outperforms other methods on both simulated and real genetical genomics data. Integration of genotype and gene expression data is a pivotal step in genetical genomics data analysis, and Random Forest is an ideal way to find relevant genes for further applications.
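Two of the evaluation measures named in the abstract, the area under the ROC curve and Spearman's rank correlation coefficient, can be illustrated with a minimal standard-library sketch. The function names (`average_ranks`, `spearman`, `auroc`) and the toy inputs are assumptions made for illustration only; they are not taken from the thesis and do not reproduce its results. The AUROC here uses the rank-sum (Mann-Whitney) identity, which is one common way to compute it.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum identity.

    labels: 1 for a true edge (positive), 0 otherwise.
    scores: predicted confidence for each candidate edge.
    """
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data (hypothetical): predicted vs. observed phenotype values,
# and edge scores vs. ground-truth edges.
rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
auc = auroc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

In this sketch, `rho` measures how well the predicted ordering of phenotype values matches the observed ordering, and `auc` is the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge.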
