  • Thesis

A Genetic Algorithm Based On Maximum Likelihood and Normalized Mutual Information to Infer Haplotypes from Genotypes

基於最大可能性與正規化交互資訊之基因演算法來從基因型推論單倍體

Advisor: 蘇豐文

Abstract


Haplotypes consist of blocks of single nucleotide polymorphisms (SNPs). As units of inheritance, haplotypes are widely used in association studies and candidate-gene studies. However, obtaining these blocks of SNPs through in vitro methods is both time-consuming and expensive, so in silico studies instead try to infer haplotypes from genotypic data. This thesis uses a genetic algorithm (i.e., a heuristic approach) guided by two genetic models: Hardy-Weinberg equilibrium and linkage disequilibrium. The first is assessed statistically by maximum likelihood estimates and the second by normalized mutual information. The technique produces an adequate solution in polynomial time to an inherently NP-hard problem. The results show that the algorithm achieves a higher accuracy rate than a genetic algorithm that uses only Hardy-Weinberg equilibrium.
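The two scoring terms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual formulas, so everything here is an assumption made for illustration: haplotypes are encoded as strings of 0/1 alleles, haplotype frequencies are estimated by simple counting over a candidate phasing, and mutual information is normalized by the smaller of the two marginal entropies. The function names are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def hwe_log_likelihood(pairs):
    """Log-likelihood of a candidate phasing under Hardy-Weinberg equilibrium.

    pairs: list of (h1, h2) haplotype pairs, one pair per individual.
    Frequencies are estimated by counting haplotypes in the phasing itself
    (an illustrative choice, not necessarily the thesis's estimator).
    """
    counts = Counter(h for pair in pairs for h in pair)
    total = sum(counts.values())
    freq = {h: c / total for h, c in counts.items()}
    log_lik = 0.0
    for h1, h2 in pairs:
        p = freq[h1] * freq[h2]
        if h1 != h2:
            p *= 2.0  # a heterozygous pair can arise in two orders
        log_lik += math.log(p)
    return log_lik


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def normalized_mutual_information(haplotypes, i, j):
    """Mutual information between SNP sites i and j, scaled to [0, 1].

    Normalizing by min(H(X), H(Y)) is an assumption; other normalizations
    (e.g. by the joint entropy) are also common.
    """
    xs = [h[i] for h in haplotypes]
    ys = [h[j] for h in haplotypes]
    hx, hy = entropy(xs), entropy(ys)
    mi = hx + hy - entropy(list(zip(xs, ys)))
    denom = min(hx, hy)
    return mi / denom if denom > 0 else 0.0


# Usage: score one candidate phasing of two individuals.
candidate = [("00", "11"), ("00", "11")]
flat = [h for pair in candidate for h in pair]
print(hwe_log_likelihood(candidate))
print(normalized_mutual_information(flat, 0, 1))
```

In a genetic algorithm of the kind described, a fitness function could combine these two scores for each candidate phasing. How the thesis weights or combines them is not stated in the abstract.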

Parallel Abstract (translated from Chinese)


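Haplotypes comprise multiple groups of single nucleotide polymorphisms (SNPs). As a unit of study in genetics, haplotypes have been widely used in association studies and candidate-gene studies. However, obtaining SNP information through in vitro experiments is both very time-consuming and very costly. The author instead takes an in silico approach, inferring haplotype information from genotype data. This study develops a new genetic algorithm based on two genetic models, Hardy-Weinberg equilibrium and linkage disequilibrium. The two models are evaluated statistically by maximum likelihood estimates and normalized mutual information, respectively. The genetic algorithm produces an adequate polynomial-time solution to an NP-hard problem. The results show that the algorithm achieves a higher accuracy rate than a genetic algorithm that uses only Hardy-Weinberg equilibrium.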


