Finding Motifs Using a Genetic Algorithm

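In the post-genomic era, a large proportion of the human genome has been sequenced, producing enormous amounts of sequence data. Using information technology to mine these data for information useful to human genome research has therefore become very important. The aim of this thesis is to find motifs in the region upstream of the transcription start site; our target sequences span from 2000 bases before to 1000 bases after the transcription start site. We propose a new method for predicting motifs whose core computation is a genetic algorithm. Mutation uses a weight matrix to preserve good nucleotides, and crossover uses gap penalties of our own design to select the best patterns. The weight matrix from the Gibbs Sampler is then used to reset patterns that have gradually stabilized, so that the most suitable motifs can be predicted anew. We also use a distributed parallel-processing architecture to improve the computational efficiency of our method. Finally, we test our method on simulated and real data and compare its prediction accuracy and efficiency with those of Multiple Em for Motif Elicitation (MEME) and Gibbs Sampler, two widely used motif-finding methods.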

Keywords

Bioinformatics; Consensus sequence; Genetic algorithm; RNA polymerase II promoter

Parallel Abstract

In the post-genomic era, almost all genes have been sequenced and enormous amounts of data have been generated. Mining useful information from these data is therefore a very important topic. In this thesis we propose a new approach for finding potential motifs in the region from 2000 bp upstream (-2000) to 1000 bp downstream (+1000) of the transcription start site (TSS). The approach is based on the genetic algorithm (GA). Mutation in the GA uses position weight matrices to preserve the completely conserved positions. Crossover is implemented with gap penalties to produce the optimal child pattern. We also present a rearrangement method based on position weight matrices that avoids very stable local minima, which can make it quite difficult for the other operators to generate the optimal pattern. The new approach shows superior performance compared with Multiple Em for Motif Elicitation (MEME) and Gibbs Sampler, two popular algorithms for finding motifs.
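As a rough illustration of the mutation idea described above, the following minimal sketch builds a position weight matrix from a set of aligned sites and mutates a candidate pattern while leaving completely conserved columns untouched. It is not the thesis's implementation: the function names `build_pwm` and `mutate`, the example sites, the mutation rate, and the choice to resample a base from the column frequencies are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"

def build_pwm(sites):
    """Column-wise base frequencies for equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({base: column.count(base) / len(sites) for base in ALPHABET})
    return pwm

def mutate(pattern, pwm, rng, rate=0.3):
    """Mutate a pattern; positions in fully conserved columns keep their base."""
    out = []
    for pos, base in enumerate(pattern):
        conserved = max(pwm[pos].values()) == 1.0
        if conserved or rng.random() >= rate:
            out.append(base)
        else:
            # Assumed rule: resample the base from the column's frequencies.
            bases = list(pwm[pos].keys())
            weights = list(pwm[pos].values())
            out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out)

# Hypothetical aligned sites; columns 0, 1 and 5 are completely conserved.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pwm = build_pwm(sites)
child = mutate("TATAAT", pwm, random.Random(0))
print(child)
```

In this sketch only the variable columns (positions 2 to 4) can change, so the conserved bases survive every mutation step. How the thesis scores patterns or applies gap penalties in crossover is not specified in the abstract and is not modelled here.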

Parallel Keywords

Motifs; Consensus sequence; Genetic algorithm; RNA polymerase II promoter; Bioinformatics


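Cited By

王怡菁 (2004). A Study on Identifying Cancer-Related Genes Using Single Nucleotide Polymorphisms [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916284889

李美宜 (2005). A Study on Identifying Cancer-Related Genes Using Transcription Factor Binding Regions [Master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916284781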
