
多表現序列標籤到基因體排列演算法之設計與實作

The Design and Implement of Multiple EST to Genome Alignment

Advisors: 許芳榮, 楊偉儒

Abstract


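Sequence alignment is an important topic in the study of genome function. Because traditional alignment methods treat the alignment of a single EST to the genome as the optimal solution, we developed a new multiple spliced alignment method: all expressed sequence tag (EST) data are first aligned to the genome, and spliced alignment is then applied to determine the positions of exons and introns correctly. Unlike traditional methods, ours derives exons by consulting the alignment data of other ESTs, so it avoids the mislocation and misjudgment that ordinary pairwise alignment suffers when an exon is too short, and it also increases the total amount of usable data. Alignments obtained this way are more accurate, which in turn reduces false positives in downstream analyses such as alternative splicing and gene structure.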
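
To see why a very short exon is hard to place from a single EST, the following Python sketch lists every exact occurrence of an exon-sized string in a toy genomic sequence. The sequences and the helper name `find_occurrences` are invented for illustration and do not come from the thesis, whose program is written in Perl.

```python
def find_occurrences(genome: str, segment: str) -> list[int]:
    """Return the 0-based start of every exact match, overlapping ones included."""
    positions = []
    start = genome.find(segment)
    while start != -1:
        positions.append(start)
        # Resume one base further so overlapping matches are not skipped.
        start = genome.find(segment, start + 1)
    return positions


# Hypothetical toy data, not real genomic sequence.
genome = "ACGTTGCAGGTACGATTGCAGCCATGGATTGCAAC"
short_exon = "TTGCA"          # 5 bases: matches in several places
long_exon = "TTGCAGCCATGG"    # 12 bases: matches in one place

short_hits = find_occurrences(genome, short_exon)
long_hits = find_occurrences(genome, long_exon)
print(short_hits, long_hits)
```

The short string has several equally good placements while the longer one has only one, which is the ambiguity that evidence from other ESTs is meant to resolve.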

Parallel Abstract


Expressed sequence tags (ESTs) are key to understanding how an organism's genes function. EST analysis yields information about alternative splicing, SNPs, gene structure, and more. The multiple sequence alignment (MSA) problem, which is to align similar subsequences of ESTs from the same region, has been studied extensively. Once a genome has been sequenced, aligning ESTs to it becomes important. An EST is a segment of an mRNA, and the introns are lost on the way from DNA to mature mRNA, so aligning an EST to the genome requires accounting for the splicing of exons. This kind of alignment is known as "spliced alignment". Several well-known spliced alignment algorithms exist, such as Sim4[2] and Mugup[1]. When many ESTs are available, however, traditional spliced alignment algorithms do not consider the alignment score between ESTs. In this study we consider a special case of multiple alignment, which we call the multiple spliced alignment problem: given a set of ESTs {E1, E2, …, En} and a genomic sequence G, find a multiple alignment M that also follows the rules of spliced alignment. We propose a heuristic algorithm for this problem. First, we find a spliced alignment for every EST. Then, for exons with lower scores, we adjust their alignments according to the alignments of exons in other ESTs. Using our EST multiple spliced alignment algorithm (EMSA), we aligned the entire EST set (dbEST release 040601) to the human genome (NCBI Build 34) and found that the alignments of 6116 ESTs were better than those obtained with traditional spliced alignment programs alone. The program is written in Perl and runs on an ordinary personal computer.
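
The adjustment step of the heuristic might be sketched roughly as follows, in Python for illustration only (the thesis program is in Perl, and none of this is the author's code). The abstract does not say how scores are computed, what counts as a "lower" score, or how a supporting exon is chosen. The `Exon` record, the `min_score` threshold, the overlap test, and the rule of copying the best supporting exon's coordinates are therefore all assumptions. Re-scoring after relocation is omitted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Exon:
    start: int    # genomic start, 0-based inclusive
    end: int      # genomic end, exclusive
    score: float  # alignment quality on an assumed 0..1 scale


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two genomic intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def refine(alignments: dict[str, list[Exon]],
           min_score: float = 0.9) -> dict[str, list[Exon]]:
    """Move low-scoring exons onto overlapping confident exons of other ESTs."""
    refined = {}
    for est_id, exons in alignments.items():
        # Confident exons from every *other* EST serve as evidence.
        trusted = [e for other_id, other in alignments.items()
                   if other_id != est_id
                   for e in other if e.score >= min_score]
        new_exons = []
        for exon in exons:
            if exon.score < min_score:
                support = [t for t in trusted if overlaps(exon, t)]
                if support:
                    best = max(support, key=lambda t: t.score)
                    # Assumed rule: adopt the supporting exon's coordinates.
                    exon = replace(exon, start=best.start, end=best.end)
            new_exons.append(exon)
        refined[est_id] = new_exons
    return refined


# Hypothetical per-EST spliced alignments (step one of the heuristic).
alignments = {
    "EST1": [Exon(100, 200, 0.98), Exon(500, 512, 0.60), Exon(900, 1000, 0.97)],
    "EST2": [Exon(100, 200, 0.99), Exon(505, 520, 0.95), Exon(900, 1000, 0.96)],
}
refined = refine(alignments)
print(refined["EST1"][1])
```

In this toy input, the weak middle exon of `EST1` is moved to the coordinates of the confident exon in `EST2`, while every confident exon is left untouched. A fuller version would also need to handle ESTs that end partway through an exon, which this sketch ignores.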

References


[1] Morgenstern et al., Exon discovery by genomic sequence alignment, Bioinformatics, 18: 777-787, 2002
[3] Florea et al., A Computer Program for Aligning a cDNA Sequence with a Genomic DNA Sequence, Genome Research, 8: 967-974, 1998
[4] Volfovsky et al., Computational Discovery of Internal Micro-Exons, Genome Research, 13: 1216-1221, 2003
[5] Mironov et al., Frequent Alternative Splicing of Human Genes, Genome Research, 9: 1288-1293, 1999
[6] Mironov et al., Performance-Guarantee Gene Predictions via Spliced Alignment, Genomics, 51: 332-339, 1998
