Design and Implementation of an Algorithm for Aligning Multiple Expressed Sequence Tags to a Genome

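Sequence alignment is an important topic in the study of genome function. Traditional alignment methods only seek the optimal alignment of a single EST against the genome, so we developed a new multiple spliced alignment method. It first aligns all expressed sequence tag (EST) data to the genome and then applies spliced alignment to correctly determine the positions of exons and introns. Unlike traditional methods, this method derives exons by consulting the alignments of other ESTs. It therefore avoids the mislocalization and misidentification that ordinary pair-wise alignment suffers when an exon is very short, and it also increases the total amount of usable data. Alignments obtained this way are more accurate, which reduces false-positive results in downstream analyses such as alternative splicing and gene structure.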
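The spliced-alignment step can be illustrated with a textbook-style dynamic program. This is a minimal sketch, not the thesis's Perl implementation and not the Sim4 algorithm: the scoring values, the `min_intron` threshold and the strict GT...AG splice-site rule are illustrative assumptions, and `spliced_align` is a hypothetical name. It aligns the whole EST to a substring of the genome (genome ends are free), lets the alignment skip a genomic stretch as an intron at a fixed penalty, and traces back the exon coordinates.

```python
from typing import List, Optional, Tuple

NEG = float("-inf")


def spliced_align(
    est: str,
    genome: str,
    match: int = 2,
    mismatch: int = -3,
    gap: int = -4,
    intron_penalty: int = -10,
    min_intron: int = 20,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Align the whole EST to part of the genome, allowing GT...AG introns.

    Returns (score, exons); each exon is a half-open genome interval [start, end).
    Scoring values and min_intron are illustrative assumptions.
    """
    n, m = len(est), len(genome)
    S = [[NEG] * (m + 1) for _ in range(n + 1)]
    # back[i][j] = (move, previous genome index); moves:
    # "D" diagonal (match/mismatch), "U" EST base vs gap,
    # "L" genome base vs gap, "I" intron jump.
    back: List[List[Optional[Tuple[str, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        S[0][j] = 0  # genome bases before the EST are free
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + gap
        back[i][0] = ("U", 0)
        best, best_j = NEG, -1  # best S[i][d] over GT donors d at least min_intron to the left
        for j in range(1, m + 1):
            d = j - min_intron
            if d >= 0 and genome[d:d + 2] == "GT" and S[i][d] > best:
                best, best_j = S[i][d], d
            s = match if est[i - 1] == genome[j - 1] else mismatch
            options = [
                (S[i - 1][j - 1] + s, "D", j - 1),
                (S[i - 1][j] + gap, "U", j),
                (S[i][j - 1] + gap, "L", j - 1),
            ]
            if best > NEG and genome[j - 2:j] == "AG":
                # Intron genome[best_j:j] starts with GT and ends with AG.
                options.append((best + intron_penalty, "I", best_j))
            S[i][j], move, prev = max(options, key=lambda t: t[0])
            back[i][j] = (move, prev)
    # Genome bases after the EST are free: end at the best column of the last row.
    j = max(range(m + 1), key=lambda col: S[n][col])
    score, i, exon_end = S[n][j], n, j
    exons: List[Tuple[int, int]] = []
    while i > 0:
        move, prev = back[i][j]
        if move == "I":
            exons.append((j, exon_end))  # the exon that follows this intron
            exon_end = prev
        elif move in ("D", "U"):
            i -= 1
        j = prev
    exons.append((j, exon_end))
    exons.reverse()
    return score, exons
```

Keeping the best donor score in a running maximum makes the intron transition constant-time per cell, so the table costs O(n·m) time and memory. Because each EST is aligned on its own here, a short exon contributes only a few match points and can be outscored by a spurious placement, which is the weakness the abstract attributes to pair-wise methods.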

Keywords

Genome; Exon; Algorithm; Expressed Sequence Tag; Multiple Sequence Alignment; Alternative Splicing

Parallel Abstract (English)

Expressed sequence tags (ESTs) are key to understanding the function of an organism. Through EST analysis we can learn about alternative splicing, SNPs, gene structure, and more. The multiple sequence alignment (MSA) problem is to align similar subsequences of ESTs from the same region; this problem has been studied extensively. Once a genome has been sequenced, it becomes important to align ESTs to it. An EST is a fragment of an mRNA, and introns are removed when DNA is transcribed and spliced into mRNA, so aligning an EST to the genome must account for how its exons were spliced together. This kind of alignment is known as "spliced alignment". Well-known spliced alignment algorithms include Sim4 [2] and Mugup [1]. When many ESTs are available, however, traditional spliced alignment algorithms do not consider the alignment score between ESTs. In this study we consider a special case of multiple sequence alignment, which we call the multiple spliced alignment problem: given a set of ESTs {E1, E2, …, En} and a genomic sequence G, find a multiple alignment M such that M also obeys the rules of spliced alignment. We propose a heuristic algorithm for this problem. First, we find a spliced alignment for every EST. Then, for exons with lower scores, we adjust their alignments according to the alignments of exons in other ESTs. Using our EST multiple spliced alignment algorithm (EMSA), we aligned the entire EST set (dbEST release 040601) to the human genome (NCBI Build 34) and found 6116 ESTs whose alignments were better than those produced by traditional spliced alignment programs alone. The program is written in Perl and runs on an ordinary personal computer.
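The second step of the heuristic, adjusting low-scoring exons using other ESTs, could look roughly like the following. This is a hypothetical sketch: the abstract does not say how an exon is adjusted, so the `min_score` threshold, the rule that a replacement interval must lie between the exon's neighbours, and the preference for the supported interval whose length best matches the EST segment are all assumptions, as are the names `Exon`, `collect_support` and `refine`.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Exon:
    est_start: int   # 0-based start of the exon's segment within the EST
    est_end: int     # exclusive end within the EST
    g_start: int     # 0-based start on the genome
    g_end: int       # exclusive end on the genome
    score: float     # score of this exon block in the EST's own spliced alignment
    adjusted: bool = False


def collect_support(alignments: Dict[str, List[Exon]],
                    min_score: float) -> Dict[Tuple[int, int], Set[str]]:
    """Map each confidently aligned genome interval to the ESTs that support it."""
    support: Dict[Tuple[int, int], Set[str]] = {}
    for est_id, exons in alignments.items():
        for ex in exons:
            if ex.score >= min_score:
                support.setdefault((ex.g_start, ex.g_end), set()).add(est_id)
    return support


def refine(alignments: Dict[str, List[Exon]],
           min_score: float) -> Dict[str, List[Exon]]:
    """Move low-scoring exons onto intervals confidently aligned in other ESTs."""
    support = collect_support(alignments, min_score)
    refined: Dict[str, List[Exon]] = {}
    for est_id, exons in alignments.items():
        out: List[Exon] = []
        for k, ex in enumerate(exons):
            if ex.score >= min_score:
                out.append(ex)
                continue
            # Assumption: the replacement must stay between the neighbouring exons.
            lo = out[-1].g_end if out else 0
            hi: float = exons[k + 1].g_start if k + 1 < len(exons) else float("inf")
            seg_len = ex.est_end - ex.est_start
            candidates = [
                (iv, ests) for iv, ests in support.items()
                if lo <= iv[0] and iv[1] <= hi and ests - {est_id}  # another EST supports it
            ]
            if not candidates:
                out.append(ex)  # no evidence from other ESTs; keep the original placement
                continue
            # Prefer a length close to the EST segment, then broader support.
            (gs, ge), _ = min(
                candidates,
                key=lambda c: (abs((c[0][1] - c[0][0]) - seg_len), -len(c[1]), c[0]),
            )
            out.append(replace(ex, g_start=gs, g_end=ge, adjusted=True))
        refined[est_id] = out
    return refined
```

A fuller version would re-align the EST segment to the chosen genomic interval and re-check the splice signals instead of only copying coordinates.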

Parallel Keywords (English)

exon; multiple sequence alignment; spliced alignment; exon-intron structure; pair-wise alignment

References

[1] Exon Discovery by Genomic Sequence Alignment, Morgenstern et al., Bioinformatics, 18:777-787, 2002

[3] A Computer Program for Aligning a cDNA Sequence with a Genomic DNA Sequence, Florea et al., Genome Res., 8:967-974, 1998

[4] Computational Discovery of Internal Micro-Exons, Volfovsky et al., Genome Res., 13:1216-1221, 2003

[5] Frequent Alternative Splicing of Human Genes, Mironov et al., Genome Res., 9:1288-1293, 1999

[6] Performance-Guarantee Gene Predictions via Spliced Alignment, Mironov et al., Genomics, 51:332-339, 1998
