全基因體高密度多層單標籤索引之建立及其應用

傳統序列比對的方法通常分兩種，動態規劃法(Dynamic Programming)和前置作業建立表格的查詢方法。第一種方法，動態規劃法可分為區域和全域規劃法，就基因體而言是非常長的序列，而比對序列往往相對於基因體為短序列，使用動態規劃往往需要大量時間。第二種方法，建立表格的查詢方式，往往可以節省時間，可是卻需要額外的空間來建立表格。表格的查詢方式的原理大致可以分為兩步驟，第一步驟為利用表格查詢找出適當的比對答案範圍，縮小比對範圍，第二步驟為比對序列，利用第一步驟找出的縮小範圍，再將序列利用動態規劃的方式比對出答案。黃明經博士所提出的單標籤法為建立表格查詢方式，利用單標籤索引只出現一次的概念來找出查詢序列所在的範圍，而後由許芳榮與陳哲峰提出利用多層單標籤結構來降低查詢表格時間。而單標籤索引方法中，單標籤索引會造成索引有群聚現象，而群聚現象限會使得若要查詢比對序列位於此空白區段，則會使其序列無法定位。而我們此篇論文也是就此群聚現象提出解決方法，維護利用單標籤只出現過一次的基本概念，利用原有的14標籤資訊，找出兩個14標籤且中間距離為n使得此圖形為唯一，稱為間隔單標籤。使用間隔單標籤來降低空白區段，提升索引覆蓋密度。依據實驗結果，我們利用間隔單標籤方法，使得群聚現象大幅度降低，索引覆蓋密度提昇。並且我們使用第21條人類基因體去建立索引，且使用第21條的SNP序列資料去做比對實驗，發現本來使用單標籤無法比對定位的SNP序列，因此能夠藉由間隔單標籤索引表格而比對定位。

關鍵字

全基因體；單標籤；多層單標籤索引；單一核苷酸多型性

並列摘要

Sequence alignment usually has two kinds of methods. The first kind of method is the dynamic programming. The second way is to set up index table. Dynamic programming is divided into a local method and global method. The principle of the index way can roughly be divided into two steps. The first step is to use index table to look for the proper range. The second step is to align sequence. The Unique Markers method is the second kind of method. The UM method will product the clustering. This phenomenon will cause the fault. If the query sequence belongs to this blank region, it can not be success for alignment. We propose a method solve the clustering problem. We use the Interval UM method to set up the second index table. This method keeps the concept of the original method. In our experiment, we show our method can improve the density of cover of the index.

並列關鍵字

Unique Marker ； SNP

參考文獻

[1] Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. (1990) Basic local alignment search tool. J. Mol. Biol. 215: 403–410.

[2] Chao, K. M., Zhang, J., Ostell, J. and Miller, W. (1995) A local alignment tool for very long DNA sequences. Comput. Appl. Biosci. 11: 147–153.

[3] Sze, S. H. and Pevzner, P. A. (1997) Las Vegas algorithms for gene recognition: Suboptimal and error-tolerant spliced alignment. J. Comput. Biol. 4: 297–309.

[5] Chao, K. M., Zhang, J., Ostell, J. and Miller, W. (1997) A tool for aligning very similar DNA sequences. Comput. Appl. Biosci. 13: 75–80.

[7] Burge, C. B. and Karlin, S. (1998) Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8: 346–354.

國際替代計量

全基因體高密度多層單標籤索引之建立及其應用

全文下載

主題瀏覽