Design and Implementation of an Automatic Update and Monitoring Program for an Alternative Splicing Database

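Alternative splicing is an important mechanism for regulating gene expression in eukaryotic cells. Building an alternative splicing database requires a large amount of computation and complex work. The two main data sources, expressed sequence tags (ESTs) and genome sequences, are both subject to additions and revisions. dbEST grows by an average of 100,000 ESTs per month across species (human, mouse, and others), and a new genome version is released roughly every six months. When the genome is updated, a large number of ESTs must be aligned to the genome, clustered, and analyzed to determine their alternative splicing forms. These tasks are time-consuming: with 10 personal computers, processing 6 million ESTs takes about one month. Human effort is also needed for maintenance and monitoring to keep the alternative splicing database current. To solve this problem, we need an alternative splicing database that can update itself (and provide users with personalized, up-to-date information). In other words, we only need to check within each cycle whether data have been added or changed, then carry out the follow-up processing and update the database. We therefore developed an automatic update program that assists with these complex tasks and reports to users what has changed in the new version of the alternative splicing database; we believe this information will be of interest. The human genome sequence is nearly complete, so more than 90% of the sequence does not change. If ESTs located on these unchanged sequences reuse the previous results, alignment time can be saved. We use position information associated with the genome sequence to shorten the time needed to align ESTs to the genome. Genome versions differ only slightly, so most ESTs can simply reuse the previous alignment result; in other words, their alignment results are unchanged in the new genome version. The traditional approach realigns all ESTs against the new genome version. By using genome position data, we saved 90% of the alignment time.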

Keywords

alternative splicing; database; expressed sequence tag; genome

Parallel Abstract

Alternative splicing (AS) is an important mechanism for regulating gene expression in eukaryotic cells. Constructing an AS database requires a huge amount of computation and complex procedures. Its two main data sources, expressed sequence tags (ESTs) and the genome, are both added to and revised over time. dbEST grows by about one hundred thousand ESTs per month on average across different species (human, mouse, etc.), and a new genome version is released approximately every six months. When the genome is updated, a large number of ESTs must be aligned to the genome and grouped by exon. We then determine which form of alternative splicing each EST's exons represent. These procedures are quite time-consuming: processing six million ESTs takes one month on ten personal computers. Maintaining and monitoring the AS database to keep it up to date also requires manpower. To solve this problem, the AS database needs the ability to update itself (and to offer individualized, up-to-date information to users). In other words, within each cycle we only check whether data have been added or changed, and then update the AS database. We have therefore developed an automatic update program to help carry out this complicated work. It also reports to users the differences in the new version of the alternative splicing database, which may be of interest to them. The Homo sapiens genome sequence is almost finished, so over 90% of it does not change between versions. If ESTs located on these unmodified sequences reuse the previous results, alignment time is saved. We use genome sequence location information to reduce the time needed to align ESTs to the genome. Different genome versions differ only slightly, so most ESTs only need to reuse their previous alignment to the genome. In other words, their aligned positions on the genome remain unchanged. In the traditional approach, all ESTs are realigned to the new genome whenever the genome is updated. By utilizing genome position information, we saved 90% of the alignment time.
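The reuse idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes the genome is split into named regions, detects unchanged regions by comparing sequence checksums rather than by the genome position information used in the work, and stores alignment coordinates relative to each region. The names `Alignment`, `region_digests`, and `plan_update` are hypothetical.

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """A previous EST-to-genome alignment (hypothetical record layout)."""
    est_id: str
    region_id: str  # assumed identifier of a genome region, e.g. a contig
    start: int      # coordinates relative to the region, not the chromosome
    end: int


def region_digests(regions: Dict[str, str]) -> Dict[str, str]:
    """Map each region id to a SHA-256 digest of its sequence."""
    return {
        rid: hashlib.sha256(seq.encode("ascii")).hexdigest()
        for rid, seq in regions.items()
    }


def plan_update(
    old_genome: Dict[str, str],
    new_genome: Dict[str, str],
    old_alignments: List[Alignment],
    all_est_ids: List[str],
) -> Tuple[List[Alignment], List[str]]:
    """Split the work into alignments to reuse and ESTs to realign.

    An old alignment is reused only if its region exists in the new
    genome with an identical sequence. Every other EST, including ESTs
    added since the last run, is scheduled for realignment.
    """
    old_d = region_digests(old_genome)
    new_d = region_digests(new_genome)
    unchanged = {rid for rid, digest in new_d.items() if old_d.get(rid) == digest}

    reused = [a for a in old_alignments if a.region_id in unchanged]
    reused_ids = {a.est_id for a in reused}
    to_realign = [e for e in all_est_ids if e not in reused_ids]
    return reused, to_realign


if __name__ == "__main__":
    old = {"c1": "ACGT", "c2": "GGCC"}
    new = {"c1": "ACGT", "c2": "GGCA", "c3": "TTTT"}  # c2 revised, c3 added
    previous = [Alignment("e1", "c1", 0, 3), Alignment("e2", "c2", 0, 3)]
    reused, todo = plan_update(old, new, previous, ["e1", "e2", "e3"])
    print(len(reused), "alignment(s) reused;", len(todo), "EST(s) to realign")
```

If about 90% of the regions are unchanged between genome versions and ESTs are spread roughly evenly over them, only about a tenth of the ESTs would reach the aligner under this scheme, which is in line with the saving reported above.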

Parallel Keywords

alternative splicing; database; EST; genome

