
龍膽石斑之全基因體定序組裝與功能註解分析

Whole Genome Assembly and Annotation of the Giant Grouper, Epinephelus lanceolatus in Next Generation Sequencing

Advisor: 林仲彥

Abstract


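This study focuses on the giant grouper (Epinephelus lanceolatus), an important aquaculture species of high economic value in Taiwan. Using next-generation sequencing, we obtained short-read whole-genome shotgun data and assembled them de novo into genome scaffolds, from which we predicted translatable protein sequences. These serve as an important foundation for functional studies of genome expression profiles, regulatory mechanisms, and immune-related responses. We compared several assembly strategies; the result from the ALLPATHS-LG assembler combined with the GapCloser program from the SOAPdenovo package was the best in both quality and quantity and was used as the basis for downstream analysis. The assembly comprises 13,897 scaffolds totaling 1.09 Gbp. To identify the genes in the grouper genome, we ran AUGUSTUS gene prediction using the sea lamprey (Petromyzon marinus) gene model as the reference, which yielded 42,433 protein sequences, 92.6% of which had matches in the nr protein database (blastp, E-value: 1E-5). By species composition, more than two-thirds of the matched sequences belong to class Actinopterygii of phylum Chordata, and the remaining one-third are microbial sequences, possibly due to contamination during sample collection. After filtering out scaffolds containing bacterial sequences, 9,473 scaffolds remained, totaling 1.06 Gbp, close to the genome size generally estimated for groupers (1.1 Gbp). These 9,473 scaffolds yielded 29,184 protein sequences, of which 26,056 (89.3%) had matches in nr (blastp, E-value: 1E-5); 93% of those matches belong to class Actinopterygii. Comparison against the KEGG database gave 26,328 matches; 24,193 sequences received protein domain annotations from Pfam; SignalP predicted signal peptides in 1,283 proteins; and tmHMM predicted transmembrane structures in 5,302 proteins. For comparative genomics, we selected fugu (Fugu rubripes), zebrafish (Danio rerio), and tilapia (Oreochromis niloticus) and compared the predicted protein sequences against one another; the proportion of sequences with a match between any two species was close to 80% or higher. Finally, we integrated the assembled sequences and annotation results into an online database that presents the most complete catalog of grouper gene sequences and the associated regulatory and metabolic networks. Transcriptome data will be integrated in the future, so that the metabolic networks involving differentially expressed genes, and the gene groups participating in them, can be analyzed online in real time, allowing a deeper and broader understanding of complex regulatory mechanisms. We hope this work will deepen understanding of the giant grouper genome and benefit basic research on giant grouper growth, breeding, and pathology.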
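The contamination filter described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used in the thesis: it assumes a scaffold is discarded as soon as any of its predicted proteins has a bacterial best hit, and the function names, the "Bacteria" label, and the toy data are all hypothetical.

```python
def filter_bacterial_scaffolds(scaffold_lengths, protein_to_scaffold,
                               best_hit_taxon, bacterial_label="Bacteria"):
    """Keep only scaffolds with no protein whose best hit is bacterial.

    ASSUMPTION: "any bacterial best hit removes the scaffold" is an
    illustrative rule; the abstract does not state the actual criterion.
    """
    flagged = set()
    for protein, taxon in best_hit_taxon.items():
        if taxon == bacterial_label:
            flagged.add(protein_to_scaffold[protein])
    return {name: length for name, length in scaffold_lengths.items()
            if name not in flagged}


def summarize(scaffold_lengths):
    """Return (number of scaffolds, total length in bp)."""
    return len(scaffold_lengths), sum(scaffold_lengths.values())


# Toy data (hypothetical): three scaffolds, four predicted proteins.
scaffold_lengths = {"scaf1": 1000, "scaf2": 500, "scaf3": 200}
protein_to_scaffold = {"p1": "scaf1", "p2": "scaf1", "p3": "scaf2", "p4": "scaf3"}
# p4 has no database match, so it is absent from the best-hit table.
best_hit_taxon = {"p1": "Actinopterygii", "p2": "Actinopterygii", "p3": "Bacteria"}

kept = filter_bacterial_scaffolds(scaffold_lengths, protein_to_scaffold, best_hit_taxon)
count, total_bp = summarize(kept)
print(sorted(kept), count, total_bp)
```

In the toy data, scaf2 is dropped because p3 has a bacterial best hit, while scaf3 is kept even though its only protein has no match at all. The same before-and-after summary corresponds to the scaffold counts and total lengths reported in the abstract.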

Parallel Abstract


The giant grouper (Epinephelus lanceolatus) is one of the most economically valuable aquaculture species in Taiwan. Its genome is estimated at about 1.1 Gbp and has not been fully sequenced. In this study, we applied next-generation sequencing (NGS) to obtain whole-genome shotgun sequences, assembled the genome of E. lanceolatus de novo from the short reads, and annotated it. Four assembly strategies were evaluated to select the one that produced the best genome scaffolds. We chose ALLPATHS-LG as the assembler and used GapCloser to fill the intra-scaffold gaps. This produced 13,897 scaffolds with a total length of 1.09 Gbp, from which AUGUSTUS predicted 42,433 putative protein-coding genes using the gene model of the sea lamprey (Petromyzon marinus). Overall, 92.6% of the protein products had matches in nr (blastp, E-value: 1E-5). Among the best-matching nr sequences, more than two-thirds were from class Actinopterygii and the remaining one-third matched bacterial proteins, which indicates possible sample contamination. We then removed scaffolds of bacterial origin; the final set contains 9,473 scaffolds totaling 1.06 Gbp. From these, 29,184 protein sequences were derived, 89.3% of which had matches in nr, and 93% of those best matches were from class Actinopterygii. A total of 26,328 protein sequences were mapped to the KEGG database. We further annotated the protein sequences with Pfam, SignalP, and tmHMM to reveal protein structural features. We took the full sets of protein-coding genes from E. lanceolatus, Fugu rubripes, Danio rerio, and Oreochromis niloticus and used BLAST to find the mutual best hits for each protein. About 80% of the protein sequences had a match between any two species. Finally, we integrated the protein-coding sequences and their annotations into a web database. This database will be open and will serve as a grouper genome resource for the research community. It will help grouper researchers study physiology and pathology and define genetic traits for breeding, such as fast growth and disease resistance.
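The cross-species comparison can be illustrated with a small Python sketch. It is one possible reading of "mutual best hits", not the thesis's implementation: it assumes the BLAST results have already been reduced to (query, subject, bit score) tuples, and the function names and toy identifiers are hypothetical.

```python
def best_hits(rows):
    """Map each query to its highest-scoring subject.

    rows: iterable of (query, subject, bitscore) tuples, e.g. parsed
    from tabular BLAST output (ASSUMPTION: parsing is done elsewhere).
    """
    best = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


def matched_fraction(a_vs_b, total_a):
    """Fraction of species-A proteins with at least one hit in species B."""
    return len(best_hits(a_vs_b)) / total_a


# Toy data (hypothetical): grouper proteins g1..g4 vs fugu proteins f1..f2.
grouper_vs_fugu = [("g1", "f1", 200.0), ("g1", "f2", 150.0),
                   ("g2", "f2", 90.0), ("g3", "f2", 300.0)]
fugu_vs_grouper = [("f1", "g1", 198.0), ("f2", "g3", 290.0),
                   ("f2", "g2", 80.0)]

pairs = reciprocal_best_hits(grouper_vs_fugu, fugu_vs_grouper)
fraction = matched_fraction(grouper_vs_fugu, total_a=4)
print(sorted(pairs), fraction)
```

Here g2 finds f2 as its best hit, but f2 prefers g3, so only (g1, f1) and (g3, f2) are mutual. The matched fraction counts any hit at all, which is the looser measure and may be closer to the "about 80%" figure quoted in the abstract.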

