  • Thesis

Evaluating the Feasibility of Applying Short Base-Sequence Fragments to Metagenomic Sequence Analysis

Evaluation of K-mer in the analysis of metagenomic sequences

Advisor: 王世融

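Abstract


Metagenomics is now a widely used and important approach for studying bacterial ecology. Traditional tools for analyzing bacterial community composition usually estimate bacterial abundance from the distribution of a few conserved genes in the metagenome. However, among bacteria or bacterial strains that are evolutionarily close, the differences in these conserved genes are often very small or even absent. Relying on only a few conserved genes to distinguish bacteria therefore cannot resolve in detail how very closely related groups are distributed.

In recent years, rapidly advancing next-generation sequencing technology has given metagenomics more reliable research tools. Although the new sequencing methods can produce more accurate sequence data, the massive and complex data they generate are often difficult for ordinary laboratories to handle, and processing these data has become a new challenge in metagenomics research. Many different tools are currently available for analyzing metagenomic sequences produced by next-generation sequencing. Depending on whether they include a step that compares sequences against annotated genomes, they fall into two types: supervised and unsupervised. Supervised tools usually produce more accurate predictions than unsupervised ones, but they are slow, require enormous computing power, and predict poorly for bacteria whose genomes have not yet been sequenced. Unsupervised methods are simple and fast but less accurate.

This study used publicly available programs, databases, and self-written scripts to select, from each bacterium, specific base sequences that serve as distinguishing marker sequences. Comparing these marker sequences against an unknown bacterial community reveals how bacteria are distributed in that community. Commonly used analysis methods require very large databases and considerable computing power, so their hardware requirements are high. The analysis method in this study effectively reduces the data volume and improves computational performance, can run on an ordinary home computer, and lets users adjust the selection criteria themselves.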

Parallel Abstract


Metagenome studies have been widely employed to understand complex bacterial ecology. Traditional methods for studying bacterial community structure are mostly based on variation in a few conserved genes. However, the variation in these conserved genes among evolutionarily close species or strains is generally indistinguishable or even absent. It is therefore almost impossible to draw phylogenetic conclusions for very closely related species using a conserved-gene approach. Recent advances in next-generation sequencing (NGS) have created an unprecedented opportunity for metagenomics research. Although NGS can generate millions of sequencing reads, analyzing such enormous data sets is beyond the capabilities of ordinary laboratories. Existing tools are frequently categorized as supervised or unsupervised, depending on whether a sequence comparison procedure is used. Supervised tools tend to generate more accurate predictions than their unsupervised counterparts; however, they require extensive computing power and complicated statistical models. They also perform poorly when analyzing unknown genomes. Unsupervised methods, though simple, can only perform approximate binning and thus ignore low-abundance bacteria. This project used publicly available tools and self-written Perl scripts to design a new algorithm for analyzing metagenomes. By using phylogenetically unique K-mer markers, we aim to identify the bacteria in a metagenome and their abundance levels. Our platform is rapid and lightweight and can be installed on a desktop PC.
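The general idea of unique K-mer markers can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the thesis's Perl implementation. The function names (`kmers`, `unique_markers`, `marker_hits`), the toy sequences, and the choice of `k=4` are all hypothetical. The sketch also ignores reverse complements and sequencing errors, and it reports raw marker hit counts rather than estimated abundances.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_markers(genomes, k):
    """Map each k-mer that occurs in exactly one genome to that genome's name."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    return {
        kmer: next(iter(names))
        for kmer, names in owners.items()
        if len(names) == 1
    }


def marker_hits(reads, markers, k):
    """Count, per genome, how many read k-mers match one of its unique markers."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in markers:
                hits[markers[kmer]] += 1
    return hits


# Toy reference genomes and reads (hypothetical data, for illustration only).
genomes = {
    "genome_A": "ACGTACGGA",
    "genome_B": "ACGTTTGCA",
}
markers = unique_markers(genomes, k=4)

reads = ["TACGGA", "TTTGC", "ACGT"]
hits = marker_hits(reads, markers, k=4)
print(dict(hits))
```

In this toy example the 4-mer `ACGT` occurs in both genomes, so it is discarded as a marker and the read `ACGT` contributes no hit. Only K-mers specific to one genome are counted as evidence for that genome.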

Parallel Keywords

Metagenome; unique K-mer; linear programming

