  • Degree thesis

以新研發之次世代定序序列組合策略探討乙型肝炎病毒之演化與選汰

Using novel next generation sequencing assembly approaches to reveal new insights into the evolutionary rate and selection of hepatitis B virus

Advisors: 陳俊宏, 王弘毅

Abstract


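As next-generation sequencing has become the mainstream way to obtain large amounts of genomic information, the capabilities and development of the corresponding analysis software have received growing attention. Sequence assembly is the foremost step in analyzing next-generation sequencing data, especially for data generated by the Illumina platform. Illumina is currently one of the major next-generation sequencing platforms; its data are characterized by very large volume but short individual read length, and shorter reads make assembly more difficult.
Pathogenic viruses such as HIV, HBV, and HCV maintain both a large population size and complex diversity within a single host, so the characteristics of Illumina data are well suited to studying large and complex viral populations, provided that the assembly difficulties caused by short reads are resolved. The more common assembly algorithms were not originally developed for such large and complex data and therefore cannot effectively assemble next-generation sequencing data from viral populations. In view of this, we present the BBAP next-generation sequence assembly algorithm for effective analysis of sequence data from viral populations. We also propose a new assembly strategy: a small portion of the reads is first assembled de novo, and the result is then used as the reference sequence for assembling the complete set of reads.
Many previous studies of HBV evolution have estimated its evolutionary rate, but rates estimated over different time scales differ considerably. This discrepancy may stem from the distinct pathological phases of HBV infection. In the early phase, the host mounts only a slight immune response against the viral population, so viral evolution centers on competitive growth against other viruses (colonizers). In the middle and later phases, the host immune system begins to clear HBV, and the viral evolutionary strategy shifts to evading and adapting to the host immune system (adaptors). Because HBV can be transmitted vertically from mother to child, HBV populations continually alternate between these two very different phases. We believe this alternation accounts for much of the variation among HBV evolutionary rate estimates across studies. The HBV genome is not only small but also has a complex genetic structure, so a single viral strain can hardly satisfy the differing strategies of both phases at once, which gives rise to a colonizer-adaptor trade-off mechanism. To test this, we collected and sequenced 12 HBV samples from a family in which three generations were all vertically infected with HBV. We used both next-generation sequencing and conventional sequencing, analyzed the data from the former with BBAP, estimated short-term and long-term HBV evolutionary rates, and tested the proposed colonizer-adaptor trade-off mechanism.
Continual alternation between phases requires rapid and abundant sequence variation for quick adaptation to each phase, and the error-prone polymerase and high replication rate of HBV supply that variation. Understanding which mechanisms govern this variation will help us understand HBV evolution. Because BBAP can analyze sequence variation at a resolution as fine as 10^-4, we compared the observed number of variants with the numbers estimated under different models and assumptions to investigate the evolutionary process of HBV populations.
The results show that BBAP provides more efficient and accurate assemblies than other assembly algorithms. Analyses based on the BBAP assemblies show that the proposed colonizer-adaptor trade-off mechanism can explain the differences among HBV evolutionary rate estimates in previous studies, and further confirm that both positive and negative selection play important roles in HBV evolution.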

Parallel Abstract


Next-generation sequencing has become the mainstream method of obtaining large quantities of genomic data over the past decade. The increased accessibility of massive data sets has driven up the need for compatible analytic algorithms and software, and assembly is the first and most important step in analyzing these data sets. Sequence assembly is especially critical for data sets generated by the Illumina platform, one of the two most commonly used next-generation sequencing platforms along with the 454/Roche pyrosequencing platform. Illumina sequencing produces much larger data sets than pyrosequencing, but its shorter read length presents difficulties for de novo assembly. Pathogenic viruses, such as HIV, HBV, and HCV, can be both abundant and highly diversified within a single host. The high output of the Illumina platform is well suited to detecting genetic variation within viral quasispecies, but its short read length impedes assembly efficiency, and the high genetic variation itself also challenges assembly algorithms. Most available de novo assembly algorithms, such as Velvet, were not originally intended for such metagenomic data sets and cannot efficiently assemble viral quasispecies NGS data sets. Therefore, we present a BLAST-based assembly pipeline, BBAP, developed for the assembly of metagenomic data sets. We also propose a hybrid de novo-reference assembly strategy in which a partial data set is first assembled de novo and the resulting scaffolds are then used as reference sequences to assemble the full data set. Previous studies have tried to understand the evolution of HBV, but estimates of long- and short-term mutation rates have been inconsistent. One possible reason for the discrepancy among HBV evolutionary rates is the distinct viral dynamics of HBV.
There is limited host immune response during the early immunotolerance phase, and the evolutionary focus of viral quasispecies is on competition between viral variants (colonizers), whereas the host immune response increases during the following immunoclearance phase and shifts selection pressure toward evasion of and adaptation to the host immune system (adaptors). Because HBV can be vertically transmitted from mother to infant, the HBV quasispecies constantly shifts between the immunotolerance and immunoclearance phases. We propose that the inconsistency among observed HBV evolutionary rates is due to this constant shifting between colonizer and adaptor roles under rapidly changing selection regimes. Furthermore, the relatively small size and complex structure of the HBV genome make it extremely unlikely for a viral strain to become dominant in both phases; therefore, a colonization-adaptation trade-off (CAT) occurs. We sequenced 12 HBV samples from a three-generation family, all of whom have vertically transmitted chronic HBV, using next-generation sequencing and Sanger sequencing. NGS data sets were assembled by BBAP, and both short- and long-term evolutionary rates were estimated while testing the CAT model. Frequent shifting between phases requires the virus to adjust constantly, and the necessary variation is provided by its error-prone polymerase and high replication rate. During the course of chronic hepatitis B virus infection, a huge amount of genetic variation accumulates within the host. Revealing the processes that govern this diversification is also important to our understanding of HBV evolution. Because BBAP is capable of detecting polymorphisms with frequencies as low as 10^-4, we compared the observed number of mutations with the expected number derived from simulations under different assumptions to examine the evolutionary dynamics of HBV quasispecies.
Overall, BBAP provides more efficient and accurate assembly results than other assemblers, and results from the BBAP assembly of the study family's NGS data sets suggest that the CAT model can account for the discrepancies observed between short- and long-term evolutionary rates, with both positive and negative selection playing important roles in shaping the HBV genome.
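
The hybrid de novo-reference strategy summarized in the abstract can be illustrated with a toy sketch. This is not BBAP: the BLAST-based alignment is replaced here by exact string matching, the de novo step is a simple greedy overlap merge, and the function names (`overlap`, `greedy_assemble`, `map_to_reference`, `hybrid_assemble`), the `fraction` and `min_overlap` parameters, and the error-free simulated reads are all assumptions made for illustration only.

```python
import random


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=10):
    """Toy de novo step: repeatedly merge the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        # Keep only contigs not already contained in the merged sequence.
        contigs = [c for c in rest if c not in merged] + [merged]


def map_to_reference(reads, references):
    """Toy reference step: place each read by exact match on the first reference that contains it."""
    placements = []
    for read in reads:
        for ref_id, ref in enumerate(references):
            pos = ref.find(read)
            if pos != -1:
                placements.append((read, ref_id, pos))
                break
    return placements


def hybrid_assemble(reads, fraction=0.2, seed=0, min_overlap=10):
    """De novo assemble a random subset, then map the full read set to the result."""
    rng = random.Random(seed)
    n = min(len(reads), max(2, int(len(reads) * fraction)))
    subset = rng.sample(reads, n)
    scaffolds = greedy_assemble(subset, min_overlap)
    placements = map_to_reference(reads, scaffolds)
    return scaffolds, placements


# Hypothetical usage: error-free 30-base reads tiled across a random sequence.
demo_rng = random.Random(1)
genome = "".join(demo_rng.choice("ACGT") for _ in range(120))
reads = [genome[i:i + 30] for i in range(0, 91, 3)]
scaffolds, placements = hybrid_assemble(reads, fraction=0.5, seed=2)
print(len(scaffolds), "scaffold(s);", len(placements), "of", len(reads), "reads placed")
```

The point of the design is that the expensive de novo step only sees a fraction of the reads, while the cheaper reference step handles the full data set. In this toy version, reads that extend beyond the scaffolds built from the subset are simply left unplaced.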

