
Development of Comprehensive Analysis Systems for Genomic Research and Applications

Advisor: 莊曜宇

Abstract


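As the cost of sequencing has fallen, large amounts of data are now generated every day, bringing genomic research into a new era. However, processing and analyzing these high-throughput data effectively, accurately, and conveniently remains a challenge. This dissertation therefore applies bioinformatics methods to problems that high-throughput experimental techniques raise in genomic research. Specifically, the work follows two main directions, developing comprehensive online analysis systems and applying next-generation sequencing (NGS) analysis pipelines, and is organized into four topics.

In the first topic, to address inconsistent predictions across multiple prediction algorithms and the frequently changing nomenclature of microRNAs (miRNAs), this dissertation developed miRSystem, an online tool that can both predict miRNA target genes and analyze their biological functions and regulatory pathways. The program identifies the miRNA target genes on which its seven integrated prediction algorithms and two experimentally validated data sources agree. It then applies two different enrichment algorithms to these genes to find downstream biological functions and regulatory pathways. In addition, miRConverter, an accessory program of the system, resolves potential discrepancies in miRNA nomenclature and can search for miRNAs similar to a given sequence.

In the second topic, the study examines the various NGS technologies and their broad applications, discusses and proposes data-processing pipelines, and recommends suitable tools for genomics, transcriptomics, and miRNA research.

In the third topic, the study focuses on endogenous gene expression in cell lines and clinical samples. Two questions are often major challenges for biology laboratories: how to choose a suitable cell line as an experimental model, and how to analyze several large datasets comprehensively while visualizing the results effectively. To address them, this study developed the online system CellExpress, which offers four main functions: gene expression search, similarity assessment, gene signature exploration, and user data analysis. These functions help users query normalized gene expression levels in specified cell lines or clinical samples, compare differences or similarities between samples, and find significantly changed genes. Users can also compare their own uploaded data against the system's database to find similar gene expression profiles.

In the fourth topic, this dissertation established an analysis pipeline for de novo genome assembly and completed the first whole-genome sequencing of the Mikado pheasant. The draft genome contains about 1 billion base pairs and 15,972 annotated genes. It indicates that positive selection and expansion in gene number during adaptive evolution have shaped genes involved in energy metabolism, oxygen transport, hemoglobin binding, radiation response, immune response, and DNA repair. In addition, 39 putative genes were annotated by manual curation within a contiguous 227-kb sequence of the major histocompatibility complex (MHC). Comparison with the same locus in the red junglefowl revealed a high degree of similarity, but also two inversion points between the TAPBP and TAP1-TAP2 genes. The mitochondrial DNA of the Mikado pheasant was then sequenced, assembled, and compared with those of four other long-tailed pheasants. Molecular clock analysis suggests that its ancestors migrated from the north to Taiwan about 3.47 million years ago.

Overall, this dissertation presents methods and applications for genomics. The results show that each proposed tool effectively addresses its genomic problem. They also show that the proposed analysis pipeline successfully provides insights into how the Mikado pheasant genome has adapted to high altitude.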
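miRSystem's first topic describes two core steps. The first keeps only the target genes on which several sources agree. The second tests those genes for pathway enrichment. The Python sketch below illustrates both steps and is not miRSystem's implementation. Several details are assumptions: the agreement threshold `min_sources`, the toy source and gene names, and the use of a one-sided hypergeometric test as the enrichment method. The abstract does not name the two enrichment algorithms.

```python
from collections import Counter
from math import comb


def consensus_targets(predictions: dict[str, set[str]], min_sources: int) -> set[str]:
    """Return genes reported as targets by at least `min_sources` sources.

    `predictions` maps a source name (a prediction algorithm or a validated
    database; the names used below are hypothetical) to the set of target
    genes it reports for one miRNA.
    """
    # Count how many sources report each gene; sets give one vote per source.
    counts = Counter(gene for genes in predictions.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_sources}


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k), an assumed enrichment test.

    N: genes in the background
    K: background genes annotated to the pathway
    n: genes in the target list
    k: target genes annotated to the pathway
    """
    # Sum the probabilities of drawing k or more pathway genes by chance.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example with hypothetical sources and genes.
predictions = {
    "algoA": {"TP53", "MYC", "KRAS"},
    "algoB": {"TP53", "MYC"},
    "validated": {"TP53"},
}
targets = consensus_targets(predictions, min_sources=2)
print(sorted(targets))  # ['MYC', 'TP53']
print(hypergeom_enrichment_p(k=4, n=4, K=5, N=20))
```

With `min_sources=2`, KRAS is dropped because only one source reports it. The sketch uses a vote threshold rather than a strict intersection so that a single missing source does not discard an otherwise well-supported gene. That trade-off is itself a design assumption.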

Parallel Abstract


Owing to the reduced cost of sequencing technology, massive amounts of data are generated every day, ushering genomic research into a new era. However, processing and analyzing such high-throughput data effectively, accurately, and conveniently remains a challenge. This dissertation aimed to use bioinformatics approaches to address the problems that high-throughput experimental techniques pose in genomic research. Specifically, the work followed two main approaches, the development of comprehensive web-based analysis systems and the application of next-generation sequencing (NGS) pipelines, which were further divided into four topics of investigation. In the first topic, miRSystem was developed to overcome two issues: inconsistent results across multiple target-prediction algorithms and the frequently changing nomenclature of microRNAs (miRNAs). The web-based miRSystem supports both miRNA target prediction and the analysis of biological functions and pathways. Seven prediction algorithms and two experimentally validated data sources were integrated into the program to identify genes consistently targeted by miRNAs. The downstream biological functions and pathways of these target genes were then investigated with two enrichment algorithms. In addition, miRConverter, an accessory tool of the system, was developed to resolve potential discrepancies in miRNA nomenclature and to search for miRNAs similar to a given sequence. In the second topic, to survey the broad applications of NGS, the author discussed available NGS technologies and their implementations, presented guidelines for data-processing pipelines, and made suggestions for selecting suitable tools in genomics, transcriptomics, and small RNA research. In the third topic, the study focused on endogenous gene expression in cell lines and clinical samples.
Biological laboratories still face two major challenges: selecting an appropriate cell line as an experimental model, and performing comprehensive analysis and efficient visualization across several large datasets at once. To address these issues, the online system CellExpress was developed with four functions: gene expression search, similarity assessment, gene signature explorer, and user data analysis. These functions, respectively, let users query normalized gene expression values, compare differences or similarities, and identify significantly changed genes across specified cell lines and clinical samples, and compare the expression profiles of user-uploaded data with the datasets already in the system. In the fourth topic, the study established an NGS analysis pipeline for de novo genome assembly and completed the first whole-genome sequencing of the Mikado pheasant. The draft genome of the Mikado pheasant consists of 1.04 Gb of sequence and 15,972 annotated protein-coding genes. It shows expansion and positive selection of genes related to traits that contributed to the bird's adaptive evolution, such as energy metabolism, oxygen transport, hemoglobin binding, radiation response, immune response, and DNA repair. Furthermore, a contiguous 227-kb region of the major histocompatibility complex (MHC) containing 39 putative genes was annotated and manually curated. Compared with the same loci in the chicken, the pheasant MHC loci showed a high level of synteny but also two inversions involving the TAPBP and TAP1-TAP2 genes. The complete mitochondrial genome was also sequenced, assembled, and compared with those of four other long-tailed pheasants. Molecular clock analysis suggested that the ancestors of the Mikado pheasant migrated from the north to Taiwan about 3.47 million years ago.
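The divergence-time estimate for the Mikado pheasant rests on a molecular clock. The sketch below shows the simplest version: a strict clock applied to a Jukes-Cantor corrected distance, T = d / (2r). The factor 2 reflects substitutions accumulating independently on both lineages after the split. The substitution model, the `rate` argument, and the toy sequences are assumptions for illustration. The abstract does not state the model or calibration used in the thesis, and this sketch is not the analysis that produced the 3.47-million-year estimate.

```python
from math import log


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance in substitutions per site (assumed model)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def divergence_time(seq1: str, seq2: str, rate: float) -> float:
    """Strict-clock divergence time T = d / (2 * rate).

    `rate` is an assumed per-lineage substitution rate; T comes back in the
    time unit of `rate` (e.g. million years if rate is per site per Myr).
    """
    d = jukes_cantor(p_distance(seq1, seq2))
    # Factor 2: both lineages accumulate substitutions after the split.
    return d / (2 * rate)


# Hypothetical aligned fragments differing at 1 of 10 sites.
a = "ACGTACGTAC"
b = "ACGTACGTAA"
print(p_distance(a, b))  # 0.1
print(round(jukes_cantor(p_distance(a, b)), 3))  # 0.107
```

The Jukes-Cantor correction raises the raw distance slightly, from 0.1 to about 0.107, to account for repeated substitutions at the same site. A real analysis would typically use a richer substitution model and calibrated rates.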
In conclusion, this dissertation presented a comprehensive approach to genomic research and its applications. The results demonstrated that the proposed tools effectively addressed the genomic problems they targeted. They also showed that the proposed pipeline revealed insights into high-altitude adaptation from the Mikado pheasant genome.
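CellExpress's similarity assessment and user data analysis functions both compare expression profiles across samples. One common way to do this, shown here as a minimal sketch, is to rank reference samples by their Pearson correlation with a query profile over the genes they share. The correlation measure, the dictionary layout, and the sample and gene names are all assumptions. The abstract does not specify how CellExpress measures similarity.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def rank_by_similarity(
    query: dict[str, float], reference: dict[str, dict[str, float]]
) -> list[tuple[str, float]]:
    """Rank reference samples by Pearson correlation with `query`.

    `query` maps gene -> normalized expression; `reference` maps a sample
    name -> (gene -> normalized expression). Only shared genes are used,
    and the most similar sample comes first.
    """
    scores = []
    for sample, profile in reference.items():
        genes = sorted(query.keys() & profile.keys())
        try:
            r = pearson([query[g] for g in genes], [profile[g] for g in genes])
        except ValueError:
            continue  # too few shared genes or a flat profile: skip the sample
        scores.append((sample, r))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical normalized profiles.
query = {"G1": 1.0, "G2": 2.0, "G3": 3.0}
reference = {
    "lineA": {"G1": 2.0, "G2": 4.0, "G3": 6.0},
    "lineB": {"G1": 3.0, "G2": 2.0, "G3": 1.0},
    "lineC": {"G1": 5.0, "G9": 1.0},
}
for sample, r in rank_by_similarity(query, reference):
    print(sample, round(r, 3))
```

In this toy case, lineA rises in proportion to the query and is ranked first, and lineB falls and is ranked last. lineC shares only one gene with the query, so it is skipped. Pearson correlation is unaffected by a positive rescaling of either profile, which can help when profiles sit on different scales. Rank-based measures such as Spearman correlation are a common alternative when outliers are a concern.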

