
建立適用於生醫大數據分析及分類的生物資訊學工具及資料庫

Developing bioinformatics tools and databases for biomedical big data analysis and classification

Advisor: 莊曜宇

Abstract


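Over the past decade, rapid advances in next-generation sequencing (NGS) technologies have driven the fast growth of metagenomics research, and a large body of literature examines how to determine the bacterial composition of an environment by analyzing prokaryotic 16S ribosomal RNA (rRNA). Consequently, many software tools and databases have emerged for analyzing the bacterial 16S short-read data that NGS produces. However, most of these tools were developed for the Linux operating system and rely mainly on a command-line interface, which poses many difficulties for researchers without an informatics background. How to turn such command-line software into user-friendly online systems with graphical interfaces, and thereby reduce the complexity of the analysis, is therefore one of the topics of this study.

In recent years, a growing number of studies have used third-generation sequencing to study full-length bacterial 16S sequences, which allow the microbial composition to be resolved to the species level. Taxonomic classification is a critical step in the 16S rRNA analysis workflow, yet previous literature has mostly examined how classifiers perform on the short bacterial reads produced by NGS, and few studies have examined their performance on the full-length bacterial 16S sequences produced by third-generation sequencing. The classification performance of different taxonomic classifiers on full-length prokaryotic 16S sequences is therefore another topic of this study. In addition, this dissertation proposes a method that integrates the commonly used 16S databases on the basis of species names, with the aim of improving classification accuracy at the species level. The integrated database is designed primarily for taxonomic classification of full-length 16S sequences.

Full-length prokaryotic 16S sequences produced by third-generation sequencing carry more complete taxonomic information and can classify bacteria accurately to the species level. To date, however, a complete and convenient analysis package for full-length prokaryotic 16S sequences is still lacking, so software capable of analyzing them needs to be developed. This study also explores how to develop a friendly and convenient online analysis system for full-length prokaryotic 16S sequences to improve researchers' working efficiency.

Over the past decade, metagenomics researchers have not only identified many biomarkers but also published large amounts of microbial composition data, yet these data still lack concrete applications. Besides exploring how to develop application software for metagenomics, this dissertation closes by discussing possible directions for combining large-scale microbial composition data with artificial intelligence.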

Parallel Abstract


Advances in next-generation sequencing (NGS) technologies over the past decade have accelerated metagenomics research. A large body of literature has focused on how to elucidate the bacterial composition of an environment by analyzing prokaryotic 16S ribosomal RNA (rRNA) sequences, and many tools have been proposed for analyzing the prokaryotic 16S rRNA short reads that NGS produces. However, most of these tools run on Linux and provide only a command-line interface. Researchers without a programming background find the Linux command line difficult to learn, which impedes their analyses. Therefore, the first topic of this dissertation is to improve the command-line tools and develop a user-friendly online system with a graphical interface to enhance researchers' working efficiency.

In recent years, a growing number of studies have used third-generation sequencing (TGS) technologies to resolve bacterial composition to the species level by sequencing full-length prokaryotic 16S rRNA. Taxonomy assignment is a critical step in analyzing prokaryotic 16S rRNA, yet most previous studies focused only on classifying the short bacterial 16S reads generated by NGS. How classifiers perform on the full-length bacterial 16S sequences generated by TGS remains unclear. Therefore, the taxonomy assignment performance of 16S classifiers on full-length prokaryotic 16S sequences is the second topic of this dissertation. In addition, this dissertation proposes a taxonomy-based method that integrates the widely used 16S reference databases in order to classify full-length 16S sequences and improve assignment accuracy at the species level.

Although full-length prokaryotic 16S reads provide comprehensive taxonomic information, convenient tools for analyzing them are still lacking, so analytical tools that support a full-length 16S pipeline are in demand. The fourth topic of this dissertation is how to develop a user-friendly online analytical system for full-length prokaryotic 16S sequences to enhance researchers' working efficiency.

Lastly, over the past decade metagenomics researchers have discovered many biomarkers and released a large amount of microbial composition data, but practical applications of the published data are still lacking. The last chapter therefore discusses possible future opportunities to combine microbial big data with machine learning algorithms.

