
以基於生物分類之效能評估 16S 和 18S rRNA 基因分類方法

Evaluating 16S and 18S rRNA Gene Classification Methods Using Taxonomy Based Performance Metrics

Advisor: 曹承礎

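Abstract


Metagenomics experiments typically infer the composition of microbial communities by sequencing 16S and 18S rRNA. Taxonomic assignment is a fundamental step in these studies. The accuracy and other metrics that previous studies used to measure the performance of existing taxonomic classification methods have two main problems: they are based on sequence counts, and they measure error in a binary way. These problems make evaluation results misleading and incomplete. In this study, we investigate the adverse effects of both problems and then propose new performance metrics, Average Taxonomy Distance (ATD) and ATD_by_Taxa, together with the ATD plot, to address them. Comparing evaluation results under the old and new metrics, we find that the new metrics give more informative, comparable, and reliable results on three test data sets.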

Parallel Abstract


Metagenomics experiments often make inferences about microbial communities by sequencing 16S and 18S rRNA. Taxonomic assignment is a fundamental step in such studies. The accuracy and other metrics used by previous studies to measure the performance of existing taxonomic assignment methods have two major problems: they are based on sequence counts, and they measure error in a binary way. These problems make the evaluation results misleading and less informative. In this study, we investigate the adverse effects of these two problems and then propose new performance metrics, Average Taxonomy Distance (ATD) and ATD_by_Taxa, together with the ATD plot, to deal with them. By comparing the evaluation results under the old metrics and under our new metrics, we found that the new metrics give results that are more informative, comparable, and robust across three test data sets.
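The abstract does not state the formulas for ATD or ATD_by_Taxa, so the following is only an illustrative sketch of the two problems it names, under an assumed definition: the distance between a true and a predicted lineage is taken to be the number of ranks below their deepest shared rank. The function names, the six-rank lineage format, and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Lineages are tuples ordered from domain down to genus (assumed format).
ESCHERICHIA = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Enterobacterales", "Enterobacteriaceae", "Escherichia")
SALMONELLA = ESCHERICHIA[:5] + ("Salmonella",)
BACILLUS = ("Bacteria", "Firmicutes", "Bacilli",
            "Bacillales", "Bacillaceae", "Bacillus")


def taxonomy_distance(true_lineage, predicted_lineage):
    """Ranks below the deepest shared rank (assumed distance, not the thesis formula)."""
    shared = 0
    for true_name, predicted_name in zip(true_lineage, predicted_lineage):
        if true_name != predicted_name:
            break
        shared += 1
    return len(true_lineage) - shared


def binary_error_rate(pairs):
    """Fraction of sequences whose full lineage is not exactly right."""
    return sum(1 for true, pred in pairs if true != pred) / len(pairs)


def average_distance_per_sequence(pairs):
    """Graded error averaged over sequences; abundant taxa dominate the mean."""
    return sum(taxonomy_distance(t, p) for t, p in pairs) / len(pairs)


def average_distance_per_taxon(pairs):
    """Graded error averaged within each true taxon first, then across taxa."""
    by_taxon = defaultdict(list)
    for true, pred in pairs:
        by_taxon[true].append(taxonomy_distance(true, pred))
    taxon_means = [sum(d) / len(d) for d in by_taxon.values()]
    return sum(taxon_means) / len(taxon_means)


# Toy data: 9 Escherichia reads (8 correct, 1 called Salmonella)
# and 1 Bacillus read miscalled as Escherichia.
pairs = ([(ESCHERICHIA, ESCHERICHIA)] * 8
         + [(ESCHERICHIA, SALMONELLA), (BACILLUS, ESCHERICHIA)])

print(binary_error_rate(pairs))
print(average_distance_per_sequence(pairs))
print(average_distance_per_taxon(pairs))
```

In this toy set the binary error counts the near miss (wrong genus, right family) and the far miss (wrong phylum) as equally bad, while the graded distance separates them. Averaging per taxon instead of per sequence keeps the single, badly misclassified Bacillus read from being masked by the many Escherichia reads.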

