Improving De Novo Assembly by Pre-Processing the Next-Generation Sequencing Data

Improving De Novo Assembly of Next-Generation Sequencing Data through Data Pre-Processing

Abstract


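To improve the quality of de novo assembly of next-generation sequencing data, we investigate three data pre-processing treatments: quality trimming, error correction, and random shuffling. The processed data are assembled with three tools: Velvet, SOAPdenovo, and ABySS. This study applies a fractional factorial design to find suitable parameters for the assembly tools and to reduce computation time. PhiX174 serves as the validation genome. Our study verifies that appropriate quality trimming and error correction effectively improve assembly quality. Another useful observation is that Velvet assemblies are affected by random shuffling of the data, whereas SOAPdenovo and ABySS assemblies are not.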

Parallel Abstract


To improve the quality of de novo assembly of next-generation sequencing data, we apply three treatments at the data pre-processing stage: quality trimming, error correction, and random shuffling. All treated data are assembled with three tools: Velvet, SOAPdenovo, and ABySS. A fractional factorial design is used to reduce the number of runs needed to find suitable parameters for the tools. We validate the treatment effects using the alignment results for bacteriophage PhiX174, whose genome is well studied. Our results confirm that quality trimming and error correction provide essential improvements to de novo assembly. After quality trimming, randomly shuffling the reads may not lead to any improvement with SOAPdenovo or ABySS. However, random shuffling did improve the results with Velvet alone.
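
As a rough illustration of two of the pre-processing treatments named above, the sketch below trims each read at its first low-quality base and then shuffles the read order. It is a simplified stand-in, not the procedure used in the study: the function names, the Phred threshold of 20, the Sanger quality offset of 33, and the toy reads are all assumptions made for the example.

```python
import random

def quality_trim(seq, qual, threshold=20, offset=33):
    """Cut a read at the first base whose Phred score is below threshold.

    Assumes Sanger-style quality encoding (ASCII offset 33). This is a
    simplified rule chosen for illustration only.
    """
    for i, ch in enumerate(qual):
        if ord(ch) - offset < threshold:
            return seq[:i], qual[:i]
    return seq, qual

def shuffle_reads(reads, seed=0):
    """Return the reads in a random order without modifying the input list."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return shuffled

# Toy reads as (name, sequence, quality string); hypothetical data.
reads = [
    ("read1", "ACGTACGT", "IIIIII##"),
    ("read2", "TTGCAAGC", "IIIIIIII"),
    ("read3", "GGCATCGA", "II#IIIII"),
]

trimmed = [(name, *quality_trim(seq, qual)) for name, seq, qual in reads]
shuffled = shuffle_reads(trimmed, seed=42)
```

Fixing the seed makes a shuffled input reproducible, which matters when comparing assemblers whose output may depend on read order.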

