
利用兩階段變異數分析模型分析定序資料之拷貝數變異

Two-step ANOVA model for copy number variation detection on targeted sequencing

Advisor: 謝文萍

Abstract


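Copy number variation is an abnormal alteration of long segments of the DNA sequence. It is thought to contribute to many human diseases and plays an important role in research on both genetic disorders and cancer. With recent advances in sequencing technology, the barrier to analyzing copy number variation has fallen, and a variety of analysis tools have been proposed and applied. However, these tools typically work with low read depth, long sequenced fragments, and paired normal and disease samples, and they rarely address data with high read depth and short sequenced fragments. This study proposes a two-step ANOVA model as a tool for detecting copy number variation. The model is applicable to different types of data. Its idea is to use two different ANOVA models: the first estimates the base effects in normal samples, and the second applies the corresponding adjustment to disease samples so that copy number variation can be detected. We demonstrate the effectiveness of the model in three parts. In the simulation study, we set different proportions of copy number variation, observe the power of the model to detect it and how that power changes, and compare the results with another analysis tool, ExomeCNV; for data with higher read depth, our method shows that a conventional ANOVA can already distinguish the differences clearly. In the real data analysis, we demonstrate the model on data from oral cancer patients. In the association analysis, we use diseases related to oral cancer as a worked example. Taken together, the proposed model has a lower false positive rate (FPR) and is therefore a relatively conservative tool for detecting copy number variation, and the association results show that the model can find copy number variations associated with oral cancer.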

Parallel Abstract


Copy number variation (CNV) is a form of structural variation in which the copy number of segments of the genome is abnormally altered. It is considered to be related to diverse diseases, so CNV plays an important role in the study of genetic diseases and cancer. With advances in next-generation sequencing (NGS) technology, analyzing CNV has become less difficult, and a number of analysis tools have been developed to detect CNVs. However, most current studies focus on data with low coverage, long regions of interest, and paired case-control samples. There is little discussion of data with high coverage, short targeted regions, and unpaired case-control samples. In this study, we therefore propose a two-step ANOVA model to detect CNVs. This model can be applied to different types of data. Its main idea is to apply two different ANOVA models: we first estimate the base effects from the control samples, and then adjust the case samples by these base effects in order to detect CNVs. The results are summarized in three parts. In the simulation study, different CNV incidence rates are designed to observe how the results of the proposed model change, and the model is compared with another tool, ExomeCNV. In the real data analysis, we show what our model found in oral cancer data. In the association study, we present an analysis process for some clinical traits of oral cancer. In conclusion, the two-step ANOVA model has a lower false positive rate. Our results also indicate that a conventional ANOVA model performs well on high-coverage data compared with more sophisticated schemes. The association study detected several CNVs that are very likely to play an important role in the etiology of oral cancer.
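The two-step idea described above (estimate base effects from control samples, then adjust case samples by them) can be illustrated with a minimal sketch. This is not the thesis's actual model, which the abstract does not specify. It assumes log read depth per targeted region as the response, an additive sample-plus-region fit on the controls as the first step, and, in place of a second ANOVA test, a simple z-score threshold on the adjusted case sample. The function names, the threshold `z_cut`, and all simulated values are hypothetical.

```python
import numpy as np

def fit_base_effects(control_log_depth):
    """Step 1 (assumed form): additive sample + region fit on control samples.

    control_log_depth has shape (n_samples, n_regions).
    Returns the grand mean, per-region base effects, and the residual SD.
    """
    grand = control_log_depth.mean()
    sample_eff = control_log_depth.mean(axis=1) - grand
    region_eff = control_log_depth.mean(axis=0) - grand
    fitted = grand + sample_eff[:, None] + region_eff[None, :]
    resid = control_log_depth - fitted
    n, m = control_log_depth.shape
    dof = (n - 1) * (m - 1)  # residual degrees of freedom of the additive model
    sigma = np.sqrt((resid ** 2).sum() / dof)
    return grand, region_eff, sigma

def call_cnv(case_log_depth, grand, region_eff, sigma, z_cut=4.0):
    """Step 2 (assumed form): remove base effects from one case sample,
    then flag regions whose adjusted depth is far from the sample's own level.
    Returns +1 (gain), -1 (loss), or 0 (no call) per region.
    """
    adjusted = case_log_depth - grand - region_eff
    # Median, so that a minority of true CNV regions does not shift the level.
    sample_level = np.median(adjusted)
    z = (adjusted - sample_level) / sigma
    return np.where(z > z_cut, 1, np.where(z < -z_cut, -1, 0))

# Simulated data (all values hypothetical): 20 controls, 200 targeted regions.
rng = np.random.default_rng(0)
n_controls, n_regions, noise_sd = 20, 200, 0.04
true_region = rng.normal(0.0, 0.5, n_regions)       # region-specific capture bias
control_sample = rng.normal(0.0, 0.2, n_controls)   # per-sample depth offsets
controls = (5.0 + control_sample[:, None] + true_region[None, :]
            + rng.normal(0.0, noise_sd, (n_controls, n_regions)))

case = 5.0 + 0.1 + true_region + rng.normal(0.0, noise_sd, n_regions)
case[10:20] += np.log(1.5)   # simulated single-copy gain (3 copies vs 2)
case[50:55] += np.log(0.5)   # simulated single-copy loss (1 copy vs 2)

grand, region_eff, sigma = fit_base_effects(controls)
calls = call_cnv(case, grand, region_eff, sigma)
print("gain regions:", np.flatnonzero(calls == 1))
print("loss regions:", np.flatnonzero(calls == -1))
```

A strict threshold such as the one used here trades sensitivity for a low false positive rate, which is in the spirit of the conservative behavior the abstract reports, although the thesis's own test statistic may differ.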

