  • Thesis

利用平滑化處理與參照控制HiC資料來優化找尋基因體拷貝數變異

Improve the identification of Copy Number Variation using Smoothing Strategy and Incorporating Control HiC Data

Advisor: 張家銘

Abstract


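Copy number variations (CNVs) are found mostly in abnormal cells, such as tumor cells. Detecting CNVs in such cells is very important for sequencing data, because removing these sequence-related biases makes downstream analysis more accurate. CNVs also show up in HiC data, so HiC data can be used to detect them, and HiNT is the state of the art among methods that identify CNVs from HiC data. However, the normalization step of HiNT exhibits a fluctuation phenomenon. We therefore add a smoothing step and reference HiC data from a control group to reduce the fluctuation and improve the accuracy of HiNT. As a result, we obtain a higher Spearman correlation coefficient (0.868 versus 0.837), successfully predict more CNVs, and achieve higher precision (0.800 versus 0.750) and recall (0.324 versus 0.243). In addition, if we use only intra-chromosomal HiC data, the running time is much shorter (6 minutes, compared with 1 hour) at the cost of a slight decrease in accuracy.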

Parallel Abstract


Copy number variation (CNV) often occurs in abnormal cells such as cancer cells. Detecting CNV in these cell lines is crucial for sequencing data, because removing the resulting sequencing bias makes downstream analysis more accurate. CNV also appears in HiC data, so HiC data can be used to identify CNV, and HiNT is the state-of-the-art method for doing so. However, the normalization step of HiNT exhibits a fluctuation phenomenon. In this work, we aim to eliminate this fluctuation and further improve the performance of HiNT by adding a smoothing procedure (a mean filter) and by using HiC data from a control cell line in the normalization step. As a result, we achieve a higher Spearman correlation coefficient (0.868 vs. 0.837), more consistent CNV segments, higher precision (0.800 vs. 0.750), and higher recall (0.324 vs. 0.243). In addition, using only intra-chromosomal information makes the method ten times faster without losing much performance.
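
The abstract describes the smoothing procedure only as a mean filter. As a rough illustration of that idea, the following minimal sketch applies a centered moving average to a one-dimensional per-bin signal. The function name `mean_filter`, the window size, the edge handling, and the toy values are all assumptions made for this example; they are not taken from the thesis or from HiNT.

```python
def mean_filter(signal, window=5):
    """Smooth a 1D signal with a centered moving average.

    Windows are truncated at both ends, so the output has the same
    length as the input. The window size is a hypothetical choice.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)                # left edge, clipped at 0
        hi = min(len(signal), i + half + 1)  # right edge, clipped at the end
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Toy per-bin profile (made-up values): an oscillating baseline, then a gain.
noisy = [1.0, 1.2, 0.8, 1.2, 0.8, 1.0, 2.0, 2.2, 1.8, 2.0]
smooth = mean_filter(noisy, window=3)
```

A wider window suppresses more of the oscillation but also blurs the boundary between segments with different copy numbers, so the window size is a trade-off in any filter of this kind.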

