
DNA Copy Number Data Analysis in Human Genomes (DNA 拷貝數在人類基因體中的分析)

Advisor: Kun-Mao Chao (趙坤茂)

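Abstract


Copy number variation (CNV) is a type of structural variation in the human genome and is known to be associated with many genetic diseases. Array comparative genomic hybridization (array CGH) gives biologists a high-resolution analysis of DNA copy number. As the resolution of array CGH keeps rising, the noise in its measurements has grown as well. To cope with this noise, array CGH data are analyzed further to identify the segments of the human genome that carry copy number variations. Lipson et al. proposed a statistical framework that locates the boundaries of copy number variations precisely and gives a significance score for each segment. In that framework the noise in array CGH data is assumed to be normally distributed, yet no study so far supports this assumption, and many other statistical methods rest on the same assumption. In this thesis we make no assumption about the noise distribution and propose an improved framework, together with a systematic method for choosing its parameters. Within our framework we use an algorithm proposed by Bernholt et al. to find the CNV segments in the human genome. That algorithm, however, cannot identify duplication events and deletion events of DNA segments separately, so we also propose a linear-time algorithm that does. We apply the improved framework to array CGH data from acute myeloid leukemia. Our method locates copy number variations more precisely and finds many DNA segments that contain genes related to acute myeloid leukemia.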

Parallel Abstract


Copy number variations (CNVs) are a kind of structural variation in the human genome and are associated with many genetic diseases. Array CGH approaches can provide biologists with high-resolution analysis of DNA copy number data. As the resolution of array CGH approaches increases, the data from recent array CGH approaches have also become noisier. To handle this noise, the experimental results are further analyzed to locate the copy number variations in the human genome. Lipson et al. [Journal of Computational Biology, 13(2):215-228, 2006] propose a statistical framework that finds the boundaries of copy number variations in the human genome accurately and provides a significance score for each aberration call. The framework assumes that the noise in array CGH data is normally distributed. However, there is no evidence supporting this assumption, and many other statistical approaches rely on it as well. In this thesis, we propose an improved framework that does not make this assumption. We also develop a systematic method for selecting the parameters of our framework. A linear-time algorithm proposed by Bernholt et al. [7th Latin American Symposium, pages 178-189, 2006] is used to find copy number variations under this framework. However, their algorithm cannot find duplication events and deletion events in the human genome separately. Thus, we propose a linear-time algorithm for this purpose. We demonstrate the power of our methods by applying them to an array CGH dataset from leukemia patients. Our methods locate the CNVs in the array CGH data more accurately and find regions that contain genes related to acute myeloid leukemia.
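To make the idea of calling duplications and deletions separately in linear time concrete, here is a minimal sketch. It is not the thesis's algorithm and not the algorithm of Bernholt et al.: it assumes, purely for illustration, that the input is a list of per-probe log-ratio values and that a segment is scored by the plain sum of its values. Under that assumed score, a single linear scan (Kadane's maximum-sum segment method) finds the strongest gain, and the same scan on the negated values finds the strongest loss. The function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, int, int]  # (score, start index, end index), end inclusive


def max_sum_segment(values: List[float]) -> Segment:
    """Return the non-empty contiguous segment with the largest sum (Kadane's scan)."""
    if not values:
        raise ValueError("values must be non-empty")
    best_sum = cur_sum = values[0]
    best_start = best_end = cur_start = 0
    for i in range(1, len(values)):
        if cur_sum <= 0:
            # A prefix with non-positive sum cannot help: restart at position i.
            cur_sum = values[i]
            cur_start = i
        else:
            cur_sum += values[i]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i
    return best_sum, best_start, best_end


def call_gain_and_loss(log_ratios: List[float]) -> Tuple[Segment, Segment]:
    """Find the strongest gain and the strongest loss separately, each in O(n).

    Assumption (illustrative only): a segment's score is the sum of its log-ratios.
    """
    gain = max_sum_segment(log_ratios)
    # A loss is a maximum-sum segment of the negated signal.
    neg_score, start, end = max_sum_segment([-x for x in log_ratios])
    loss = (-neg_score, start, end)
    return gain, loss
```

Running the two scans independently is what keeps a strong deletion from masking a weaker duplication, or the reverse; a single scan over absolute deviations would report only whichever event dominates.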

