
A Hybrid Weight-Based Method for Genetic Association Studies with Rare Variants

Advisor: 蕭朱杏

Abstract


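Genome-wide association studies (GWAS) mainly examine single nucleotide polymorphisms (SNPs) associated with complex diseases, and most are limited to variants with a minor allele frequency (MAF) above 5%. Even so, these SNPs are still insufficient to explain what actually causes disease. With advances in biotechnology, in particular next-generation sequencing (NGS), researchers have begun to examine the role of rare variants in complex genetic diseases, hoping to use them to find the genes that cause these diseases.

The statistical methods developed so far for rare variants mainly pool all rare variants within a genetic region into a single unit (as in the collapsing method) and then test the association between that unit and the complex disease. This strategy has two advantages: it reduces the dimensionality of the data, and it avoids the sparsity problem. In addition, collapsing-based tests have greater power than single marker analysis, and many statistical methods have been built on the collapsing approach. Pooling the rare variants in a region into one unit raises the question of how to weight each marker. Some authors give every marker the same weight; others assign different weights based on the standard deviation of the rare variant allele frequency in controls, or on the strength of the association between the rare variant and the disease. Although these methods allow different variants to affect disease differently, none of them accounts for the phenotypic heterogeneity component of genetic heterogeneity.

In this thesis, we propose a hybrid weight method that accounts for both the differing effect of each rare variant on disease and the phenotypic differences between individuals. The former is weighted by the association between a single rare variant and a binary or continuous disease phenotype; the latter is weighted by the similarity between individuals. We use the Hamming distance to measure the dissimilarity between individuals: an individual whose phenotypic dissimilarity from the others is higher receives a lower weight, and an individual whose dissimilarity is lower receives a higher weight.

To assess the proposed method, we compare its type I error and power with those of other methods in simulation studies. We also analyze SNP data collected in the coronary artery disease (CAD) study of the UK Wellcome Trust Case Control Consortium (WTCCC), attempting to identify markers associated with the development of coronary artery disease.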
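The individual-level part of the weight described above can be illustrated with a minimal sketch. The abstract does not say which features the Hamming distance is computed over or how dissimilarity is mapped to a weight, so everything here is an assumption: each individual is represented by a hypothetical binary profile, the weight is one minus the mean normalized Hamming distance to all other individuals, and the weights are rescaled to sum to one. The function names `hamming` and `individual_weights` are illustrative only and are not taken from the thesis.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def individual_weights(profiles: List[Sequence[int]]) -> List[float]:
    """Assumed weighting: individuals that are more dissimilar from the
    rest of the sample (larger mean Hamming distance) get smaller weights."""
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two individuals")
    length = len(profiles[0])
    if length == 0:
        raise ValueError("profiles must not be empty")

    raw = []
    for i in range(n):
        # Mean Hamming distance to every other individual, scaled to [0, 1].
        total = sum(hamming(profiles[i], profiles[j]) for j in range(n) if j != i)
        mean_dissimilarity = total / ((n - 1) * length)
        raw.append(1.0 - mean_dissimilarity)

    norm = sum(raw)
    if norm == 0:
        # Every individual is maximally dissimilar: fall back to equal weights.
        return [1.0 / n] * n
    return [w / norm for w in raw]


if __name__ == "__main__":
    # Three hypothetical individuals; the third differs most from the others.
    sample = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
    print(individual_weights(sample))
```

In this toy sample the third individual is the most dissimilar from the others and therefore receives the smallest weight, which matches the direction of the weighting described in the abstract.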

Parallel Abstract


ABSTRACT Most genome-wide association studies (GWAS), which focus on the effects of common variants, have failed to identify the susceptibility genes associated with the common diseases of interest. With recent advances in next-generation sequencing (NGS) technologies, scientists have begun to investigate rare variants, which may have larger effect sizes than common variants and may contribute to the fraction of heritability that remains unexplained. Current methods use a pooling strategy to test the joint effect of multiple rare variants. This pooling approach has the advantage of low dimensionality and avoids the sparsity problem. In addition, it has been shown to have greater power than single-marker testing procedures. Several pooling methods assign unequal weights that depend on marker allele frequencies or single-marker risks. These weights allow different variants to contribute differently to the risk of disease, but they cannot account for genetic heterogeneity, including phenotypic heterogeneity. In this study, we propose a hybrid weight that combines the single-variant effect and individual heterogeneity. The proposed weight is composed of two parts. One represents the association between the disease status, binary or quantitative, and a single rare variant. The other represents the similarity between individuals. Here we adopt the Hamming distance to measure the dissimilarity between any pair of individuals; higher similarity leads to a larger weight on the individual considered. The performance of this test is demonstrated in simulation studies, and it is compared with other methods in terms of type I error and power.
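The pooling strategy with per-variant weights can be sketched as follows. This is a generic weighted collapsing score followed by a label-permutation test, not the test statistic of the thesis, which the abstract does not specify. The names `burden_scores`, `mean_difference`, and `permutation_p_value`, the choice of a difference in mean scores, and the example weights are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def burden_scores(genotypes: List[Sequence[int]],
                  variant_weights: Sequence[float]) -> List[float]:
    """Collapse each individual's rare-variant genotypes (minor allele
    counts 0/1/2) into a single weighted score."""
    return [sum(w * g for w, g in zip(variant_weights, row)) for row in genotypes]


def mean_difference(scores: Sequence[float], is_case: Sequence[bool]) -> float:
    """Mean score in cases minus mean score in controls."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(scores: Sequence[float], is_case: Sequence[bool],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(mean_difference(scores, labels)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: four individuals, three rare variants.
    genotypes = [[1, 0, 2], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
    weights = [1.0, 2.0, 0.5]          # assumed per-variant weights
    status = [True, False, True, False]
    scores = burden_scores(genotypes, weights)
    print(scores, permutation_p_value(scores, status, n_perm=200))
```

Equal weights recover an unweighted collapsing score; frequency-based or association-based weights of the kind mentioned in the abstract would be supplied through `variant_weights`.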

