
Using Disease Risk Scores to Account for Population Stratification in Rare Variant Association Studies

Advisor: Wan-Yu Lin (林菀俞)

Abstract

Background: Genetic association studies often collect unrelated cases and controls and compare allele frequencies between the two groups. However, cases and controls may differ in ancestral composition, in which case the allele frequencies of the two groups are not directly comparable. Researchers usually construct principal components from markers unlinked to the gene of interest and include tens of the top principal components as covariates in a logistic regression to adjust for the ancestral difference between cases and controls.

Method: Next-generation sequencing is still costly. Many studies cannot afford whole-genome sequencing and can sequence only a chromosomal region of interest. This study considers the situation in which only a 500 kb (kilobase pair) region is sequenced, and uses disease risk scores in the sequence kernel association test to account for population stratification.

Result: In Monte Carlo simulations, adjusting for disease risk scores in the sequence kernel association test corrected the bias due to population stratification better than the conventional approach of adjusting for principal component scores.

Suggestion: For researchers with sequence data on a chromosomal region of 500 kb or longer, we suggest constructing disease risk scores from common single-nucleotide polymorphisms (minor allele frequency > 5%) located relatively far from the gene of interest, and then including these scores in the sequence kernel association test to adjust for the ancestral difference between cases and controls.
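
To make the idea in the abstract concrete, the following is a minimal sketch under stated assumptions, not the procedure used in the thesis. It simulates two hidden ancestral groups with deliberately exaggerated differences in disease risk and allele frequencies, builds a disease risk score as the fitted probability of disease given common SNPs outside the gene, and compares an unadjusted test of the rare-variant burden with one stratified on deciles of that score. Two substitutions are made for brevity: a simple burden score test stands in for the sequence kernel association test, and stratification on the score stands in for covariate adjustment. All function names, sample sizes, and frequencies are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def genotype(rng, freq):
    """Number of minor alleles (0, 1 or 2) at one site."""
    return (rng.random() < freq) + (rng.random() < freq)


def simulate(n, n_common=10, n_rare=10, seed=1):
    """Two hidden ancestral groups that differ in disease risk and allele
    frequencies. The rare variants have NO causal effect (null hypothesis).
    All numeric settings are illustrative assumptions."""
    rng = random.Random(seed)
    common, burden, status = [], [], []
    for _ in range(n):
        if rng.random() < 0.5:
            f_common, f_rare, risk = 0.6, 0.02, 0.6
        else:
            f_common, f_rare, risk = 0.2, 0.005, 0.2
        common.append([genotype(rng, f_common) for _ in range(n_common)])
        # Burden = total rare-allele count across the gene of interest.
        burden.append(sum(genotype(rng, f_rare) for _ in range(n_rare)))
        status.append(1 if rng.random() < risk else 0)
    return common, burden, status


def disease_risk_scores(common, status, n_iter=50, lr=1.0):
    """Fitted P(disease | common SNPs) from a logistic regression trained
    by plain gradient ascent on centered genotypes."""
    n, p = len(common), len(common[0])
    means = [sum(row[j] for row in common) / n for j in range(p)]
    X = [[1.0] + [row[j] - means[j] for j in range(p)] for row in common]
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, y in zip(X, status):
            resid = y - sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j, v in enumerate(x):
                grad[j] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in X]


def score_parts(burden, status):
    """Score U and its variance V for burden vs. status under an
    intercept-only null model."""
    n = len(status)
    y_bar = sum(status) / n
    b_bar = sum(burden) / n
    u = sum((y - y_bar) * b for y, b in zip(status, burden))
    v = y_bar * (1.0 - y_bar) * sum((b - b_bar) ** 2 for b in burden)
    return u, v


def stratified_z(burden, status, drs, n_strata=10):
    """Sum the score and variance within strata of the disease risk score,
    then form a single z statistic."""
    order = sorted(range(len(drs)), key=lambda i: drs[i])
    size = math.ceil(len(order) / n_strata)
    u_tot = v_tot = 0.0
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        u, v = score_parts([burden[i] for i in idx], [status[i] for i in idx])
        u_tot += u
        v_tot += v
    return u_tot / math.sqrt(v_tot)


def compare(n=4000, seed=1):
    """Return (unadjusted z, risk-score-stratified z) for one simulated data set."""
    common, burden, status = simulate(n, seed=seed)
    u, v = score_parts(burden, status)
    drs = disease_risk_scores(common, status)
    return u / math.sqrt(v), stratified_z(burden, status, drs)
```

Calling `compare()` returns the two z statistics. Because the simulated rare variants are not causal, a large unadjusted value reflects confounding by ancestry alone, and the stratified value is expected to be much closer to zero. Some residual confounding can remain in strata that mix the two groups, which is one reason covariate adjustment within the test itself may be preferable in practice.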

