利用貝氏變數選取進行全基因組與存活時間之關聯研究

A Bayesian Variable Selection Approach to Genome-Wide Association Studies with Survival Outcome

Advisors: 張憶壽, 熊昭

Abstract


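Genome-wide association studies (GWAS) that take survival time as the phenotype are an important problem in fields such as epidemiology and pharmacogenomics. For example, genes associated with time to disease recurrence under a specific drug or treatment can help identify the populations best suited to that drug or treatment. Most GWAS analyze single nucleotide polymorphisms (SNPs) one at a time, but this approach does not use the information in the data in a statistically efficient way. It also tends to suffer from the problem of missing heritability. This dissertation proposes a statistical method based on a Bayesian variable selection model that performs a multivariate analysis of the association between the whole genome and survival time, and it defines a quantity that represents heritability, which partly addresses the missing heritability seen in single-SNP analysis. We extend the Bayesian variable selection regression model of Guan and Stephens (2011) to cover GWAS with survival time as the phenotype. Using the Weibull distribution together with the proportional hazards regression model, we construct an estimate of the proportion of variance explained (PVE). We adopt Bayesian inference to estimate the posterior distributions of PVE and of the probability that each SNP is associated with survival time. The results on PVE can help in designing and planning subsequent genetic studies, and the posterior probability that a SNP is associated with survival time is a statistic that expresses the association better than a p-value. Methodologically, we design an MCMC algorithm for this class of problems, and we use simulated data to show how well the model estimates these quantities and to demonstrate its higher statistical power relative to single-SNP analysis. We also apply the method to a GWAS of age at menarche in women; in particular, we obtain a credible interval for PVE of (0.339, 0.401), somewhat lower than the heritability of 0.50 reported in the literature.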

Parallel Abstract


Genome-wide association studies (GWAS) that use survival time as the phenotype deserve attention. Important examples include time to progression or recurrence-free survival of a cancer patient who underwent a specific treatment, and the onset time of a certain disease or biological event. Most existing GWAS rely on single-SNP analysis, which does not model the problem properly and hence is not statistically efficient. Moreover, while GWAS results are often reproducible, the discoveries explain only a small amount of heritability. This dissertation proposes a Bayesian variable selection approach to GWAS with survival outcome based on the Weibull regression model, in which the parameter describing the proportion of the variance of the survival phenotype explained by the covariates (PVE) admits an analytic form. Treating GWAS as a Bayesian variable selection problem, we extend the Bayesian variable selection regression (BVSR) for GWAS based on multiple linear regression [1]. In particular, we compute the posterior distribution of PVE and the posterior inclusion probability (PIP) of each SNP for inference. The former is useful in planning future genetic studies; the latter describes the confidence in the association results. A carefully designed MCMC algorithm is used to sample from the posterior distribution. Simulation studies show that both PVE and PIP can be estimated successfully and that the method outperforms single-SNP analysis in terms of the plot of the number of true positive findings versus the number of false positive findings. We illustrate the method by studying the association between SNPs and age at menarche in healthy females in Taiwan. In particular, we obtain a 90% credible interval for PVE of (0.339, 0.401), smaller than the reported heritability of 0.50.
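To make the quantities in the abstract concrete, the following is a minimal sketch rather than the dissertation's implementation. It assumes the standard log-linear form of Weibull regression, log T = mu + x'beta + sigma * W, where W is a standard minimum extreme-value error with variance pi^2/6, and it assumes PVE is measured on the log-time scale; the abstract does not state the exact definition used. All function names are hypothetical. The PIP helper simply averages 0/1 inclusion indicators over MCMC draws.

```python
import numpy as np

# Variance of a standard minimum extreme-value (Gumbel) error.
GUMBEL_VAR = np.pi ** 2 / 6


def simulate_weibull_times(X, beta, mu=0.0, sigma=1.0, rng=None):
    """Draw survival times from log T = mu + X @ beta + sigma * W (assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    # The log of a unit exponential variable is a standard minimum extreme-value
    # variable, so T is Weibull with shape 1 / sigma.
    w = np.log(rng.exponential(size=X.shape[0]))
    return np.exp(mu + X @ beta + sigma * w)


def pve_log_scale(X, beta, sigma):
    """Share of Var(log T) explained by the covariates under the assumed model."""
    genetic_var = np.var(X @ beta)
    return genetic_var / (genetic_var + sigma ** 2 * GUMBEL_VAR)


def posterior_inclusion_probability(gamma_samples):
    """Average 0/1 inclusion indicators over MCMC draws (rows = draws, cols = SNPs)."""
    return np.asarray(gamma_samples, dtype=float).mean(axis=0)
```

Given posterior draws of `beta` and `sigma`, evaluating `pve_log_scale` on each draw would give a posterior sample of PVE from which an interval can be read off; this usage is likewise an assumption about how such a summary could be formed.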

References


1. Guan YT, Stephens M (2011) Bayesian Variable Selection Regression for Genome-Wide Association Studies and Other Large-Scale Problems. Annals of Applied Statistics 5: 1780-1815.
2. Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ (2008) Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet 4: e1000130.
3. Wu TT, Chen YF, Hastie T, Sobel E, Lange K (2009) Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25: 714-721.
4. He Q, Lin DY (2011) A variable selection method for genome-wide association studies. Bioinformatics 27: 1-8.
5. Zuber V, Duarte Silva AP, Strimmer K (2012) A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies. BMC Bioinformatics 13: 284.
