
Using mixture beta models to estimate the proportion of true null hypotheses in the multiple hypotheses testing

Advisor: 汪群超

Abstract


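Microarrays can measure the expression of a large number of genes simultaneously. Once these data are obtained, the researcher's next goal is to identify, for all genes at once, whether expression differs between groups, which gives rise to a multiple hypothesis testing problem. Controlling the overall Type I error rate (α) has long been an important issue in multiple hypothesis testing. Benjamini & Hochberg (1995) constructed a control criterion based on the false discovery rate (FDR) defined by Seeger (1968), and Benjamini & Hochberg (2000) modified this criterion so that the FDR is controlled at α as intended. The modification requires the proportion of true null hypotheses among the hypotheses tested, but the true proportion is unknown in practice, so estimating it becomes the key problem. Allison et al. (2002) proposed describing the p-values from the multiple tests with a mixture Beta model and thereby estimating the proportion of true null hypotheses. In theory, the p-values corresponding to true null hypotheses follow a Uniform distribution, while those corresponding to true alternative hypotheses can be described by several Beta distributions. The number of Beta distributions can be determined by a bootstrap-based test, but this is computationally time-consuming, so Allison et al. (2002) directly used a mixture of one Uniform distribution and one Beta distribution in their simulations. This thesis finds that as the correlation among genes increases, the p-values corresponding to true null hypotheses cluster together and gradually depart from the Uniform distribution. It therefore recommends dropping the Uniform distribution and estimating the proportion directly with a model of two unknown Beta distributions. Monte Carlo simulations show that this model gives more robust and accurate estimates. In addition to the mixture-model estimators, the slope method proposed by Benjamini & Hochberg (2000) is also included in the comparison.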
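The two mixture densities contrasted in the abstract can be written out as a rough sketch. The notation is assumed here for illustration and is not taken from the thesis: $\pi_0$ is the weight of the component treated as the null, $\lambda_j$ are the weights of the alternative components, and $\beta(p; a, b)$ is the Beta density with shape parameters $a$ and $b$.

```latex
% Uniform-plus-Beta mixture in the spirit of Allison et al. (2002):
% the null component is Uniform(0,1), whose density is 1 on [0,1].
f(p) = \pi_0 \cdot 1 + \sum_{j=1}^{V} \lambda_j \, \beta(p;\, r_j, s_j),
\qquad \pi_0 + \sum_{j=1}^{V} \lambda_j = 1 .

% Special case V = 1 used in their simulations:
f(p) = \pi_0 + (1 - \pi_0)\, \beta(p;\, r, s) .

% Two-Beta alternative suggested in this thesis: the Uniform component
% is replaced by a Beta component with free shape parameters.
f(p) = \pi_0 \, \beta(p;\, a_0, b_0) + (1 - \pi_0)\, \beta(p;\, a_1, b_1) .
```

Since $\beta(p; 1, 1)$ is the Uniform density on $[0,1]$, the two-Beta form contains the one-Uniform-plus-one-Beta model as the special case $a_0 = b_0 = 1$.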

Parallel Abstract


Microarrays can be used to detect the expression of thousands of genes. After gene expression is measured, investigators may try to identify those genes that are differentially expressed across groups. Controlling the overall Type I error (α) is an important issue in multiple hypothesis testing. The family-wise error rate (FWER) and the false discovery rate (FDR) are commonly used to define the overall Type I error. Benjamini & Hochberg (1995) developed an FDR-controlling procedure (BH procedure). However, as the number of true alternative hypotheses increases, the BH procedure becomes very conservative. Therefore, Benjamini & Hochberg (2000) proposed an adaptive FDR-controlling procedure (Adaptive BH procedure) that incorporates the proportion of true null hypotheses (π0) and is shown to control the error rate. Nevertheless, π0 is unknown, so how to estimate π0 is the pivotal issue. Allison et al. (2002) proposed using a mixture beta model for the p-values obtained from the multiple hypotheses. Under the null hypothesis, the distribution of p-values is uniform on the interval [0,1]. In contrast, under the alternative hypothesis, the distribution of p-values can be modeled as a mixture of V separate component beta distributions. Allison et al. (2002) modeled the p-values with a uniform distribution plus a single beta distribution in their simulations. However, when the correlation between gene expression levels increases, the p-values from the null hypotheses tend to cluster closer to one than to zero, which makes the uniform distribution inappropriate. This thesis suggests replacing the uniform distribution with a regular beta distribution in the mixture beta model. Monte Carlo simulations show that the model without the uniform distribution is more robust and accurate in highly correlated situations. The estimation method proposed by Benjamini & Hochberg (2000) is also compared.
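To illustrate why an estimate of π0 matters, the following is a minimal sketch of the BH step-up rule with an optional π0 adjustment. It assumes the common form in which the adaptive procedure compares the i-th smallest p-value with i·α/(m·π0). The function name `bh_reject` and the example p-values are hypothetical, and this is not the thesis's own code.

```python
import numpy as np

def bh_reject(pvalues, alpha=0.05, pi0=1.0):
    """Return a boolean mask of rejected hypotheses (hypothetical helper).

    pi0 = 1.0 gives the ordinary BH step-up rule; pi0 < 1 gives an
    adaptive variant that assumes the proportion of true nulls is known
    or has been estimated elsewhere.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)              # indices that sort the p-values
    sorted_p = p[order]
    # Step-up thresholds i * alpha / (m * pi0) for i = 1..m
    thresholds = np.arange(1, m + 1) * alpha / (m * pi0)
    below = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size > 0:
        k = below[-1]                  # largest rank meeting its threshold
        reject[order[:k + 1]] = True   # reject all hypotheses up to that rank
    return reject

# Illustrative p-values only, not data from the thesis.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
print(bh_reject(pvals, alpha=0.05).sum())           # ordinary BH
print(bh_reject(pvals, alpha=0.05, pi0=0.5).sum())  # adaptive, assuming pi0 = 0.5
```

With these illustrative values, supplying a smaller π0 loosens every threshold, so the adaptive rule rejects at least as many hypotheses as ordinary BH. This is the sense in which a good π0 estimate reduces conservativeness.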

References


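許乾祐 (2008). Estimating the number of true null hypotheses in multiple comparisons using mixture models. Master's thesis, Department of Statistics, National Taipei University.
黃義筌 (2009). Performance characteristics of FDR-controlling testing procedures found through simulation. Master's thesis, Department of Statistics, National Taipei University.
蔡明哲 (2008). A study of FDR-controlling multiple comparison procedures adjusted by estimators of the number of true null hypotheses. Master's thesis, Department of Statistics, National Taipei University.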
Allison, D. B., Gadbury, G. L., Heo, M., Fernandez, J. R., Lee, C.-K., Prolla, T. A., & Weindruch, R. (2002). A mixture model approach for the analysis of microarray gene expression data. Computational Statistics and Data Analysis, 39, 1-20.
Benjamini, Y. & Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics, 25, 60-83.
