  • Thesis

微陣列實驗中檢測不同表現基因之統計方法評估

Evaluation of Statistical Methods for Identification of Differentially Expressed Genes in Microarray Experiments

Advisor: 劉仁沛

Abstract


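Most current methods for determining whether a gene is significantly expressed rely on traditional hypothesis testing, which tests whether the difference between two groups of samples equals zero. However, traditional hypothesis testing does not take biologically meaningful fold changes into account; in biology, a gene is regarded as expressed once the fold change in its expression exceeds a certain fixed value. In microarray experiments the number of genes is usually large and the number of replicates is usually small, so the overall type I error becomes very large when testing whether genes are significantly expressed. Various corrections must then be applied to control the overall type I error, for example Bonferroni's method, the false discovery rate, or an arbitrary threshold. These methods still do not take biological criteria into account. We therefore propose an interval hypothesis test that considers biological significance, together with a statistical procedure and a method for sample size determination, and we examine some statistical properties of this method. Five methods, including the traditional hypothesis test and the interval hypothesis test, are compared by simulation to obtain the empirical overall type I error, average type I error, and power. The simulation results show that the interval hypothesis effectively controls the average type I error below the nominal level, yields a lower overall type I error, and has better power than the Bonferroni correction.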

Parallel Abstract


Current statistical approaches to identifying differentially expressed genes are based on traditional hypotheses of equality. However, traditional hypotheses of equality fail to take into consideration the magnitudes of the biologically meaningful fold changes that truly differentiate the expression levels of genes between groups. Because of the large number of genes tested and the small number of specimens available in microarray experiments, the false positive rate for differentially expressed genes is extremely high and requires adjustments such as Bonferroni's method, the false discovery rate, or an arbitrary cutoff for the p-values. None of these adjustments has any biological justification. Hence, we propose interval hypotheses that take into account the minimal biologically meaningful expression levels for identifying differentially expressed genes. Based on the interval hypothesis, statistical procedures are proposed and methods for sample size determination are given. Statistical properties of the proposed procedures are investigated. A large simulation study was conducted to compare empirically the overall type I error, average type I error, and power of five methods under various combinations of fold changes, variability, and sample sizes: (1) the traditional hypothesis using the unpaired two-sample t-test, (2) the traditional hypothesis using the unpaired two-sample t-test with Bonferroni adjustment, (3) the fixed fold-change rule, (4) the combination of the unpaired two-sample t-test and the fixed fold-change rule, and (5) the proposed interval hypothesis. Simulation results show that the proposed procedures based on the interval hypothesis not only control the average type I error rate at the nominal level but also provide sufficient power to detect differentially expressed genes. Numerical data from the public domain illustrate the proposed methods.
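The abstract's point that the false positive rate grows with the number of genes tested can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the thesis: it assumes the gene-level tests are independent (an assumption that real microarray data generally violate), and the function names and gene counts are hypothetical. It computes the probability of at least one false positive among m tests, and the per-gene level that the Bonferroni adjustment would use.

```python
# Illustrative sketch only: assumes m INDEPENDENT tests, which is an
# assumption made here for simplicity, not a claim from the thesis.

def familywise_error(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests, each carried out at level alpha, when every null is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test significance level under the Bonferroni adjustment."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    # Hypothetical gene counts chosen only to show the trend.
    for m in (1, 10, 100, 10_000):
        unadjusted = familywise_error(alpha, m)
        per_gene = bonferroni_level(alpha, m)
        adjusted = familywise_error(per_gene, m)
        print(f"m={m:>6}  unadjusted FWER={unadjusted:.4f}  "
              f"Bonferroni per-gene level={per_gene:.2e}  "
              f"adjusted FWER={adjusted:.4f}")
```

With the per-gene level shrunk to alpha/m, the overall error stays at or below alpha, but each individual test becomes very strict, which is consistent with the loss of power under the Bonferroni adjustment that the abstract reports.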

Keywords

Interval hypothesis; Type I error; Power; Fold change

