Thesis

A Comparison of Statistical Analysis Methods for Affymetrix High-Density Oligonucleotide Array Experiments

A comparison of statistical methods for identifying differentially expressed genes using Affymetrix oligonucleotide arrays

Advisor: 廖振鐸

Abstract


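Biochips (microarrays) can measure the mRNA levels of thousands of genes in parallel, giving an indirect measure of gene expression. The Affymetrix high-density oligonucleotide array, a proprietary Affymetrix GeneChip™ product, is a DNA microarray with relatively high precision and reproducibility. Affymetrix Microarray Suite 5.0 (MAS 5.0) (Affymetrix, 2002), the Model Based Expression Index (MBEI) of Li and Wong (2001) and the Robust Multi-array Average (RMA) of Irizarry et al. (2003) are three expression transformation methods in common use. In our study, we applied Student's t-test, the penalized t-statistic of Efron et al. (2001), nonparametric tests (the Mann-Whitney test or the Wilcoxon signed-rank test) (Conover, 1999), and the selection probability function of Pepe et al. (2003) combined with either the Student t statistic or a nonparametric statistic, to compare the differentially expressed probe sets identified under the three expression transformation methods, and found that the three methods give different results. In addition, we modified the simulation method of Hess and Iyer (2004) to simulate experimental data in the R environment. Through statistical simulation, we indirectly demonstrated the reproducibility of Affymetrix high-density oligonucleotide arrays. Using the selection probability function combined with the Student t statistic or a nonparametric statistic, we compared the three expression transformation methods in terms of sensitivity, specificity and false discovery rate. Based on our study, we recommend the RMA method, with MBEI as a close competitor. For Rat 230A array experiments, we recommend between 3 and 7 replicates (sample size). Compared with the Mann-Whitney statistic, the Student t statistic is the more efficient and stable choice of statistic for the selection probability function. Finally, we consolidate the discussion of replicate numbers into a practical algorithm that researchers can use as a reference when deciding how many replicates to run, so as to balance experimental cost and efficiency.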
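To make the per-probe-set statistics compared above concrete, here is a minimal Python sketch (not the thesis's code) that computes a pooled-variance Student's t, a penalized t in the spirit of Efron et al. (the standard error plus a constant s0 in the denominator), and the Mann-Whitney U for two groups of log-scale expression values. The function names, the data layout (one list of replicate values per probe set) and the choice of s0 as the 90th percentile of the standard errors across probe sets are assumptions made for illustration.

```python
import math
import statistics


def diff_and_se(x, y):
    """Mean difference (y minus x) and its pooled-variance standard error."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    diff = statistics.mean(y) - statistics.mean(x)
    return diff, math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def t_statistics(group1, group2, s0_percentile=90):
    """Per-probe-set Student's t and penalized t (hypothetical helper).

    group1, group2: one entry per probe set, each entry the list of replicate
    log-expression values in that group. Assumes every probe set has a
    non-zero standard error. The penalized t divides by (se + s0), where s0
    is the chosen percentile of all standard errors (an assumed choice).
    """
    pairs = [diff_and_se(x, y) for x, y in zip(group1, group2)]
    ses = [se for _, se in pairs]
    # 99 cut points; index p - 1 is the p-th percentile.
    # "inclusive" keeps the percentile within the observed range.
    s0 = statistics.quantiles(ses, n=100, method="inclusive")[s0_percentile - 1]
    student = [d / se for d, se in pairs]
    penalized = [d / (se + s0) for d, se in pairs]
    return student, penalized


def mann_whitney_u(x, y):
    """Mann-Whitney U for y versus x: pairs with y > x count 1, ties count 1/2."""
    return sum(1.0 if b > a else 0.5 if b == a else 0.0 for a in x for b in y)
```

Because s0 is added to every denominator, a probe set with a very small standard error can no longer reach a large |t| on a tiny mean difference alone, which is the usual motivation for penalizing t.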

Parallel Abstract (English)


Microarray technology makes it possible to measure the abundance of mRNA transcripts for thousands of genes simultaneously. In particular, the Affymetrix high-density oligonucleotide array, a proprietary Affymetrix GeneChip™ product, is very popular in the scientific community because of its high specificity and reproducibility. In this study, we first review three statistical methods currently used for background correction, normalization and expression transformation: Affymetrix Microarray Suite 5.0 (MAS 5.0) (Affymetrix, 2002), the Model Based Expression Index (MBEI) (Li and Wong, 2001) and the Robust Multi-array Average (RMA) (Irizarry et al., 2003). We then evaluate their performance through significance tests on the fold-change estimates each method produces, using Student's t-test, the penalized t-statistic of Efron et al. (2002) and nonparametric tests (the Mann-Whitney test or the Wilcoxon signed-rank test) (Conover, 1999). We show that MAS 5.0, MBEI and RMA can lead to quite different conclusions about which probe sets are differentially expressed. We therefore develop a simulation mechanism that generates replicated experiments. It modifies the method recently proposed by Hess and Iyer (2004), mimics naturally occurring data, and is based on real “template” array data. For each simulated data set, we apply the selection probability function proposed by Pepe et al. (2003) with the Student t statistic to rank the probe sets. We compute the sensitivity and false discovery rate (FDR) of the three methods from 100 simulated data sets under various scenarios. We recommend RMA for routine applications because it appears to have higher sensitivity and lower FDR in all the scenarios studied, although MBEI is competitive with RMA in most scenarios.
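The selection probability function of Pepe et al. (2003) asks how often a probe set would rank within the top k if the experiment were repeated, and the bootstrap is one common way to estimate it. The Python sketch below is an illustration under assumptions, not the thesis's implementation: it resamples arrays with replacement within each group, recomputes a pooled-variance Student's t for every probe set, ranks probe sets by |t| and records how often each lands in the top k. The data layout (a list of arrays, each a list of values indexed by probe set), ranking by absolute t, the tie-breaking rule and the defaults `n_boot=200` and `seed=0` are all assumed.

```python
import math
import random
import statistics


def student_t(x, y):
    """Pooled-variance two-sample t for mean(y) - mean(x); infinite if SE is zero."""
    n1, n2 = len(x), len(y)
    pooled = ((n1 - 1) * statistics.variance(x)
              + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    d = statistics.mean(y) - statistics.mean(x)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        # A bootstrap resample can repeat one array; treat a nonzero
        # difference with zero spread as maximally extreme.
        return 0.0 if d == 0 else math.copysign(math.inf, d)
    return d / se


def selection_probability(group1, group2, k, n_boot=200, seed=0):
    """Bootstrap estimate of P(rank of probe set g <= k) for every probe set g.

    group1, group2: lists of arrays; each array is a list of expression values,
    one per probe set. Whole arrays are resampled, so probe sets keep their
    joint structure within each bootstrap sample.
    """
    rng = random.Random(seed)
    n_sets = len(group1[0])
    hits = [0] * n_sets
    for _ in range(n_boot):
        a = [rng.choice(group1) for _ in group1]
        b = [rng.choice(group2) for _ in group2]
        stats = [abs(student_t([arr[g] for arr in a], [arr[g] for arr in b]))
                 for g in range(n_sets)]
        # Rank 1 = largest |t|; sorted() is stable, so ties keep index order.
        order = sorted(range(n_sets), key=lambda g: -stats[g])
        for g in order[:k]:
            hits[g] += 1
    return [h / n_boot for h in hits]
```

Since exactly k probe sets are selected in every bootstrap sample, the returned probabilities always sum to k; a probe set with a large, consistent difference should have a selection probability close to 1.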
In addition, we develop a practical algorithm for determining the sample size of experiments that use Affymetrix oligonucleotide arrays.
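The comparisons above score each expression method by sensitivity and FDR (the first abstract also names specificity) against the known truth of each simulated data set, and a sample-size decision trades these off against cost. The Python sketch below shows that bookkeeping under stated assumptions: the function names and the conventions for empty sets are illustrative, and `smallest_adequate_n` is a hypothetical decision rule (the smallest replicate number meeting a sensitivity target and an FDR cap), not the algorithm developed in the thesis, whose details are not given here.

```python
def selection_metrics(selected, truly_de, all_ids):
    """Sensitivity, specificity and FDR of one selected list of probe sets.

    selected: IDs declared differentially expressed.
    truly_de: IDs differentially expressed in the simulation truth.
    all_ids: every probe-set ID on the array.
    """
    selected, truly_de, all_ids = set(selected), set(truly_de), set(all_ids)
    tp = len(selected & truly_de)
    fp = len(selected - truly_de)
    fn = len(truly_de - selected)
    tn = len(all_ids - selected - truly_de)
    # Assumed convention: a ratio with an empty denominator is reported as 0.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    fdr = fp / len(selected) if selected else 0.0
    return sensitivity, specificity, fdr


def mean_metrics(runs, all_ids):
    """Average the three metrics over (selected, truly_de) pairs, one per simulated data set."""
    results = [selection_metrics(s, t, all_ids) for s, t in runs]
    return tuple(sum(r[i] for r in results) / len(results) for i in range(3))


def smallest_adequate_n(metrics_by_n, min_sensitivity, max_fdr):
    """Hypothetical rule: smallest replicate number meeting both targets, else None.

    metrics_by_n maps a replicate number n to (mean sensitivity, mean FDR).
    """
    for n in sorted(metrics_by_n):
        sens, fdr = metrics_by_n[n]
        if sens >= min_sensitivity and fdr <= max_fdr:
            return n
    return None
```

In a simulation study, `mean_metrics` would be called once per method and replicate number, and the resulting table would feed a rule such as `smallest_adequate_n`; the targets themselves are a choice the researcher has to make.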

References


2. Affymetrix (2002). Statistical Algorithms Description Document.
3. Barash, Y., Dehan, E., Krupsky, M., Franklin, W., Geraci, M., Friedman, N. and Kaminski, N. (2004). Comparative analysis of algorithms for signal quantitation from oligonucleotide microarrays. Bioinformatics, 20: 839-846.
6. Bolstad, B. M., Irizarry, R. A., Åstrand, M. and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2): 185-193.
7. Conover, W. J. (1999). Practical Nonparametric Statistics. 3rd ed. Wiley.
12. Hess, A. M. and Iyer, H. K. (2004). Comparison of methods for detecting differentially expressed genes for high density oligonucleotide microarrays. Preprint.

Cited By


陳泰伸 (2006). Application of calibration models to Affymetrix gene chip data [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2006.02946
