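Summary Cancer is a disease of genetic alteration. Two classes of genes are currently known to be involved in its formation: oncogenes and tumor suppressor genes. Cancer typically arises from the accumulated effect of the activation of several oncogenes, the loss of function of several tumor suppressor genes, and mutations in DNA repair genes and other related factors. Changes in gene expression levels in different cancer cells can now be analyzed with cDNA microarrays, and a considerable amount of data has accumulated from such studies. The α value for statistical tests on microarray data is generally set to 0.05. The purpose of this thesis is to examine whether 0.05 is appropriate for the expression data of every microarray experiment, and to try to find a more precise α value for screening gene markers. We first collect and analyze microarray gene expression data for liver cancer and breast cancer. The data are pre-filtered to remove uncertain values, and the t-test from statistical estimation and hypothesis testing is then used to study the reliability of microarray gene expression databases. Combining this with the proportions of samples that show over-expression and under-expression, we determine a reliable α value to serve as a reference standard when screening for oncogenes with microarray gene expression data. The analysis shows that the α value must be tailored to the results of each microarray experiment. Only in this way can biologists be helped to find answers quickly from the information in the databases and to verify cancer-related genes quickly.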
Abstract Cancer is caused by gene mutations. Two kinds of genes are currently known to be involved in cancer: oncogenes and tumor suppressor genes. Cancer usually results from the activation of oncogenes, the malfunction of tumor suppressor genes, and mutations in DNA repair genes or other cancer-related factors. cDNA microarrays can be used to analyze gene expression in different cancer cells, and a large body of research has accumulated in this area. When microarray data are analyzed with the t-test, the α value is usually set to 0.05 regardless of the kind of data considered. The purpose of this thesis is to examine whether α = 0.05 is valid for analyzing various microarray gene expression data, and to find a more precise α value for identifying gene markers. In our study, we collected microarray gene expression data for liver and breast cancers. We first filter out uncertain data. We then analyze the reliability of microarray gene expression databases using the t-test. Finally, we choose a reliable α value according to the ratios of over-expressed and under-expressed samples. We found that the α value must be set separately for each microarray experiment. We hope this result helps biologists find information and verify cancer-related genes effectively.
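The screening procedure outlined above (filter uncertain data, apply a t-test per gene, record the proportions of over-expressed and under-expressed samples, then compare against α) can be illustrated with a minimal Python sketch. It is not the thesis's actual implementation, and every specific in it is an assumption made for illustration: each gene is represented by log2 tumor/normal ratios, uncertain spots are encoded as `None`, the test is a one-sample t-test against a mean of zero, a gene is kept only if at least 80% of its spots are valid, and a sample counts as over- or under-expressed beyond ±1 on the log2 scale. The gene names and values are invented toy data, and the names `screen_genes`, `min_valid`, and `cutoff` are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(|T| <= |t|)
    return max(0.0, 1.0 - central_mass)


def one_sample_t_test(values):
    """Test H0: mean == 0. Returns (t, p), or None if the test is undefined."""
    n = len(values)
    if n < 2:
        return None
    sd = stdev(values)
    if sd == 0:
        return None
    t = mean(values) / (sd / math.sqrt(n))
    return t, two_sided_p(t, n - 1)


def screen_genes(log_ratios, alpha, min_valid=0.8, cutoff=1.0):
    """Return {gene: (p, over_fraction, under_fraction)} for genes with p < alpha.

    min_valid and cutoff are illustrative assumptions, not values from the thesis.
    """
    selected = {}
    for gene, row in log_ratios.items():
        valid = [v for v in row if v is not None]  # drop uncertain spots
        if len(valid) < min_valid * len(row):
            continue  # too many uncertain measurements for this gene
        result = one_sample_t_test(valid)
        if result is None:
            continue
        _, p = result
        if p < alpha:
            over = sum(v >= cutoff for v in valid) / len(valid)
            under = sum(v <= -cutoff for v in valid) / len(valid)
            selected[gene] = (p, over, under)
    return selected


# Toy log2 ratios (invented for illustration only).
data = {
    "GENE_A": [1.8, 2.1, 1.5, 2.4, 1.9, 2.2],        # strong, consistent shift
    "GENE_B": [1.0, 0.3, 1.4, 0.1, 1.1, 0.7],        # weaker, noisier shift
    "GENE_C": [0.4, -0.5, 0.3, -0.2, 0.1, -0.1],     # no clear shift
    "GENE_D": [-1.6, None, None, None, -2.0, -1.2],  # mostly uncertain spots
}

for alpha in (0.05, 0.01, 0.001):
    print(alpha, sorted(screen_genes(data, alpha)))
```

Running the loop with several α values on the same data shows how the set of candidate genes shrinks as α tightens, which is the sensitivity the thesis sets out to examine. The p-value is computed by numerical integration only to keep the sketch free of third-party dependencies; a statistics library would normally be used instead.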