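Microarray data classification has been widely applied in biology and medicine in recent years, and as a result a growing number of public microarray data sets have become available. This thesis investigates whether microarray data sets produced by different laboratories can be combined, with the aim of obtaining more accurate classification models from larger data sets and of enabling sample sharing. When heterogeneous microarray data are merged for future classification tasks, the values in each data set must be obtained and used with particular care to avoid erroneous predictions. A complete microarray classification workflow comprises several parts: 1. conversion of raw data into gene expression values; 2. sample normalization; 3. establishment of gene correspondences; 4. feature gene selection; and 5. attribute normalization. This thesis studies which expression-value conversion algorithms perform better in classifying heterogeneous microarray data, and examines the effect of sample and attribute normalization on the classification of heterogeneous samples. Three sets of heterogeneous microarray data are used to evaluate the influence of these factors on sample classification and the interactions among them.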
Microarray experiments have been widely used in biological and medical research. As more and more public data sources become available, this thesis investigates whether these heterogeneous data sources can be combined for classification analysis and develops a correct procedure for doing so. Sample classification comprises several parts: 1. translating raw data into expression values; 2. sample-wise normalization; 3. gene mapping; 4. gene selection; and 5. gene-wise normalization. This work studies which algorithms for translating raw data into expression values perform better in cross-generation and cross-laboratory analysis. In addition, the effect of sample-wise and feature-wise normalization on classification performance is examined. Eight data sets from heterogeneous sources are employed in this study to validate the proposed methodology.
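To make the distinction between the two normalization steps listed above concrete, the following is a minimal sketch, not the procedure used in this thesis. It assumes z-score standardization as the normalization method (the abstract does not name one) and uses a small hypothetical expression matrix in which rows are samples and columns are genes. The function names are illustrative only.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers to zero mean and unit variance.

    Assumption: z-score is used purely for illustration; the thesis
    does not specify which normalization method it applies.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant vector carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize_samples(matrix):
    """Sample-wise: standardize each row (one sample across all genes)."""
    return [zscore(row) for row in matrix]


def normalize_genes(matrix):
    """Gene-wise: standardize each column (one gene across all samples)."""
    columns = [zscore(list(col)) for col in zip(*matrix)]
    # Transpose back so that rows are samples again.
    return [list(row) for row in zip(*columns)]


# Hypothetical expression matrix: 3 samples (rows) x 3 genes (columns).
# Each sample is a scaled copy of the first, as might happen when
# arrays differ only in overall intensity.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
]

by_sample = normalize_samples(expression)
by_gene = normalize_genes(expression)
```

In this toy matrix, sample-wise normalization makes the three rows identical, because it removes per-array intensity differences. Gene-wise normalization instead puts every gene on a common scale across samples.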