Integration and Analysis of Gene Microarray Chip Data for Breast Cancer

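This study examines how effective subgroup standardization is for data integration and under what conditions it applies. For one batch of breast cancer patient samples, the hybridization buffer was prepared from a different water source. As a result, the hierarchical clustering dendrogram split into two large groups according to buffer rather than according to the underlying biological characteristics of the samples. The experiment was therefore repeated on the identical patient samples using the original buffer, which left us with two data sets covering the same 36 breast cancer patients. This experimental heterogeneity complicated the data analysis and even produced incorrect clustering, but it also yielded a set of replicate data and an opportunity to work out how to solve the problem.

We found that adding a subgroup standardization step after applying well-known, effective normalization methods from the literature removes the buffer-induced bias and achieves data integration. Because the experiments are replicates on the same patients, successful integration means that hierarchical clustering should group the 72 samples from the 36 patients into 36 small clusters, one per patient. We use the matching rate as the evaluation criterion. With subgroup standardization, the matching rate of LOWESS-normalized data improved by 94%, that of Median Rank Scores (MRS) by 31%, and even quantile normalization, which was already quite effective, improved by 8%. Validated against these three normalization methods from the literature, subgroup standardization improved microarray data integration regardless of which normalization was used first.

We then used the patients' clinical data for further validation, taking the same data processed by the different normalization methods and observing the effect of subgroup standardization on classifying ER-positive versus ER-negative patients. We first applied the TSP classifier from the literature, which identifies important genes, and then performed hierarchical clustering on all gene sets with a score of 1, using the ER-positive/negative classification as validation. The results again showed that subgroup standardization works well for ER classification.

Having reached these conclusions, we went on to investigate the best achievable results under three objectives: matching rate, ER hierarchical clustering results, and sensitivity analysis. We used the TSP classifier to select all important gene sets, applied simulated annealing to choose the gene combination for a given number of sets, and plotted the results to identify the best normalization method together with the best number of gene sets. For matching rate, the best approach is LOWESS plus subgroup standardization, and the number of gene sets must not be too small. For optimizing ER classification, the better approach is LOWESS combined with standardization. For sensitivity analysis of individual samples, quantile normalization or even the raw data gives better results, and the number of gene sets must not be too large, otherwise sensitivity drops sharply.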
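As an illustration of the idea, the following is a minimal sketch and not the thesis's implementation. It assumes that subgroup standardization means z-scoring each gene separately within each buffer batch, and it approximates the matching rate as the fraction of samples whose mutual nearest neighbour (squared Euclidean distance) is their own replicate, a simplified stand-in for reading pairs off a hierarchical clustering dendrogram. The function names, labels, and toy values are all hypothetical.

```python
from statistics import mean, pstdev

def subgroup_standardize(expr, groups):
    """Z-score each gene separately within each subgroup (e.g. buffer batch).

    expr   -- list of genes, each a list of per-sample expression values
    groups -- subgroup label for each sample (hypothetical labels)
    """
    out = [[0.0] * len(groups) for _ in expr]
    for label in set(groups):
        idx = [j for j, g in enumerate(groups) if g == label]
        for i, row in enumerate(expr):
            vals = [row[j] for j in idx]
            mu, sd = mean(vals), pstdev(vals)
            for j in idx:
                # A gene that is constant within a subgroup carries no signal there.
                out[i][j] = (row[j] - mu) / sd if sd > 0 else 0.0
    return out

def matching_rate(expr, patients):
    """Fraction of samples whose mutual nearest neighbour is their own replicate."""
    n = len(patients)
    cols = [[row[j] for row in expr] for j in range(n)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def nearest(j):
        return min((k for k in range(n) if k != j),
                   key=lambda k: dist(cols[j], cols[k]))

    hits = sum(1 for j in range(n)
               if patients[nearest(j)] == patients[j] and nearest(nearest(j)) == j)
    return hits / n

# Toy data (assumed): 3 genes x 4 samples; buffer "B" adds a constant offset of 10.
groups = ["A", "A", "B", "B"]
patients = ["P1", "P2", "P1", "P2"]
raw = [[1, 3, 11, 13],
       [5, 1, 15, 11],
       [2, 6, 12, 16]]

print(matching_rate(raw, patients))
print(matching_rate(subgroup_standardize(raw, groups), patients))
```

On this toy data the raw samples pair up by buffer (rate 0.0), while the standardized samples pair up by patient (rate 1.0), mirroring the buffer effect described above on a much smaller scale.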

Keywords

Microarray biochip; normalization; systems biology

Parallel Abstract

Unsuccessful clustering, caused by a different hybridization buffer used in a second set of samples, led to repeat experiments on the same samples using the original buffer. We therefore have two sets of gene expression data for the same 36 breast cancer samples. This heterogeneity complicates data analysis unnecessarily and, even worse, gives false classification in clustering. However, the repetition also provides a stringent test of data treatment methods for removing buffer effects and, eventually, a useful approach to data integration. Subgroup standardization is proposed to compensate for the buffer effect in microarray experiments; it is performed immediately after the normalization step. Given replicate microarray experiments on all 36 samples, the percentage of pair-wise matching under hierarchical clustering can be used to evaluate the different approaches. With subgroup standardization, the matching rate improves by 94%, 31%, and 8% for LOWESS, Median Rank Scores (MRS), and quantile normalization, respectively. The proposed subgroup standardization thus enhances data integration for microarray data regardless of the normalization method. The results are validated via replicate experiments on the same samples using different buffers on the same platform. Measured by pair-wise matching from hierarchical clustering, quantile normalization performs better than MRS, and LOWESS performs worst, but all three can be further improved by subgroup standardization.

Going one step further, we aim to classify ER-positive and ER-negative patient groups based on the different normalization methods, with and without subgroup standardization. We follow the TSP classifier to choose candidate genes related to ER status and apply simulated annealing to search for the optimal combination of genes according to their scores. We then compare the outcomes using several indicators, namely matching rate, sensitivity, specificity, and ER hierarchical clustering results, on both training and testing data. We find that subgroup standardization helps classify ER-positive and ER-negative patients and also improves the matching rate when combined with hierarchical clustering. It is effective when the goal is to view the group-level behavior of the whole data set. Because its sensitivity is poor, however, it should not be used when the goal is to examine the behavior of individual samples.
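The gene-selection step can be sketched as follows, assuming that TSP here refers to the top-scoring pair classifier, in which a gene pair (i, j) is scored by the absolute difference between the two classes in the probability that gene i is expressed below gene j. This is a hedged illustration only; the function name, labels, and toy values are hypothetical and not taken from the thesis.

```python
from itertools import combinations

def tsp_scores(expr, labels):
    """Score every gene pair by |P(Xi < Xj | class 1) - P(Xi < Xj | class 2)|.

    expr   -- list of genes, each a list of per-sample expression values
    labels -- class label for each sample; exactly two distinct classes
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("TSP scoring needs exactly two classes")
    members = [[k for k, lab in enumerate(labels) if lab == c] for c in classes]
    scores = {}
    for i, j in combinations(range(len(expr)), 2):
        # Per-class frequency with which gene i is expressed below gene j.
        p = [sum(expr[i][k] < expr[j][k] for k in idx) / len(idx)
             for idx in members]
        scores[(i, j)] = abs(p[0] - p[1])
    return scores

# Toy data (assumed): 3 genes x 4 samples, two ER+ and two ER- samples.
labels = ["ER+", "ER+", "ER-", "ER-"]
expr = [[1, 2, 5, 6],
        [3, 4, 2, 1],
        [0, 9, 0, 9]]

scores = tsp_scores(expr, labels)
perfect_pairs = [pair for pair, s in scores.items() if s == 1.0]
print(perfect_pairs)
```

Pairs with a score of 1 are those whose relative ordering flips completely between the two classes. In this toy data only the pair (0, 1) qualifies, so `perfect_pairs` is `[(0, 1)]`. A search such as simulated annealing could then pick a subset of high-scoring pairs, which is not shown here.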

Parallel Keywords

Microarray; Normalization; Bioinformatics
