Comparing the Performance of the MB-MDR and SPV Methods in Identifying Significant Multifactor Interactions

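In recent years, with the rapid development of genetic research and medicine, analyzing data with a large number of factors has become a major challenge for modern statistics. When analyzing such data, we want to select, from among many factors, the explanatory variables that significantly affect the response variable, so making good use of model selection methods is essential. Many model selection methods have been developed; this thesis studies two that are designed for a continuous response variable and categorical explanatory variables: MB-MDR and SPV. To compare them, we simulate quantitative trait loci (QTL) genetic data by computer in two settings, one with a large number of factors and one with a small number of factors. After analyzing the data with MB-MDR and SPV, we evaluate the selection results using the average accuracy rate and the average error rate. The results show that SPV achieves good average accuracy in selecting main effects with large samples but is less satisfactory with small samples. Its average accuracy for mixed models (main effects plus interactions) is unsatisfactory at every sample size, whereas its average error rate is consistently low. MB-MDR achieves good average accuracy in selecting interactions at every sample size, but its average error rate is comparatively high under all parameter settings. The simulation results indicate that the choice between MB-MDR and SPV depends on the user's needs, for example whether the focus is on main effects or on interactions, or whether high accuracy or a low error rate is the priority.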

Keywords

MB-MDR; SPV; quantitative trait loci (QTL)

Parallel Abstract

In recent years, with advances in genetic research and medicine, analyzing data with a large number of factors has become a major challenge for modern statistics. In analyzing such data, we want to identify the factors and multifactor interactions that significantly influence the response variable, so the choice of model selection method is very important. Many model selection methods have been developed. This thesis studies two that apply when the response variable is continuous and the explanatory variables are categorical: MB-MDR and SPV. To compare the two methods, we use computer programs to simulate two sets of quantitative trait loci (QTL) data, one containing a large number of factors and the other a small number of factors. After analyzing the data with both MB-MDR and SPV, we use the average accuracy rate and the average error rate to evaluate the results. The results show that SPV achieves a good average accuracy rate when identifying main effects in large samples but performs worse in small samples. Its average accuracy rate for mixed models (main effects with interactions) is unsatisfactory under all sample-size settings, but its average error rate is very low in all situations. MB-MDR achieves a good average accuracy rate for interactions under all sample-size settings, but it has a higher average error rate under all parameter settings. According to the simulation results, the choice between MB-MDR and SPV depends on the user's requirements: whether the interest lies in the main effects or the interaction effects of a model, and whether high accuracy or a low error rate matters more.

Parallel Keywords

MB-MDR; SPV; quantitative trait loci (QTL)
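
The evaluation described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the genotype coding (0/1/2), the allele frequency, the effect sizes, the function names `simulate_qtl` and `average_rates`, and in particular the definitions of the two rates (accuracy as the share of truly active terms selected, error rate as the share of inactive candidate terms selected). The selection step itself (MB-MDR or SPV) is not implemented; hand-written selections stand in for its output.

```python
from itertools import combinations

import numpy as np


def simulate_qtl(n, p, rng, main=0, pair=(1, 2), beta_main=1.0, beta_int=1.0):
    """Simulate categorical genotypes and a continuous trait (hypothetical settings)."""
    # Genotypes coded 0/1/2; an allele frequency of 0.3 is an assumption.
    genotypes = rng.binomial(2, 0.3, size=(n, p))
    # Trait = one main effect + one two-locus interaction + standard normal noise.
    trait = (
        beta_main * genotypes[:, main]
        + beta_int * genotypes[:, pair[0]] * genotypes[:, pair[1]]
        + rng.normal(size=n)
    )
    return genotypes, trait


def average_rates(selections, true_terms, candidate_terms):
    """Average accuracy and error rate over replicates (assumed definitions).

    accuracy   = share of truly active terms that were selected
    error rate = share of inactive candidate terms that were selected
    """
    true_terms = set(true_terms)
    null_terms = set(candidate_terms) - true_terms
    acc = [len(set(s) & true_terms) / len(true_terms) for s in selections]
    err = [len(set(s) & null_terms) / len(null_terms) for s in selections]
    return float(np.mean(acc)), float(np.mean(err))


# Usage: 4 factors, terms written as tuples of factor indices.
rng = np.random.default_rng(0)
genotypes, trait = simulate_qtl(n=200, p=4, rng=rng)
candidates = [(j,) for j in range(4)] + list(combinations(range(4), 2))
truth = [(0,), (1, 2)]
# Stand-in selections for three replicates; a real study would obtain
# these from the selection method applied to each simulated data set.
selected = [[(0,), (1, 2)], [(0,), (3,)], [(1, 2)]]
accuracy, error_rate = average_rates(selected, truth, candidates)
print(accuracy, error_rate)
```

Under these assumed definitions, a method that selects many terms can score well on accuracy while paying for it in error rate, which is the kind of trade-off the abstract reports between MB-MDR and SPV.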

