Application of Multiple Imputation to the Statistical Analysis of Incomplete Data

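This paper examines the application of multiple imputation to the statistical analysis of incomplete data. Data collected in surveys and experiments are generally expected to be complete, with as few missing values as possible. In practice this requirement often cannot be met, for example when an experimental unit dies or when other external factors cause values to be missing. Incomplete data reduce the efficiency of an analysis and may bias estimates of population parameters, so the missing data points need to be estimated to form a complete data set for analysis. Rubin (1987) proposed multiple imputation, in which each missing value is replaced by m > 1 plausible values, yielding m data sets for analyzing the population parameters. This paper takes examples from the SAS 9.1 User's Guide as complete data, randomly deletes 5%, 10%, 15%, and 20% of the values, imputes and analyzes the resulting data, and compares the results with those of the original analysis to assess how well multiple imputation performs in practice. The simulation has three parts. The first covers statistical methods that estimate population parameters: regression analysis and logistic regression. The second covers methods that do not estimate population parameters: principal component analysis, factor analysis, discriminant analysis, multivariate analysis, and canonical correlation analysis. The third compares four covariance structures: unstructured, compound symmetric, first-order autoregressive, and Toeplitz. In the first part, the variables selected by regression analysis diverged increasingly from those selected with the complete data as the missing proportion grew, whereas logistic regression showed no such pattern; for both methods, however, the p-values were closer to those of the complete data when the missing proportion was small. Among the methods that do not estimate population parameters, factor analysis gave post-imputation results closest to those of the complete data, followed by discriminant analysis, multivariate analysis, and canonical correlation analysis, while principal component analysis differed most markedly. In the covariance structure simulation, the unstructured, compound symmetric, and first-order autoregressive structures were unchanged, but for the Toeplitz structure a higher missing proportion could change the covariance structure.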
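The simulation workflow described above (delete values at random, impute m times, analyze each completed data set, combine the results) can be sketched as follows. This is a minimal illustration and not the SAS procedure used in the thesis: it swaps in a simplified bootstrap-based stochastic regression imputation in place of the Markov chain Monte Carlo method named in the keywords, it runs on synthetic data, and the function names (`delete_at_random`, `impute_once`, `pool_rubin`) are hypothetical. The pooling step follows Rubin's combining rules, where the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np


def delete_at_random(values, proportion, rng):
    """Return a float copy of `values` with `proportion` of entries set to NaN."""
    out = np.asarray(values, dtype=float).copy()
    n_missing = int(round(proportion * out.size))
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out.flat[idx] = np.nan
    return out


def impute_once(x, y, rng):
    """Fill missing y values by stochastic regression on a fully observed x.

    Assumption: a simple stand-in for a proper imputation model. The observed
    cases are bootstrapped first so that each imputation also reflects
    uncertainty in the regression coefficients.
    """
    observed = ~np.isnan(y)
    boot = rng.choice(np.flatnonzero(observed), size=observed.sum(), replace=True)
    slope, intercept = np.polyfit(x[boot], y[boot], deg=1)
    residuals = y[boot] - (intercept + slope * x[boot])
    sigma = residuals.std(ddof=2)
    completed = y.copy()
    n_missing = (~observed).sum()
    completed[~observed] = (
        intercept + slope * x[~observed] + rng.normal(0.0, sigma, n_missing)
    )
    return completed


def pool_rubin(estimates, variances):
    """Combine m estimates and their variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = estimates.size
    pooled = estimates.mean()
    within = variances.mean()
    between = estimates.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Synthetic "complete" data (hypothetical, not the SAS example data).
rng = np.random.default_rng(0)
n, m = 200, 5
x = rng.normal(size=n)
y_complete = 2.0 + 1.5 * x + rng.normal(size=n)

for proportion in (0.05, 0.10, 0.15, 0.20):
    y = delete_at_random(y_complete, proportion, rng)
    completed_sets = [impute_once(x, y, rng) for _ in range(m)]
    estimates = [c.mean() for c in completed_sets]
    variances = [c.var(ddof=1) / n for c in completed_sets]
    pooled, total_var = pool_rubin(estimates, variances)
    print(
        f"{proportion:.0%} missing: pooled mean {pooled:.3f}, "
        f"SE {total_var ** 0.5:.3f}, complete-data mean {y_complete.mean():.3f}"
    )
```

Comparing the pooled estimate with the complete-data estimate at each missing proportion mirrors, in a much reduced form, the comparison the paper reports for each analysis method.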

Keywords

multiple imputation; incomplete data; missing data; missing at random; Markov chain Monte Carlo

Parallel Abstract

This paper investigates the application of multiple imputation to the statistical analysis of incomplete data. Many statistical methods are designed for, and applicable only to, complete data, so incomplete data must be amended to meet this requirement. Rubin (1987) proposed multiple imputation, which substitutes m > 1 possible values for each missing value. The resulting m complete data sets are then subjected to ordinary statistical analyses, and the m sets of results are combined. In this paper, complete data were made incomplete at missing proportions of 5%, 10%, 15%, and 20%, then imputed and analyzed, and the results were compared with those of the original complete data. The simulations were divided into three parts. The first concerns methods that estimate population parameters, namely regression analysis and logistic regression. The second concerns multivariate statistical analysis of multivariate normally distributed data. The third concerns the covariance structures of multivariate data. Results from the first part showed that, for regression analysis, the discrepancies between the complete-data and incomplete-data results increased with the missing proportion, while this was less evident for logistic regression. Results from the second part indicated that factor analysis was the least sensitive to the missing proportion, while principal component analysis showed the largest discrepancies from the complete data. Results from the third part revealed that most of the covariance structures studied were robust to the missing proportion; only the Toeplitz structure could change at higher missing proportions.
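For readers unfamiliar with the covariance structures compared in the third part of the simulation, the sketch below builds small examples of the compound symmetric, first-order autoregressive, and Toeplitz structures. It is an illustration under assumed parameter values, not code from the thesis, and the function names are hypothetical. The unstructured case is simply any symmetric positive-definite matrix with p(p+1)/2 free parameters, so it needs no constructor.

```python
import numpy as np


def compound_symmetry(p, variance, covariance):
    """Equal variances on the diagonal, one common covariance elsewhere (2 parameters)."""
    return np.full((p, p), float(covariance)) + (variance - covariance) * np.eye(p)


def ar1(p, variance, rho):
    """First-order autoregressive: covariance decays as rho**|i - j| (2 parameters)."""
    index = np.arange(p)
    lags = np.abs(np.subtract.outer(index, index))
    return variance * rho ** lags


def toeplitz(first_row):
    """Toeplitz: one free value per lag |i - j| (p parameters)."""
    row = np.asarray(first_row, dtype=float)
    index = np.arange(row.size)
    lags = np.abs(np.subtract.outer(index, index))
    return row[lags]


# Illustrative values only (assumed, not taken from the paper).
print(compound_symmetry(4, variance=2.0, covariance=0.5))
print(ar1(4, variance=2.0, rho=0.6))
print(toeplitz([2.0, 1.0, 0.4, 0.1]))
```

The Toeplitz structure has one free parameter per lag, more than the two parameters of the compound symmetric or AR(1) structures. AR(1) is the special case of Toeplitz in which the lag-k covariance equals the variance times rho to the power k.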

Parallel Keywords

Multiple Imputation; Incomplete Data; Missing Data; Missing at Random; Markov Chain Monte Carlo

References

5. 黃齡葦 (2005), 遺失資料之多重插補法模擬比較研究 [A simulation-based comparison of multiple imputation methods for missing data], Master's thesis, National Taiwan University.

1. Agresti, A. (1990), Categorical Data Analysis. New York: John Wiley & Sons, Inc.

4. Hotelling, H. (1936), Relations Between Two Sets of Variables, Biometrika, 28, 321–377.

5. Johnson and Wichern (2007), Applied Multivariate Statistical Analysis, 6th ed., Pearson.

7. Rubin, D. B. (1976), Inference and Missing Data, Biometrika, 63, 581–592.

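Cited By

洪靜茹 (2013). 缺失資料處理對多變量變異數分析(MANOVA)的影響 [The effect of missing-data handling on multivariate analysis of variance (MANOVA)] [Master's thesis, National Taipei University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0023-0602201318131300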
