Questionnaire surveys often use attitude items to learn respondents' opinions or feelings, and the responses are usually recorded on an ordinal scale. Assuming continuous latent factors, factor analysis is commonly used to extract a few common factors from the observed variables. Factor analysis estimates the common factors from a correlation matrix or a covariance matrix. If too much of the original data is missing, the dependence between variables can be distorted, and the extracted factors can deviate from the factors that should be obtained. The same can happen when the data are missing through some mechanism such as missing at random (MAR), where people with certain characteristics are more likely to have missing values. Research on incomplete data has shown that, when the missing proportion is high, many statistical analyses give different results depending on whether the partially missing records are used or removed entirely, and the missingness can cause significant bias.

This study extends Wang et al. (2007). It uses attitude items on high school students' psychological health from the second wave of the Taiwan Education Panel Survey (TEPS) and focuses on ordinal data. We compare analyses that use the incomplete data with analyses that delete all incomplete records. The comparison looks at two things. First, we examine the difference in the estimated covariance matrix, in particular the critical missing proportion at which the estimate becomes significantly biased. Second, we examine how sensitive factor analysis is to the amount of missing data, in terms of the estimated number of common factors and the estimated factor loadings. Following the original missing structure, we construct datasets with missing proportions from 6% to 40%. Under the assumption of normally distributed observations, the covariance matrix estimated from data with missing values becomes significantly biased from a missing proportion of about 16% onward, so we recommend that missing data be handled before analysis.

To compare methods for handling missing data, we start from the original missing structure. We take the records that are complete on all variables as the baseline. We then examine how different handling methods affect the factor analysis of the psychological health items at missing proportions from 6% to 40%. The factor analysis is run on the polychoric correlation matrix, which is suited to ordinal data. List-wise deletion (LD) gives a factor structure close to the baseline at lower missing proportions (below 34%), but the structure departs from the baseline at about 34%. MCMC imputation performs well at most missing proportions. The available-case method performs worst of the four methods compared.
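The contrast between list-wise deletion and the available-case method can be illustrated with a small simulation. The sketch below is not the study's procedure, and it rests on several assumptions. It uses continuous multivariate normal data rather than the TEPS ordinal items. Missingness follows a hypothetical MAR mechanism in which a row's chance of missing values in columns 1 onward grows with the rank of its always-observed column 0. The missing rate is defined per cell, which may differ from the study's definition of missing proportion. Polychoric correlations and MCMC imputation are not shown. All function names are illustrative. The Kaiser rule (eigenvalues above 1) is used only as one simple way to count factors, and the abstract does not say which rule the study used.

```python
import numpy as np


def simulate_mar(n, cov, miss_rate, rng):
    """Draw multivariate normal rows, then blank cells in columns 1.. under MAR.

    Hypothetical mechanism: column 0 is always observed, and each row's chance of
    losing each other cell grows linearly with the rank of its column-0 value,
    so the expected per-cell missing rate in columns 1.. equals miss_rate.
    """
    if not 0 <= miss_rate <= 0.5:
        raise ValueError("miss_rate must lie in [0, 0.5] for this mechanism")
    x = rng.multivariate_normal(np.zeros(len(cov)), cov, size=n)
    ranks = np.argsort(np.argsort(x[:, 0]))       # ranks 0 .. n-1 of column 0
    p_miss = 2 * miss_rate * (ranks + 0.5) / n    # mean over rows equals miss_rate
    mask = rng.random((n, x.shape[1] - 1)) < p_miss[:, None]
    x[:, 1:][mask] = np.nan                       # x[:, 1:] is a view, so x changes
    return x


def listwise_cov(x):
    """List-wise deletion (LD): keep only rows with every variable observed."""
    complete = ~np.isnan(x).any(axis=1)
    return np.cov(x[complete], rowvar=False)


def pairwise_cov(x):
    """Available-case estimate: each entry uses every row observed on that pair,
    so entries rest on different subsets of rows and the result need not be
    positive semi-definite."""
    p = x.shape[1]
    s = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            ok = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            s[i, j] = s[j, i] = np.cov(x[ok, i], x[ok, j])[0, 1]
    return s


def kaiser_count(cov):
    """Count eigenvalues above 1 of the implied correlation matrix (Kaiser rule)."""
    d = np.sqrt(np.diag(cov))
    eig = np.linalg.eigvalsh(cov / np.outer(d, d))
    return int((eig > 1).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two independent blocks of three variables, within-block correlation 0.6.
    block = np.full((3, 3), 0.6) + 0.4 * np.eye(3)
    zeros = np.zeros((3, 3))
    true_cov = np.block([[block, zeros], [zeros, block]])
    for rate in (0.06, 0.16, 0.34):
        x = simulate_mar(2000, true_cov, rate, rng)
        for name, est in (("LD", listwise_cov(x)), ("pairwise", pairwise_cov(x))):
            err = np.linalg.norm(est - true_cov) / np.linalg.norm(true_cov)
            print(f"{rate:.0%} {name:8s} rel. error {err:.3f} factors {kaiser_count(est)}")
```

Running the script prints a relative estimation error and a factor count for each method at a few illustrative rates. These numbers come from simulated data and say nothing about the study's results. Because the available-case estimate combines entries computed on different subsets of rows, it can fail to be positive semi-definite, which is one known drawback when it is passed to a factor analysis.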