Survey results are often incomplete because of respondents' personal circumstances. A respondent may simply skip an item, or may decline to answer items that touch on privacy, such as sensitive political issues or income. This happens frequently in social-science and mental-health questionnaires. Because common statistical software cannot handle incomplete records, researchers often delete the questionnaires of respondents with missing answers and analyze only the complete cases. The deleted data, however, sometimes carry hidden information. When the missing proportion is high, the remaining data may no longer represent the full data set and can distort the final results of the analysis. Investigating the missing-data problem, and recovering the information that the missing data convey, is therefore an important issue in survey processing.

This study uses the 2001 student and parent questionnaires of the Taiwan Education Panel Survey (TEPS) and focuses on 15 mental-health items. Based on the original missing-data pattern, we constructed missing data sets at one, three, five, seven, and ten times the original missingness (missing proportions of roughly 5%, 14%, 23%, 31%, and 41%, respectively). For each missing proportion, 30 data sets were generated at random, giving 150 missing data sets in total. The missing-data treatments compared are listwise deletion (LD), available-case analysis (AC), stepwise logistic regression imputation (SLR), and Markov chain Monte Carlo (MCMC). For each missing proportion, the results obtained under each treatment are compared with those from the complete data set (baseline) using multivariate analysis of variance (MANOVA), in terms of the number of significant factors, the change in R², and the coefficients of the independent variables. When the missing proportion is low (about 5%), the four treatments differ little. When it is high (about 14% or more), MCMC yields the results closest to the complete data set among the four methods.
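The abstract does not say exactly how the "k times" data sets were built. The sketch below assumes one hypothetical construction purely for illustration: each of k rounds independently removes every cell with the original per-cell missing rate p, so a cell ends up missing with probability 1 - (1 - p)^k. With p = 0.05 this gives roughly 14%, 23%, 30%, and 40% for k = 3, 5, 7, and 10, in the same range as the reported proportions, but it is not presented as the study's actual procedure. The function names `expected_missing_rate` and `make_missing_set` and the toy 200-respondent table are hypothetical.

```python
import random


def expected_missing_rate(p, k):
    """Probability that a cell is missing after k independent masking rounds,
    each removing the cell with probability p (hypothetical construction)."""
    return 1.0 - (1.0 - p) ** k


def make_missing_set(data, p, k, rng):
    """Return a copy of `data` (a list of rows) with some cells set to None.

    Assumption: each of k rounds masks every cell independently with
    probability p. This is an illustrative stand-in, not the thesis's method.
    The input rows are left unchanged.
    """
    out = [list(row) for row in data]
    for _ in range(k):
        for row in out:
            for j in range(len(row)):
                if rng.random() < p:
                    row[j] = None
    return out


if __name__ == "__main__":
    rng = random.Random(2001)
    # Toy complete data: 200 hypothetical respondents x 15 items scored 1..4.
    complete = [[rng.randint(1, 4) for _ in range(15)] for _ in range(200)]
    for k in (1, 3, 5, 7, 10):
        # 30 random replicates per multiplier, mirroring the 30 sets per level.
        sets = [make_missing_set(complete, 0.05, k, rng) for _ in range(30)]
        cells = sum(len(r) for s in sets for r in s)
        missing = sum(v is None for s in sets for r in s for v in r)
        print(k, round(expected_missing_rate(0.05, k), 3), round(missing / cells, 3))
```

Under this assumption the overall rate grows more slowly than k times p because later rounds sometimes hit cells that are already missing, which is one way a tenfold multiplier can yield about 40% rather than 50%.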
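To make the difference between the first two treatments concrete, here is a minimal sketch that estimates item means under listwise deletion (LD) and available-case analysis (AC), assuming the data are stored as rows with `None` marking a missing answer. The function names and the four-respondent table are hypothetical, and the sketch does not reproduce the study's MANOVA comparison or the SLR and MCMC imputations.

```python
from statistics import mean


def listwise_means(rows):
    """Listwise deletion (LD): drop every row that has any missing value
    (None), then compute each column's mean from the remaining rows."""
    complete = [r for r in rows if all(v is not None for v in r)]
    if not complete:
        raise ValueError("no complete rows left after listwise deletion")
    return [mean(col) for col in zip(*complete)]


def available_case_means(rows):
    """Available-case analysis (AC): for each column, use every observed
    value in that column, regardless of missingness elsewhere in the row."""
    n_cols = len(rows[0])
    return [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]


if __name__ == "__main__":
    # Hypothetical answers of four respondents to three items.
    rows = [
        [1, 2, 3],
        [2, None, 4],
        [3, 4, None],
        [None, 6, 5],
    ]
    print("LD:", listwise_means(rows))
    print("AC:", available_case_means(rows))
```

On this toy table LD keeps only the first respondent and returns means of 1, 2, and 3, while AC uses every observed answer per item and returns 2, 4, and 4. Even one missing answer per respondent is enough for LD to discard most of the sample, which is the kind of distortion the study examines.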