
The Effect of Missing Data Techniques on Multivariate Analysis of Variance (MANOVA)

Advisor: Dr. 王鴻龍

Abstract


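Questionnaire survey results are often incomplete because of respondents' personal circumstances: a respondent may simply overlook an item, or may decline to answer items on sensitive political issues or on private matters such as income. This happens frequently in social science and mental health questionnaires. Constrained by the limitations of common statistical software, researchers then delete the questionnaires of respondents with incomplete data and analyze only the complete cases. The deleted data, however, sometimes carry hidden information. Moreover, when the missing proportion is too high, the remaining data may no longer represent the full data set and may distort the final results of the analysis. Examining the missing-data problem and recovering the information that missing data convey has therefore become an important issue in questionnaire processing.

This study uses the 2001 student and parent questionnaires of the Taiwan Education Panel Survey (TEPS) and focuses on 15 mental health items. Based on the original missing-data structure, missing data sets were constructed at one, three, five, seven, and ten times the original level of missingness (approximately 5%, 14%, 23%, 31%, and 41% missing, respectively). For each missing proportion, 30 data sets were constructed at random, giving 150 missing data sets in total. The missing-data methods examined are listwise deletion (LD), available-case analysis (AC), stepwise logistic regression imputation (SLR), and Markov chain Monte Carlo (MCMC). For each missing proportion, the MANOVA results obtained with each method are compared with those of the complete data set (baseline) in terms of the number of significant factors, the change in R², and the coefficients of the independent variables. The study finds that when the missing proportion is low (about 5%), the methods differ little, and that when it is high (about 14% or more), MCMC is the method among the four whose results are closest to those of the complete data set.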
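The simulation design described above (five missing proportions, 30 random data sets each, 150 in total) can be illustrated with a minimal sketch. It is not the thesis's procedure: the thesis builds its data sets from the original TEPS missing structure, whereas this sketch assumes values go missing completely at random and treats the missing proportion as the share of respondents with one missing item. The names `make_toy_data`, `mask_rows`, and `build_missing_sets`, and the toy 4-point answers, are hypothetical.

```python
import random

# Approximate missing proportions from the abstract, keyed by the
# multiple of the original missingness (1x, 3x, 5x, 7x, 10x).
MISSING_RATES = {1: 0.05, 3: 0.14, 5: 0.23, 7: 0.31, 10: 0.41}
REPLICATES = 30  # random data sets per missing proportion


def make_toy_data(n_respondents=200, n_items=15, seed=1):
    """Hypothetical complete data: 15 items answered on a 1-4 scale."""
    rng = random.Random(seed)
    return [[rng.randint(1, 4) for _ in range(n_items)]
            for _ in range(n_respondents)]


def mask_rows(rows, rate, rng):
    """Copy rows and set one random item to None for about `rate` of them.

    Assumption: missingness is completely at random, unlike the thesis,
    which follows the missing structure observed in the real data.
    """
    masked = [list(r) for r in rows]
    n_missing = round(rate * len(masked))
    for i in rng.sample(range(len(masked)), n_missing):
        j = rng.randrange(len(masked[i]))
        masked[i][j] = None
    return masked


def build_missing_sets(rows, seed=0):
    """Return a dict mapping (multiple, replicate) to a masked data set."""
    rng = random.Random(seed)
    return {
        (mult, rep): mask_rows(rows, rate, rng)
        for mult, rate in MISSING_RATES.items()
        for rep in range(REPLICATES)
    }


missing_sets = build_missing_sets(make_toy_data())
print(len(missing_sets))  # 5 proportions x 30 replicates = 150
```

Each of the 150 data sets would then be passed through a missing-data method and analyzed, with the results compared against the same analysis on the complete data.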

Parallel Abstract


Survey outcomes are often incomplete because of respondents' personal factors. For instance, respondents may simply fail to fill out some items, or may refuse to answer items that touch on privacy, such as sensitive political issues and personal income. This is common in many social science and mental health surveys. Because of the restrictions of ordinary statistical software, researchers then exclude the questionnaires of respondents with missing data and analyze only the complete data. The deleted data, however, sometimes contain hidden information, and the survey results may be distorted. For these reasons, investigating the missing-data problem and extracting the information that missing data convey has become an important topic in survey processing. In this study, we use the 2001 student and parent questionnaires of the Taiwan Education Panel Survey (TEPS) and focus on 15 mental health items. Following the original missing patterns, we create missing data sets at one, three, five, seven, and ten times the original missingness (missing percentages of about 5%, 14%, 23%, 31%, and 41%, respectively). For each missing percentage we randomly create 30 missing data sets (150 sets in total) and apply the missing-data treatments to them. The treatments in this study are listwise deletion (LD), available-case analysis (AC), stepwise logistic regression imputation (SLR), and Markov chain Monte Carlo (MCMC). For each missing percentage, we compare each treatment with the complete data set (baseline) under multivariate analysis of variance (MANOVA) in terms of the number of significant factors, the change in R-square, and the coefficients of the independent variables. The findings show that the treatments differ little when the missing percentage is low (about 5%), and that MCMC gives the results closest to the baseline among the four treatments when the missing percentage is high (about 14% or more).
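The difference between the two deletion-based treatments named above can be shown with a small sketch. It uses item means as a stand-in statistic rather than MANOVA. The function names and the example rows are hypothetical and are not taken from the thesis or from TEPS.

```python
def listwise_delete(rows):
    """Listwise deletion: keep only respondents with no missing item."""
    return [r for r in rows if all(v is not None for v in r)]


def available_case_means(rows):
    """Available-case analysis for item means: each item's mean uses
    every respondent who answered that item, even if other items
    from the same respondent are missing."""
    n_items = len(rows[0])
    means = []
    for j in range(n_items):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


# Hypothetical answers from four respondents to three items.
EXAMPLE_ROWS = [
    [1, 2, 3],
    [4, None, 6],
    [7, 8, None],
    [10, 11, 12],
]

complete_cases = listwise_delete(EXAMPLE_ROWS)
print(complete_cases)                        # [[1, 2, 3], [10, 11, 12]]
print(available_case_means(complete_cases))  # [5.5, 6.5, 7.5]
print(available_case_means(EXAMPLE_ROWS))    # [5.5, 7.0, 7.0]
```

Listwise deletion discards half of the respondents here, while the available-case means draw on every observed answer. That loss of information is what grows as the missing percentage rises.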

