
二分类变量的缺失数据插补研究

Research on Imputation Methods of Binary Variable's Missing Data

Abstract

Most standard statistical methods assume that the data under analysis are complete, yet missing values are common in practice, so missing data is a pervasive and unavoidable problem in data analysis. Research in sociology, economics, demography, and related disciplines routinely relies on binary variables for measurement, so handling missing binary data well matters for studies of social behavior, consumer behavior, and similar topics. Under the missing completely at random (MCAR) mechanism, this paper studies the imputation of missing binary data using informative covariates, with two approaches: single regression imputation and multiple imputation. An empirical data simulation carried out in the statistical software SAS compares the strengths and weaknesses of the two methods. The simulation also examines empirically the number of imputations M in multiple random imputation, which cannot be derived quantitatively, and provides reference values of M for using multiple imputation in practice.
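The comparison described in the abstract can be illustrated with a small simulation. The paper itself works in SAS; the following is only a minimal Python sketch under stated assumptions, not the authors' procedure. The data-generating model, the coefficients, the 30% missing rate, the 0.5 threshold used for single imputation, and the bootstrap refit used to propagate parameter uncertainty are all illustrative choices. Only the pooling step (Rubin's rules: total variance = within + (1 + 1/M) × between) follows Rubin (1987).

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, iters=25):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient w.r.t. intercept
            g1 += (yi - p) * xi       # gradient w.r.t. slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def simulate(n, miss_rate, rng):
    """Binary y depends on covariate x; y is then deleted completely at random."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < sigmoid(-0.5 + 1.2 * xi) else 0 for xi in x]
    y_obs = [None if rng.random() < miss_rate else yi for yi in y]
    return x, y, y_obs

def single_regression_impute(x, y_obs):
    """Fill each missing y with the most likely class from one fitted model."""
    obs = [(xi, yi) for xi, yi in zip(x, y_obs) if yi is not None]
    b0, b1 = fit_logistic([o[0] for o in obs], [o[1] for o in obs])
    return [yi if yi is not None else int(sigmoid(b0 + b1 * xi) >= 0.5)
            for xi, yi in zip(x, y_obs)]

def multiple_impute(x, y_obs, m, rng):
    """Return (pooled estimate of P(y=1), total variance); requires m >= 2."""
    obs = [(xi, yi) for xi, yi in zip(x, y_obs) if yi is not None]
    n = len(x)
    estimates, variances = [], []
    for _ in range(m):
        # Assumption: refit on a bootstrap resample to reflect parameter uncertainty.
        boot = [rng.choice(obs) for _ in obs]
        b0, b1 = fit_logistic([o[0] for o in boot], [o[1] for o in boot])
        # Draw each missing value at random from its predicted probability.
        filled = [yi if yi is not None else int(rng.random() < sigmoid(b0 + b1 * xi))
                  for xi, yi in zip(x, y_obs)]
        q = sum(filled) / n
        estimates.append(q)
        variances.append(q * (1.0 - q) / n)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    return q_bar, within + (1.0 + 1.0 / m) * between

rng = random.Random(0)
x, y, y_obs = simulate(2000, 0.3, rng)
single = single_regression_impute(x, y_obs)
print("complete-data proportion:", sum(y) / len(y))
print("single imputation       :", sum(single) / len(single))
for m in (3, 5, 10, 20):
    q_bar, total_var = multiple_impute(x, y_obs, m, rng)
    print(f"M={m:2d}: pooled={q_bar:.4f}, total variance={total_var:.6f}")
```

Running the loop over several values of M gives a rough feel for how the pooled estimate and its variance settle as M grows, which is the kind of empirical question the paper raises. Thresholding at 0.5 in the single-imputation step may distort the estimated proportion, because every imputed value is pushed toward the more likely class; drawing imputed values at random, as the multiple-imputation step does, avoids that particular effect.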

