Comparative Analysis and Application of Imputed Estimators for Population Mean under Stratified Unequal Probability Sampling

As demand for accurate data continues to grow, survey sampling designs have become increasingly complex, and unequal probability sampling is used more and more often in sample surveys. Item nonresponse is inevitable in survey practice, so obtaining unbiased estimates from imputed data in a complex survey is an important research issue. Previous studies have presented imputed estimators for equal probability sampling with uniform response. It is worthwhile to explore how imputed estimators perform in more complex settings, such as unequal probability sampling or different missing data mechanisms. This study presents imputed estimators of the population mean for survey data imputed with an auxiliary variable under a stratified unequal probability sampling design, and compares their performance under different missing data mechanisms and different levels of correlation between the auxiliary variable and the variable of interest. Taking nonresponse and imputation into account, the study derives three imputed estimators (weighted, unweighted, and bias-adjusted) and their corresponding variance estimators under stratified unequal probability sampling, where missing data are imputed by ratio imputation. Six cases under different conditions (missing data mechanism, population distribution, and sample allocation) are selected for a simulation study that compares the proposed imputed estimators in terms of relative bias and coefficient of variation. The relative bias of the variance estimators is also examined to compare the corresponding variance estimators. A practical application shows how to apply the imputed estimators derived in this study to real survey data.
As expected, simulation results show that the performance of the estimators varies with the missing data mechanism, the population distribution, and the method of sample allocation. For all three imputed estimators, estimation precision increases as the correlation between the auxiliary variable and the variable of interest increases. The imputed estimators are more stable when data are missing completely at random (MCAR) than when data are missing at random (MAR). Comparing the three imputed estimators, this study shows that when the auxiliary variable and the variable of interest are highly correlated, the proposed bias-adjusted estimator works well under stratified unequal probability sampling: it reduces both the estimation bias and the underestimation of mean square error (MSE) caused by unweighted imputation. Moreover, the variance estimator of the bias-adjusted estimator has the smallest relative bias for estimating MSE among the three. The unadjusted imputed estimator with unweighted imputation may be biased, and its corresponding variance estimator may underestimate the MSE of the estimator. However, simulation results do not show that the bias-adjusted estimator outperforms the imputed estimator with weighted imputation, except at a high level of correlation between the auxiliary variable and the variable of interest. In practice, an auxiliary variable that is highly correlated with the variable of interest is commonly used to impute missing values and so increase estimation precision. If the survey weights are unavailable and unweighted ratio imputation is used to impute missing values, the proposed bias-adjusted estimator with its corresponding variance estimator is suggested for obtaining better estimates.
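The difference between weighted and unweighted ratio imputation described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the study: the ratio is computed within each stratum, the mean is estimated with a weight-normalized (Hajek-type) form, every stratum has at least one respondent, and the coefficient of variation is taken as the root MSE divided by the true mean. The data layout, function names, and toy values are hypothetical, and the bias-adjusted estimator and the variance estimators are not reproduced here because the abstract does not give their form.

```python
from collections import defaultdict


def ratio_impute(sample, weighted):
    """Fill missing y by ratio imputation within each stratum (assumed setup).

    Each unit is a dict with keys "stratum", "w" (survey weight),
    "x" (auxiliary variable, always observed) and "y" (None if missing).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for u in sample:
        if u["y"] is not None:
            # Weighted imputation uses survey weights; unweighted uses 1.
            w = u["w"] if weighted else 1.0
            num[u["stratum"]] += w * u["y"]
            den[u["stratum"]] += w * u["x"]
    completed = []
    for u in sample:
        y = u["y"]
        if y is None:
            # Imputed value: respondent ratio in the stratum times x.
            y = num[u["stratum"]] / den[u["stratum"]] * u["x"]
        completed.append({**u, "y": y})
    return completed


def imputed_mean(sample, weighted):
    """Weight-normalized mean of the completed data (an assumed form)."""
    completed = ratio_impute(sample, weighted)
    total_w = sum(u["w"] for u in completed)
    return sum(u["w"] * u["y"] for u in completed) / total_w


def relative_bias(estimates, true_value):
    """Relative bias of a list of simulated estimates."""
    mean_est = sum(estimates) / len(estimates)
    return (mean_est - true_value) / true_value


def coefficient_of_variation(estimates, true_value):
    """Root MSE over the true value (assumed definition of the CV)."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return mse ** 0.5 / true_value


# Hypothetical toy sample: two strata, one nonrespondent in each.
sample = [
    {"stratum": "A", "w": 2.0, "x": 10.0, "y": 20.0},
    {"stratum": "A", "w": 4.0, "x": 20.0, "y": 30.0},
    {"stratum": "A", "w": 2.0, "x": 10.0, "y": None},
    {"stratum": "B", "w": 5.0, "x": 4.0, "y": 8.0},
    {"stratum": "B", "w": 5.0, "x": 6.0, "y": None},
]

weighted_estimate = imputed_mean(sample, weighted=True)
unweighted_estimate = imputed_mean(sample, weighted=False)
```

In stratum A the respondents carry unequal weights, so the weighted and unweighted ratios differ and the two estimates diverge. In stratum B the single respondent makes the two ratios identical. In a simulation, `relative_bias` and `coefficient_of_variation` would be applied to the estimates collected over repeated samples.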

Keywords

nonresponse; ratio imputation; imputed estimator; bias-adjusted estimator; unequal probability sampling

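Parallel Abstract

Missing values are unavoidable in survey practice, and how to combine missing-value imputation with a complex sampling design so as to obtain unbiased estimators has become an important research topic. This paper examines how imputed estimators that use an auxiliary variable to impute missing values under stratified unequal probability sampling perform under different missing data mechanisms (MCAR, MAR) and at different levels of correlation between the auxiliary variable and the variable of interest. Combining ratio imputation with stratified unequal probability sampling, the paper derives three imputed estimators of the population mean (weighted, unweighted, and bias-adjusted) and their variance estimators. The imputed estimators are compared using their relative bias and coefficient of variation together with the relative bias of their variance estimators, and a practical example illustrates how these estimators can be applied to real survey data. Simulation results show that the estimation precision of all three estimators increases as the correlation between the auxiliary variable and the variable of interest increases, and that the imputed estimators are more stable under the MCAR mechanism. When the auxiliary variable and the variable of interest are highly correlated, the proposed bias-adjusted imputed estimator does reduce the estimation bias arising from unweighted imputation and lessens the underestimation of the mean square error. In practice, if no weight data are available and unweighted ratio imputation is used, the proposed bias-adjusted imputed estimator can be used to obtain better estimates.

Parallel Keywords

nonresponse; ratio imputation; imputed estimator; bias-adjusted estimator; unequal probability sampling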

