Statistical Methods for Missing Covariates

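In environmental monitoring and experimental studies, the collected data are prone to missing values because of instrument detection limits. Most of the existing literature handles data in which at most two variables in the model are subject to detection limits, using simple substitution, imputation, or model-based methods. When the model contains several variables subject to detection limits, and these are all continuous or all categorical, a Monte Carlo EM algorithm combined with sampling is used to handle the high-dimensional integration and obtain the parameter estimates. This thesis focuses on a more complex setting in logistic regression analysis, in which several continuous variables subject to detection limits and categorical variables that are missing at random occur together, and it proposes a method for estimating the regression coefficients. We use Monte Carlo integration to evaluate the high-dimensional integrals that arise in the E-step of the EM algorithm from detection limits and random missingness. Finally, we bring in two existing methods for data subject to detection limits and compare the performance of the different approaches. The simulation results show that, across different censoring proportions, the proposed method yields less biased and more precise estimates of the regression coefficients than both the method of Schisterman et al. (2006) and complete-case analysis.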
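As a rough illustration of the Monte Carlo E-step described above, the expected complete-data log-likelihood can be written and approximated as follows. The notation is an assumption introduced here, not taken from the thesis: $y_i$ is the binary outcome, $x_i^{\mathrm{obs}}$ and $x_i^{\mathrm{mis}}$ are the observed and unobserved parts of the covariate vector, $\mathcal{R}_i$ is the set of values consistent with what is known (below the detection limit for a censored continuous covariate, any category for a missing categorical one), and $\theta = (\beta, \alpha)$ collects the logistic coefficients and the covariate-distribution parameters.

```latex
% Sketch under assumed notation; the thesis may parameterize the model differently.
\begin{align*}
Q(\theta \mid \theta^{(t)})
  &= \sum_{i=1}^{n} \mathrm{E}\!\left[ \log f(y_i \mid x_i; \beta) + \log f(x_i; \alpha)
     \,\middle|\, y_i,\ x_i^{\mathrm{obs}},\ x_i^{\mathrm{mis}} \in \mathcal{R}_i;\ \theta^{(t)} \right] \\
  &\approx \sum_{i=1}^{n} \frac{1}{M} \sum_{m=1}^{M}
     \left[ \log f\!\left(y_i \mid x_i^{\mathrm{obs}}, x_i^{(m)}; \beta\right)
          + \log f\!\left(x_i^{\mathrm{obs}}, x_i^{(m)}; \alpha\right) \right],
\end{align*}
% where x_i^{(1)}, ..., x_i^{(M)} are draws from the conditional distribution of
% x_i^{mis} given y_i, x_i^{obs} and x_i^{mis} in R_i, evaluated at \theta^{(t)}.
```

The M-step then maximizes this approximation over $\theta$, for example with Newton's method, which appears among the keywords.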

Keywords

detection limit; maximum likelihood estimator; missing at random; Monte Carlo EM; Newton's method

Parallel Abstract

In many environmental and laboratory studies, instrument detection limits often lead to missing values in the data. Existing methods for regression analysis of data with at most two covariates subject to detection limits include simple substitution, imputation, and model-based methods. When either multiple continuous covariates or multiple categorical covariates alone are subject to detection limits, the most common approaches are the model-based method, the Expectation-Maximization (EM) algorithm, and a Monte Carlo version of the EM algorithm that obtains the maximum likelihood estimates via sampling. In this paper, we consider a more complex case of missing covariates, in which both multiple continuous covariates subject to detection limits and categorical covariates that are missing at random are present in a logistic regression analysis. The aim of this paper is to provide a method for estimating the parameters of regression models for data with covariates subject to detection limits and a missing-at-random mechanism. We use a Monte Carlo version of the E-step of the EM algorithm to tackle the high-dimensional integration and summation that arise from covariates subject to detection limits and random missingness. We conduct a simulation study to compare the performance of the proposed Monte Carlo EM approach with the complete-case method and the imputation method proposed by Schisterman et al. (2006). The simulation results show that the proposed approach yields relatively unbiased estimates with smaller standard errors than the complete-case method and the imputation method of Schisterman et al. (2006).
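To make the Monte Carlo E-step more concrete, here is a minimal sketch for a single subject whose one continuous covariate fell below the detection limit. It is not the thesis's implementation: it assumes a normally distributed covariate, a logistic model with one slope, and self-normalized importance sampling in which values are drawn from the normal distribution truncated at the limit and weighted by the logistic likelihood of the observed outcome. The function names `sigmoid`, `draw_below_limit`, and `mc_e_step` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def draw_below_limit(mu, sigma, limit, rng):
    """Draw from N(mu, sigma^2) truncated to values at or below `limit` (inverse CDF)."""
    dist = NormalDist(mu, sigma)
    u = dist.cdf(limit) * (1.0 - rng.random())  # u lies in (0, F(limit)]
    return min(dist.inv_cdf(u), limit)  # clamp guards against rounding at the boundary


def mc_e_step(y, beta0, beta1, mu, sigma, limit, n_draws, rng):
    """Approximate the distribution of a censored covariate x given y and x <= limit.

    Assumption for this sketch: x ~ N(mu, sigma^2) and
    P(y = 1 | x) = sigmoid(beta0 + beta1 * x).
    Returns the draws and their normalized importance weights.
    """
    draws = [draw_below_limit(mu, sigma, limit, rng) for _ in range(n_draws)]
    weights = []
    for x in draws:
        p = sigmoid(beta0 + beta1 * x)
        weights.append(p if y == 1 else 1.0 - p)  # likelihood of the observed outcome
    total = sum(weights)
    return draws, [w / total for w in weights]
```

The weighted draws can then stand in for the unobserved value in a weighted M-step; for instance, `sum(w * x for x, w in zip(draws, weights))` approximates the conditional mean of the censored covariate under the current parameter values.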

Parallel Keywords

detection limits; maximum likelihood estimation; missing at random; Monte Carlo EM; Newton-Raphson method

References

2. D'Angelo, G., Weissfeld, L., & Investigators, G. (2008). An index approach for the Cox model with left censored covariates. Statistics in Medicine, 27(22), 4502-4514. doi: 10.1002/sim.3285

3. Ibrahim, J. G. (1990). Incomplete data in generalized linear models. Journal of the American Statistical Association, 85(411), 765-769.

4. Lipsitz, S. R., & Ibrahim, J. G. (1996). A conditional model for incomplete covariates in parametric regression models. Biometrika, 83(4), 916-922.

5. Little, R. J. A., & Schluchter, M. D. (1985). Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika, 72(3), 497-512.

7. Lubin, J. H., Colt, J. S., Camann, D., Davis, S., Cerhan, J. R., Severson, R. K., . . . Hartge, P. (2004). Epidemiologic evaluation of measurement data in the presence of detection limits. Environmental Health Perspectives, 112(17), 1691-1696. doi: 10.1289/ehp.7199
