Model Selection for EM-Based Semiparametric Mixture Hazard Models Using Validity Indices

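The Cox proportional hazards model is a regression model widely used in survival analysis. It relates the distribution of survival time to covariates and is applied in fields such as medicine and health care. When the model involves latent variables, a mixture regression model is a suitable way to analyze their effects.

Choosing an appropriate number of model components is an important issue when using mixture models. Although validity indices are an important part of model selection, few researchers have used them to choose the number of components in a mixture regression model. In this thesis, we draw on indices developed for other models and use posterior probabilities and residuals to develop new indices, and we carry out a series of simulations to verify their effectiveness.

The Cox proportional hazards model consists of two parts: a baseline hazard function and a proportional regression model. Estimating the baseline hazard function has long been a challenging problem; some researchers assume that it follows a specific lifetime distribution, while others assume that it is piecewise constant. In this thesis, we estimate the baseline hazard function with a kernel estimator and develop an EM algorithm to estimate the parameters of the mixture regression model.

The simulation results show that, for estimating the baseline hazard function, the kernel estimator outperforms the piecewise constant function because it yields a smoother curve and overcomes the rigid structure of the piecewise constant function. In addition, the new indices select the correct number of components in a high proportion of simulations, which suggests that they are effective for choosing the number of model components.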
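For orientation, the two parts of the Cox model named in the abstract, and the posterior probabilities on which the proposed indices are built, can be written out in a generic form. This is a sketch under stated assumptions rather than the thesis's own notation: the symbols K, \pi_k, \beta_k, h_{0k}, S_k, \delta_i and \tau_{ik} are introduced here for illustration, and letting every component carry its own baseline hazard is an assumption.

```latex
% Cox proportional hazards model: baseline hazard times proportional regression part
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^{\top} x\right)

% Assumed K-component mixture: component k has its own baseline hazard and coefficients
h_k(t \mid x) = h_{0k}(t)\, \exp\!\left(\beta_k^{\top} x\right), \qquad
\sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0

% Likelihood contribution of subject i under component k
% (\delta_i = 1 for an observed event, 0 for a censored time; S_k is the survival function)
L_{ik} = h_k(t_i \mid x_i)^{\delta_i}\, S_k(t_i \mid x_i)

% E-step of a generic EM algorithm: posterior probability that subject i belongs to component k
\tau_{ik} = \frac{\pi_k\, L_{ik}}{\sum_{j=1}^{K} \pi_j\, L_{ij}}
```

In this generic form, a validity index based on posterior probabilities would be computed from the \tau_{ik} after the EM algorithm has converged for each candidate K.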

Keywords

Mixture regression model; Cox proportional hazards model; EM algorithm; kernel estimator; validity indices

Parallel Abstract

The Cox proportional hazards model is commonly used in survival analysis to describe the relationship between survival time and covariates. The model is applied in many fields, such as medicine and health care. When latent variables are involved in the model, mixture regression models are more suitable for analyzing the effects of these variables.

Determining the number of model components is an important issue when using mixture models. Although validity indices are a vital branch of model selection, they are rarely used to decide the number of components in mixture regression models. In this thesis, we propose new indices based on posterior probabilities and residuals, drawing on existing methods. The effectiveness of the proposed indices is verified through extensive simulations.

The Cox proportional hazards model consists of two parts: the baseline hazard function and the proportional regression model. Estimating the baseline hazard function is known to be a challenging problem. Some researchers assume that the baseline hazard function follows a specific lifetime distribution, and others assume that it is piecewise constant. In this thesis, the baseline hazard function is estimated with a kernel estimator, and the mixture regression model is estimated with the expectation-maximization (EM) algorithm.

For estimating the baseline hazard function, the simulation results show that the model fitted with the kernel estimator suits the data better than the piecewise constant model, because the fitted curve is smoother and the kernel estimator overcomes the stiff structure of the piecewise constant estimator. Moreover, the new indices select the correct number of components with high accuracy in the experiments, which supports their effectiveness for choosing the number of components.
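The idea of smoothing the baseline hazard with a kernel, as opposed to fitting a step function, can be illustrated with a minimal sketch. It assumes a Breslow-type estimator (events divided by the summed risk scores of subjects still at risk) smoothed with an Epanechnikov kernel; the thesis may use a different kernel, bandwidth rule, or weighting, and the function names `breslow_increments` and `kernel_baseline_hazard` are hypothetical.

```python
import numpy as np

def breslow_increments(times, events, risk_scores=None):
    """Distinct event times and Breslow-type hazard increments.

    risk_scores holds exp(beta' x_i) for each subject; when omitted, every
    score is 1 and the increments reduce to Nelson-Aalen increments d_j / n_j.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    if risk_scores is None:
        risk_scores = np.ones_like(times)
    risk_scores = np.asarray(risk_scores, dtype=float)
    event_times = np.unique(times[events])
    # d_j: number of observed events at each distinct event time
    deaths = np.array([np.sum((times == t) & events) for t in event_times])
    # total risk score of the subjects still at risk at each event time
    risk_totals = np.array([risk_scores[times >= t].sum() for t in event_times])
    return event_times, deaths / risk_totals

def kernel_baseline_hazard(grid, times, events, bandwidth, risk_scores=None):
    """Kernel-smoothed baseline hazard evaluated at each point of grid."""
    event_times, increments = breslow_increments(times, events, risk_scores)
    grid = np.asarray(grid, dtype=float)
    # scaled distance between every grid point and every event time
    u = (grid[:, None] - event_times[None, :]) / bandwidth
    # Epanechnikov kernel, zero outside [-1, 1] (an assumed kernel choice)
    weights = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return (weights * increments).sum(axis=1) / bandwidth

# Toy data: four subjects, the second one censored
toy_times = [1.0, 2.0, 3.0, 4.0]
toy_events = [1, 0, 1, 1]
smoothed = kernel_baseline_hazard(np.linspace(1.0, 4.0, 7), toy_times, toy_events, bandwidth=1.0)
```

A piecewise constant estimate would instead return one hazard value per interval, so it jumps at the interval boundaries; the kernel version varies continuously with the evaluation point, which is the smoothness advantage the abstract refers to. The bandwidth controls that smoothness and has to be chosen by the user in this sketch.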

Parallel Keywords

Mixture regression model; Cox proportional hazards model; EM algorithm; kernel estimator; validity indices

