  • Thesis

多點膨脹截斷卜瓦松迴歸模型之研究

A study on Multiple-Inflated Truncated Poisson Regression Model

Advisor : 林定香
Co-advisor : 蔡旻曉

Abstract


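Count data arise in many fields, and the count data actually collected often contain an excess of zeros and of certain nonzero positive integers. For example, when asked on how many days in the past month they felt physically unwell, respondents' answers cluster at 0 days, 30 days, and multiples of 5 and 7 days. This excess of zeros and of certain nonzero positive integers is called multiple inflation. This thesis extends the idea of the zero-inflated Poisson distribution to a question type common in surveys: "the number of days on which some behavior occurred in the past two weeks." The number of days is divided into inflated and non-inflated values. The former comprise 0, 14, 5, 7, and 10, whose probabilities are assumed to follow a multinomial distribution; for the latter, the number of occurrences is assumed to follow a truncated Poisson distribution. Under these assumptions, the multiple-inflated truncated Poisson model can be derived. A logistic model and a truncated Poisson model with a log link are then used to build the multiple-inflated truncated Poisson (MITP) regression model. In the simulation study, four regression models, based on the truncated Poisson, zero-inflated truncated Poisson, zero-and-M-inflated truncated Poisson, and multiple-inflated truncated Poisson distributions, are compared on goodness of fit under different factors: the data-generating model, the sample size, the slope of the explanatory variable, the zero-inflation rate, and the M-inflation rate. The likelihood ratio test is used to select the best model under each combination of factors. The simulation results show that the sample size and the slope of the explanatory variable do not affect the models' goodness of fit, whereas the data-generating model, the zero-inflation rate, and the M-inflation rate do; the outcome of the likelihood ratio test is affected only by the data-generating model. In the empirical study, questionnaire data from the 2010 Behavioral Risk Factor Surveillance System are used to examine how well the four regression models fit real data. The results show that the MITP regression model proposed in this thesis fits best.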
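The abstract does not give the model's formula, so the following is only a minimal sketch of one plausible reading: a ZIP-style mixture in which each inflated value (0, 5, 7, 10, 14) receives extra point mass and the remaining mass follows a Poisson distribution truncated to the two-week range 0 to 14. The function names (`truncated_poisson_pmf`, `mitp_pmf`, `mixing_probs`, `log_link_rate`), the mixture form, and the choice of truncation range are assumptions made for illustration, not the thesis's own definitions.

```python
import math

INFLATED = (0, 5, 7, 10, 14)  # inflated day counts named in the abstract
MAX_DAYS = 14                 # two-week window (assumed truncation point)


def truncated_poisson_pmf(k, lam, upper=MAX_DAYS):
    """Poisson(lam) pmf restricted to 0..upper and renormalised."""
    if not 0 <= k <= upper:
        return 0.0
    # exp(-lam) cancels between numerator and denominator, so it is omitted.
    weights = [lam ** j / math.factorial(j) for j in range(upper + 1)]
    return weights[k] / sum(weights)


def mitp_pmf(k, lam, inflate_probs):
    """Assumed mixture form: point masses at inflated values plus a
    truncated Poisson component carrying the remaining probability."""
    extra = sum(inflate_probs.values())
    if not 0.0 <= extra <= 1.0:
        raise ValueError("inflation probabilities must sum to at most 1")
    base = (1.0 - extra) * truncated_poisson_pmf(k, lam)
    return inflate_probs.get(k, 0.0) + base


def mixing_probs(etas):
    """Multinomial-logit link: maps one linear predictor per inflated value
    to a probability, with the Poisson component as reference category."""
    exps = {value: math.exp(eta) for value, eta in etas.items()}
    denom = 1.0 + sum(exps.values())
    return {value: e / denom for value, e in exps.items()}


def log_link_rate(intercept, slope, x):
    """Log link for the Poisson rate: lam = exp(intercept + slope * x)."""
    return math.exp(intercept + slope * x)
```

Under this reading, a covariate enters twice: through `mixing_probs` for the inflation probabilities and through `log_link_rate` for the Poisson rate. The resulting `mitp_pmf` values over 0 to 14 sum to one whenever the inflation probabilities sum to at most one.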

Parallel Abstract


Count data are common in many fields, and real data often contain an excess of zeros and of certain nonzero positive integers. For example, when asked on how many days in the past month they felt unwell, respondents often answer 0, 30, or a multiple of 5 or 7. Such data are called multiple-inflated. In this study, we extend the concept of the zero-inflated Poisson distribution and take as our case the number of days on which a given behavior occurred in a two-week period. The values are split into two groups: inflated values (0, 14, 5, 7, and 10), whose probabilities follow a multinomial distribution, and non-inflated values, which follow a truncated Poisson distribution. We propose a multiple-inflated truncated Poisson (MITP) regression model, which is a mixture of a multinomial logistic regression and a truncated Poisson regression. In the simulation study, we compare the goodness of fit of the truncated Poisson (TP), zero-inflated truncated Poisson (ZIP), zero-and-M-inflated truncated Poisson (ZMITP), and MITP models and use the likelihood ratio test for model selection. The factors studied are the true model, the sample size, the slope of the explanatory variable, and the zero and M inflation proportions. The simulation study shows that goodness of fit is affected by the true model, the zero proportion, and the M proportion, but not by the sample size or the explanatory variable. The empirical study uses 2010 Behavioral Risk Factor Surveillance System data to compare the performance of the four regression models. The results show that the MITP regression model outperforms the other models.

References


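王文華 (2009) 台灣地區自殺企圖者之重複自殺企圖次數統計模型探討 [A study of statistical models for the number of repeated suicide attempts among suicide attempters in Taiwan]. Master's thesis, Graduate Institute of Statistics, National Chengchi University.
鄧詠竹 (2012) 零膨脹卜瓦松迴歸模型之研究 [A study on zero-inflated Poisson regression models]. Master's thesis, Graduate Institute of Statistics, National Taipei University.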
Agarwal, D.K., Gelfand, A.E., Citron-Pousty, S. (2002) Zero-inflated models with application to spatial count data. Environmental and Ecological Statistics, 9, 341-355.
Angers, J.F., Biswas, A. (2003) A Bayesian analysis of zero-inflated generalized Poisson model. Computational Statistics and Data Analysis, 42, 37-46.
Behavioral risk factor surveillance system (BRFSS), http://www.cdc.gov/brfss/.
