Research of Modeling Overdispersion for Analyzing Plant Disease Incidence Data


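In plant epidemiology, disease incidence data (binary data recording whether each plant is healthy or diseased) are often aggregated. Aggregation introduces heterogeneity among sampling units, which produces overdispersion in statistical analysis. Overdispersion means that the observed variance of the data is larger than the variance expected under the assumed binomial model. If it is ignored, the standard errors of the estimated parameters are too small and statistical inferences can be wrong, so it is a problem that must be addressed during analysis. In recent years, applied statistical research on aggregated disease incidence in plant epidemiology has mainly used the beta-binomial model and the Williams model to study grape diseases, and the Generalized Linear Mixed Model (GLMM) to survey grape and strawberry diseases. This study examines disease incidence data collected by hierarchical sampling, using real incidence data for grape downy mildew and strawberry leaf blight as examples. By computing the dispersion parameter, -2 Res Log Likelihood, AIC, and BIC and by drawing residual plots, it compares how well four models (logistic regression, the beta-binomial model, the Williams model, and the GLMM) fit the data, in order to assess how well each method overcomes overdispersion and to weigh their strengths and weaknesses. Finally, four data sets with different degrees of dispersion are simulated and fitted with the same four methods to examine how well each model handles overdispersion in disease incidence data under hierarchical sampling. The results are intended to support the application of statistical methods in plant epidemiology.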
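The dispersion parameter and the simulation of data sets with different degrees of dispersion mentioned above can be illustrated with a short sketch. It assumes that aggregated incidence follows a beta-binomial distribution with intra-cluster correlation `rho`, so the variance of the diseased count in a unit of `n` plants is inflated by the factor `1 + (n - 1) * rho` relative to the binomial. The function names `simulate_incidence` and `pearson_dispersion`, the unit size, and the four `rho` values are hypothetical choices for illustration, not the study's actual simulation design. The dispersion parameter is estimated here as the Pearson chi-square divided by its degrees of freedom from an intercept-only logistic (binomial) model; values well above 1 indicate overdispersion.

```python
import numpy as np

def simulate_incidence(m, n, p, rho, rng):
    """Simulate diseased-plant counts in m sampling units of n plants each.

    Hypothetical design: rho = 0 gives random (binomial) incidence,
    rho > 0 gives aggregated (beta-binomial) incidence with mean p.
    """
    if rho == 0:
        return rng.binomial(n, p, size=m)
    # Beta(a, b) with mean a / (a + b) = p and 1 / (a + b + 1) = rho
    s = (1 - rho) / rho
    unit_probs = rng.beta(p * s, (1 - p) * s, size=m)
    return rng.binomial(n, unit_probs)

def pearson_dispersion(y, n):
    """Pearson chi-square / df for an intercept-only binomial (logistic) model."""
    y = np.asarray(y, dtype=float)
    p_hat = y.sum() / (n * len(y))          # MLE of the common incidence
    expected_var = n * p_hat * (1 - p_hat)  # binomial variance per unit
    x2 = np.sum((y - n * p_hat) ** 2 / expected_var)
    return x2 / (len(y) - 1)                # df = units - 1 fitted parameter

rng = np.random.default_rng(2024)
m, n, p = 200, 20, 0.3                      # hypothetical sampling design
for rho in (0.0, 0.05, 0.2, 0.5):           # four hypothetical dispersion levels
    y = simulate_incidence(m, n, p, rho, rng)
    phi_theory = 1 + (n - 1) * rho          # beta-binomial variance inflation
    print(f"rho={rho:.2f}  theory={phi_theory:5.2f}  "
          f"estimated={pearson_dispersion(y, n):5.2f}")
```

Under random incidence the estimate stays near 1, while under aggregation it grows roughly like `1 + (n - 1) * rho`, which is why an ordinary logistic regression understates standard errors for aggregated data.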


In plant epidemiology, disease incidence data tend to be clustered or aggregated. Aggregation introduces heterogeneity among sampling units, which in turn produces overdispersion in statistical analysis. Overdispersion means that the observed variance of the data exceeds the variance expected under the assumed binomial model. If it is ignored, the standard errors of the parameter estimates are underestimated, and the final conclusions are often misleading. Recent literature has discussed overdispersion in plant epidemiology: for example, the beta-binomial and Williams models have been applied in grape research, and the Generalized Linear Mixed Model (GLMM) in grape and strawberry research. The objective of this research is to evaluate different methods for analyzing aggregated plant disease incidence data collected through hierarchical sampling. To address heterogeneity, this study uses logistic regression, the beta-binomial model, the Williams method, and the GLMM to analyze secondary data on the incidence of downy mildew on grapes and Phomopsis leaf blight on strawberries. The advantages and disadvantages of the different models are then discussed on the basis of the dispersion parameter, -2 Res Log Likelihood, AIC, BIC, and residual plots. Finally, the same models are used to analyze four simulated data sets with different degrees of dispersion. The results provide practical suggestions for analyzing overdispersed binary data in plant epidemiology.
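The comparison of models by -2 log-likelihood, AIC, and BIC can be sketched for two of the four models: intercept-only logistic regression (binomial) and the beta-binomial model, both fitted by maximum likelihood with SciPy. This is a minimal illustration under several assumptions: the data are simulated, the function names are hypothetical, and the criteria are computed from the ordinary maximum likelihood rather than the restricted likelihood behind SAS's "-2 Res Log Likelihood". BIC uses the number of sampling units as the sample size, which is one common convention. The Williams model and the GLMM are not shown. Smaller AIC and BIC values indicate a better trade-off between fit and complexity.

```python
import numpy as np
from scipy import optimize, stats

def fit_binomial(y, n):
    """Intercept-only logistic regression: the MLE of p is the pooled proportion."""
    y, n = np.asarray(y), np.asarray(n)
    p_hat = y.sum() / n.sum()
    loglik = stats.binom.logpmf(y, n, p_hat).sum()
    return {"params": {"p": p_hat}, "loglik": loglik, "k": 1}

def fit_beta_binomial(y, n):
    """Beta-binomial MLE, optimised on the log scale so that a, b stay positive."""
    y, n = np.asarray(y), np.asarray(n)

    def neg_loglik(theta):
        a, b = np.exp(theta)
        return -stats.betabinom.logpmf(y, n, a, b).sum()

    res = optimize.minimize(neg_loglik, x0=np.zeros(2), method="Nelder-Mead")
    a, b = np.exp(res.x)
    # Report mean incidence p and intra-cluster correlation rho
    return {"params": {"p": a / (a + b), "rho": 1 / (a + b + 1)},
            "loglik": -res.fun, "k": 2}

def information_criteria(fit, n_units):
    """-2 log-likelihood, AIC and BIC (BIC sample size = number of units)."""
    m2ll = -2 * fit["loglik"]
    return {"-2logL": m2ll,
            "AIC": m2ll + 2 * fit["k"],
            "BIC": m2ll + fit["k"] * np.log(n_units)}

rng = np.random.default_rng(7)
n_plants = np.full(150, 25)              # hypothetical: 150 units of 25 plants
unit_p = rng.beta(1.5, 3.5, size=150)    # aggregated: true p = 0.3, rho = 1/6
y_obs = rng.binomial(n_plants, unit_p)
for name, fit in [("logistic (binomial)", fit_binomial(y_obs, n_plants)),
                  ("beta-binomial", fit_beta_binomial(y_obs, n_plants))]:
    ic = information_criteria(fit, len(y_obs))
    print(name, fit["params"], {k: round(float(v), 1) for k, v in ic.items()})
```

On aggregated data the beta-binomial model spends one extra parameter (`rho`) but gains far more log-likelihood, so its AIC and BIC fall well below those of the binomial model. Optimising on the log scale is a simple way to enforce the positivity constraints on the beta parameters without a bounded optimiser.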
