
Statistical Models for Evaluating COVID-19 Epidemic Control (original title: 統計模型於新冠肺炎防疫評估)

Statistical Models for Evaluating COVID-19 Pandemic

Advisor: 陳秀熙

Abstract


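Background With variants of concern (VOCs) continuing to emerge during the COVID-19 pandemic, novel statistical modelling methods are urgently needed to understand the spread and impact of the disease. This thesis therefore has two aims. The first is to develop a machine learning approach for evaluating population-level containment measures, combining unsupervised and supervised methods to assess how non-pharmaceutical interventions (NPIs) and vaccination affected community outbreaks across successive waves of COVID-19. The second is to develop a series of stochastic process models that estimate the individual-level natural course from infection through the pre-symptomatic phase to the symptomatic phase, and to apply them to precision surveillance for border control and to modelling transitions between viral load levels for epidemiological surveillance.

Materials and Methods This thesis analysed global open data from 2020 to January 2022, including the Social Distancing Index (SDI), an index for lifting restrictions that accounts for vaccination, and the effective reproduction number. Community infection data from Taiwan between May and July 2021, combining demographic characteristics, symptoms, and individual viral load, were used to estimate statistical models of the individual-level natural history. First, the susceptible-exposed-infectious-recovered (SEIR) model was used to estimate the effective reproduction number (Rt). Unsupervised and supervised machine learning methods were combined to predict the spread pattern of COVID-19. A Bayesian stochastic multistate Markov model was developed to estimate COVID-19 disease progression, and computer simulations built on it provided the expected outcomes of different precision strategies for border control. Finally, viral-load-guided Markov models were developed, using a four-state Markov regression model and a nine-state discrete-time Markov model, to model the detailed viral-load-related dynamic transitions of cases between hidden states before recovery.

Results From January 1, 2021 to January 22, 2022, the global epidemic had at least three waves. Empirical data from the community epidemic in Taiwan in May 2021 showed that NPIs and testing reduced transmission by an estimated 60% during the first two weeks of the outbreak, rising to more than 90% after June 14, 2021, while Rt fell from 4.40 in May 2021 to 0.29 in July.

Of the three supervised machine learning methods used (SVM, logistic regression, and Bayesian network (BN)), BN performed best in terms of AUC, followed by logistic regression and SVM. BN separated the global epidemic data into two clusters: a vaccine-dominant cluster (cluster 1) and an NPI-dominant cluster (cluster 2).

Imported cases in Taiwan from March 2020 to January 2022 were used to estimate the individual-level disease progression model. The data were divided into seven epochs by variant and epidemic trend: two D614G epochs, two Alpha epochs, two Delta epochs, and the recent Omicron epoch. The daily incidence of pre-symptomatic COVID-19 (per 100,000) was estimated at 109 (95% confidence interval (CI): 98-121) in the D614G-1 epoch, fell to 40 (95% CI: 30-51) in D614G-2, and returned to 163 (95% CI: 141-188) in Alpha-1. With vaccination widely administered, it fell again to 117 (95% CI: 100-135), 97 (95% CI: 77-120), and 112 (95% CI: 90-134) in the Alpha-2, Delta-1, and Delta-2 epochs, respectively, and then rose to 317 (95% CI: 267-371) in the epoch of the recently emerged VOC Omicron. Assuming a 5-day quarantine, Omicron would accumulate the largest share of cases progressing from the pre-symptomatic to the symptomatic phase (94%), followed by Delta (74% and 80% in its two epochs), Alpha (74% and 66% in its two epochs), and D614G (80% and 74% in its two epochs).

A hidden Markov model fitted to repeated Ct measurements from locally acquired cases in Taiwan between May and July 2021 classified cases into five states: low risk, medium risk, high risk, extremely high risk, and recovered. The means of the Gaussian emission probability distributions for these five hidden states were 45.0, 34.2, 29.9, 23.8, and 15.8, respectively. The transition probability matrix showed that Ct values over the course of illness tended to move from low values (higher risk) to high values (lower risk).

Building on the hidden Markov model results, viral load was further divided into three levels at Ct 15 and Ct 25 and combined with symptom onset to construct a four-state Markov model, which was used to analyse the odds ratios of symptom onset and the effect on incubation time at different viral loads. Compared with low viral load (Ct ≥ 25), the odds ratios of symptom onset were 3.04 (2.43-3.61) for medium viral load (15 ≤ Ct < 25) and 10.87 (1.69-44.90) for high viral load (Ct < 15). Medium and high viral loads were associated with shorter incubation times.

Treating changes in viral load as distinct states and estimating the multistate disease course showed that, in both the pre-symptomatic and the symptomatic phase, progression towards higher Ct levels (lower viral load) was faster than progression towards lower Ct levels (higher viral load). Comparing rates of progression from the pre-symptomatic to the symptomatic phase across Ct levels, patients with low Ct levels progressed to the symptomatic phase at a higher rate. Once in the symptomatic phase, progression towards higher Ct levels was faster than in the pre-symptomatic phase.

Conclusion This thesis used a series of systematic, novel statistical models to predict the effectiveness of interventions against community outbreaks, to estimate the Social Distancing Index accounting for vaccination, to evaluate the effectiveness of containment measures (including NPIs, testing, and vaccines), and to estimate the individual-level natural history of COVID-19 accounting for viral changes. The aim is to provide, in a scientific and systematic way, valuable information for evidence-based evaluation of border control and for community surveillance policy-making.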
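As a rough illustration of how a 5-day quarantine figure relates to a progression rate, the sketch below assumes a single constant daily hazard of moving from the pre-symptomatic to the symptomatic phase (an exponential sojourn time). This is a simplifying assumption made here for illustration, not the Bayesian multistate Markov model fitted in the thesis. The function names are hypothetical, and only the 94% figure comes from the abstract.

```python
import math


def progression_probability(rate_per_day: float, days: float) -> float:
    """Probability of becoming symptomatic within `days`, assuming a constant hazard."""
    return 1.0 - math.exp(-rate_per_day * days)


def implied_rate(probability: float, days: float) -> float:
    """Constant daily hazard that would produce `probability` within `days`."""
    return -math.log(1.0 - probability) / days


# 94% is the 5-day progression probability reported for the Omicron epoch.
omicron_rate = implied_rate(0.94, 5)
print(f"Implied hazard: {omicron_rate:.3f} per day")

# Under the same (assumed) constant hazard, compare other quarantine lengths.
for days in (3, 5, 7, 10):
    p = progression_probability(omicron_rate, days)
    print(f"{days:>2} days: {p:.3f}")
```

Under this assumption a longer quarantine always captures a larger share of cases before release, with diminishing returns as the probability approaches 1.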

Parallel Abstract


Background In the face of emerging variants of concern (VOCs) during the COVID-19 pandemic, new epidemic modelling approaches are urgently needed. This thesis therefore has two objectives. The first is to apply a machine learning approach, combining unsupervised and supervised methods, to assess at the population level the influence of non-pharmaceutical interventions (NPIs) and vaccination on community-acquired outbreaks across repeated surges of the COVID-19 pandemic. The second is to develop a series of stochastic process models for the natural course of infection, including the pre-symptomatic and symptomatic phases, for precision surveillance in border control, and to model the detailed transitions between viral load levels for epidemiological surveillance.

Materials and Methods Open data repositories covering the period between 2020 and January 2022 were used for analysis. Information on demographic characteristics, symptoms, and individual viral load was collected from the community. The susceptible-exposed-infectious-recovered (SEIR) model was used to estimate the effective reproduction number (Rt). A machine learning approach combining unsupervised and supervised methods was adopted to predict the spread of COVID-19. A four-state Markov model with Bayesian underpinning, together with computer simulation experiments, was developed to model pre-symptomatic disease progression during the incubation period and to provide precision strategies for border control. The effect of viral load was considered with two approaches: a regression approach and a nine-state discrete-time Markov model. A Ct-guided Markov model was applied to model the detailed dynamic transitions between hidden states in relation to Ct before recovery.

Results The global epidemic had at least three waves from January 1, 2021 to January 22, 2022.
Fitting the observational data on the Taiwan community-acquired outbreak in May 2021, the effectiveness of NPIs and testing was estimated at over 60% in the first two weeks and rose to over 90% after June 14, 2021. Rt decreased from 4.40 on May 18, 2021 to 0.29 on July 17, 2021. Among the three supervised machine learning approaches (SVM, logistic regression, and Bayesian network (BN)) embedded in the hierarchical supervised machine learning, BN showed the best performance in terms of AUC (87% in cluster 1 and 86% in cluster 2 for the training datasets, and 75% in cluster 1 and 70% in cluster 2 for the validation datasets), followed by logistic regression and SVM. BN classified the global epidemic data into two clusters: the immunity-dominant cluster (cluster 1) and the mitigation-strategy-dominant cluster (cluster 2). The overall daily rate (per 100,000) of pre-symptomatic COVID-19 cases was estimated as 109 (95% confidence interval (CI): 98-121) in the D614G-1 epoch, fell to 40 (95% CI: 30-51) in the D614G-2 epoch, resurged to 163 (95% CI: 141-188) in the Alpha-1 epoch, declined again to 117 (95% CI: 100-135), 97 (95% CI: 77-120), and 112 (95% CI: 90-134) in the Alpha-2, Delta-1, and Delta-2 epochs, respectively, when vaccine was widely administered, and resurged to 317 (95% CI: 267-371) in the epoch of the recently emerged VOC Omicron. The probability of progression from the pre-symptomatic to the symptomatic phase within a 5-day quarantine was highest for Omicron (94%), followed by Delta (74% and 80% in its two periods), Alpha (74% and 66% in its two periods), and D614G (80% and 74% in its two periods). The means of the Gaussian emission distributions of Ct value for the five hidden states, namely low risk (state 2), medium risk (state 3), high risk (state 4), extremely high risk (state 5), and recovery (state 1), were estimated as 45.0, 34.2, 29.9, 23.8, and 15.8, respectively.
The transition probabilities between these five states indicate that dynamic changes of viral load are more likely to run from the low to the high Ct level. Guided by the hidden states of Ct level, higher viral shedding was associated with a higher proportion of symptomatic cases. Compared with the low level (Ct ≥ 25), the odds ratios for the medium (15 ≤ Ct < 25) and high (Ct < 15) levels were 3.04 (2.43, 3.61) and 10.87 (1.69, 44.90), respectively. Both the medium and high levels were associated with shorter incubation time. The estimates show that disease progression towards high Ct levels (lower viral shedding) was faster than progression in the detrimental direction in both the pre-symptomatic and symptomatic phases. Patients with low Ct levels were more likely to develop symptoms than those with high Ct levels. Once in the symptomatic phase, the transition rates towards higher Ct levels were faster than their counterparts in the pre-symptomatic phase.

Conclusion A series of new statistical models with a systematic approach was developed for predicting community-acquired outbreaks under interventions, estimating an updated social distancing index, evaluating the effectiveness of containment measures (including NPIs, testing, and vaccine), and modelling the occurrence and progression of pre-symptomatic and symptomatic COVID-19 cases in parallel with the evolution of viral shedding. All results provide valuable information for evidence-based policy-making on surveillance for border control and the community.
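The three-level viral load stratification described above can be sketched as follows. The Ct cut-offs of 15 and 25 come from the abstract. The crude 2x2 odds ratio is only an illustration of what an odds ratio measures: the thesis estimates its odds ratios with a Markov regression model, not this calculation. The function names and the counts in the usage lines are hypothetical and are not data from the thesis.

```python
def viral_load_level(ct: float) -> str:
    """Map a Ct value to a viral load level (lower Ct means higher viral load)."""
    if ct < 15:
        return "high"
    if ct < 25:
        return "medium"
    return "low"


def crude_odds_ratio(group_symptomatic: int, group_asymptomatic: int,
                     ref_symptomatic: int, ref_asymptomatic: int) -> float:
    """Odds of symptoms in a group divided by the odds in the reference group."""
    group_odds = group_symptomatic / group_asymptomatic
    reference_odds = ref_symptomatic / ref_asymptomatic
    return group_odds / reference_odds


# Hypothetical Ct readings, for illustration only.
for ct in (12.3, 18.0, 25.0, 33.5):
    print(ct, viral_load_level(ct))

# Hypothetical counts: 30 symptomatic vs 10 not in the medium group,
# 20 symptomatic vs 20 not in the low (reference) group.
print(crude_odds_ratio(30, 10, 20, 20))
```

An odds ratio above 1 for the medium or high group means symptoms are more likely there than in the low viral load reference group, which is the direction reported in the abstract.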

