
機器學習與網路搜尋強度於新冠狀肺炎確診人數預測之應用

Applications of Machine Learning and Google Trends in Forecasting COVID-19 Cases

Advisor: 白炳豐
The full text will be available for download on 2027/03/04.

Abstract


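The rapid outbreak of COVID-19 caught the world unprepared and has continued to spread to this day. The pandemic has also changed how people live and caused an economic downturn. Governments have used regulations and public messaging to restrict activity and so reduce person-to-person transmission. Forecasting the number of confirmed cases can help governments set policy and prepare earlier. This study focuses on the United States. The independent variables are web search intensities for 14 COVID-19-related keywords obtained from Google Trends, and the dependent variable is the number of daily confirmed cases. The two series are offset by 1 to 7 days, with the experimental period fixed at 2020/4/1 to 2020/9/30, giving seven datasets. In each dataset, variables are screened with the Pearson correlation coefficient and those with small coefficients are removed, so that every dataset has two versions: all variables and filtered variables. Four machine learning methods are used: back-propagation neural network (BPNN), generalized regression neural network (GRNN), classification and regression trees (CART), and light gradient boosting machine (LightGBM). For each, a genetic algorithm searches for the best-fitting parameter set on the training set; those parameters are then applied to the test set to produce forecasts, and the forecasts are compared with the actual values using three error metrics: mean absolute error, root mean squared error, and mean absolute error [sic]. The errors of each model are averaged over its seven datasets and the four models are compared. LightGBM has lower errors than the other models and is more stable: its error does not jump as the time offset changes. Results also improve markedly after variable filtering.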
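The offset-and-filter step in the abstract can be illustrated with a minimal sketch. It is not the thesis code. It assumes that search intensity on day t is paired with confirmed cases on day t + lag, and that a keyword is kept when the absolute Pearson coefficient reaches a cutoff. The cutoff value 0.5, the function names, and the toy series are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / (sd_x * sd_y)


def lagged_pairs(feature, target, lag):
    """Pair the feature on day t with the target on day t + lag (assumed direction)."""
    if not 0 < lag < len(target):
        raise ValueError("lag must be between 1 and len(target) - 1")
    return feature[:-lag], target[lag:]


def select_features(features, target, lag, threshold=0.5):
    """Keep keywords whose |r| with the lagged target is at least threshold.

    threshold=0.5 is a hypothetical cutoff, not a value from the thesis.
    """
    kept = {}
    for name, series in features.items():
        x, y = lagged_pairs(series, target, lag)
        r = pearson(x, y)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Toy data: two made-up keyword series and a made-up daily case series.
features = {
    "keyword_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "keyword_b": [5, 1, 4, 2, 5, 1, 4, 2],
}
cases = [10, 10, 12, 14, 16, 18, 20, 22]

kept = select_features(features, cases, lag=2)
print(kept)
```

Running the selection once per lag from 1 to 7 would give the seven filtered datasets; skipping the selection gives the all-variables versions.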

Parallel Abstract


COVID-19 spread rapidly around the world and continues to spread. The pandemic has changed daily life and caused a global economic recession. Governments have therefore introduced policies to reduce contact between people and control infection. Forecasting confirmed cases can help governments make decisions earlier. In this study, search intensities for 14 pandemic-related keywords from Google Trends serve as the independent variables for the United States, and daily confirmed cases from the WHO (World Health Organization) serve as the dependent variable. The independent variables are shifted by 1 to 7 days against a fixed confirmed-case period from 2020/4/1 to 2020/9/30. The Pearson correlation coefficient is then used to filter the variables, and results with and without filtering are compared across machine learning models. The models are Back Propagation Neural Network (BPNN), Generalized Regression Neural Network (GRNN), Classification and Regression Trees (CART), and Light Gradient Boosting Machine (LightGBM), each with parameters optimized by a Genetic Algorithm (GA). The optimized parameters are applied to the testing data to produce forecasts, and the models are compared by their errors. LightGBM gives the most stable error values and is the best model overall.
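The model comparison rests on forecast errors averaged over the lagged datasets. The sketch below shows one way to compute such averages. The English abstract does not give the formulas, so the use of MAE, RMSE, and MAPE here is an assumption (MAPE in particular), and the function names and numbers are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def average_errors(runs):
    """Average each metric over a list of (actual, predicted) pairs, one per dataset."""
    metrics = {"MAE": mae, "RMSE": rmse, "MAPE": mape}
    return {
        name: sum(fn(actual, predicted) for actual, predicted in runs) / len(runs)
        for name, fn in metrics.items()
    }


# Toy forecasts for two hypothetical lag datasets of one model.
runs = [
    ([100, 200, 400], [110, 190, 400]),
    ([100, 200, 400], [100, 200, 400]),
]

scores = average_errors(runs)
print(scores)
```

Computing one such dictionary per model, over its seven datasets, gives the averaged figures on which the four models are ranked.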

