
以社群網路分析進行流感趨勢預測

Flu Trend Prediction based on Network Community Analysis

Advisor: 張昭憲

Abstract


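Influenza poses a serious health threat to people worldwide every year. According to WHO statistics, annual influenza epidemics cause about 3 million severe cases and about 250,000 deaths worldwide. Countering this threat requires early prevention so that the spread of an epidemic can be controlled effectively. To monitor influenza, national disease control agencies usually compile data from clinical visit reports, which can introduce a delay of one to two weeks, clearly too slow for a fast-spreading disease like influenza. To provide effective prediction of influenza consultation rates, this study collects posts from the Twitter community and Google popular keyword search data and, together with the actual influenza consultation rates provided by official sources, builds a linear prediction model for each. In addition, because an epidemic may lag owing to the incubation period, a delay factor is also included to reduce prediction lag. To further improve prediction accuracy, this study adopts the concept of model fusion, combining the predictions of multiple models to improve stability. Experimental results show that, with a look-back window of four weeks, the Twitter model reaches a correlation of 0.87 and the Google keyword search popularity model reaches 0.78. When the delay factor is considered, the Google keyword popularity delay model has the highest correlation (0.868). For model fusion, when the model is built on the previous year's data, the prediction correlation for the following year reaches 0.84. These results show that the prediction models built in this study from social network data can indeed compensate for the delay in official data and provide acceptable prediction accuracy.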

Parallel Abstract


Every year, influenza (flu) threatens the health of the world's population. According to World Health Organisation (WHO) statistics, annual influenza epidemics cause about 3,000,000 severe cases and around 250,000 deaths. Early prevention is necessary to control the spread of an epidemic effectively. To monitor the flu situation, disease control centres in each country usually compile data from clinical consultation reports, which may be delayed by one to two weeks; this is not fast enough to limit the rapid spread of influenza. To provide timely and accurate forecasts of the consultation rate, this study utilises data from Twitter and Google keyword searches and, by combining them with officially reported consultation rates, establishes a linear regression model for each source. Moreover, to account for the incubation period, a time-delay factor is included to reduce prediction lag. To improve prediction further, this study fuses multiple models to obtain a more stable result. With a look-back period of four weeks, the correlation of the Twitter model is 0.87, while that of the Google keyword model is 0.78. When the time delay is included, the Google keyword delay model achieves the highest correlation, 0.868. For the fusion of multiple models, the prediction correlation for the following year reaches 0.84 when the model is built on the previous year's data. These results show that prediction models built from social network data can compensate for the latency of official data while providing acceptable accuracy.
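
The abstract names three ingredients: a linear model over a four-week look-back window, evaluation by correlation between predicted and actual consultation rates, and fusion of several models' predictions. The following is a minimal Python sketch of those steps under stated assumptions, not the thesis's implementation. The function names, the synthetic weekly series, and the choice of simple averaging as the fusion rule are all hypothetical; the abstract does not specify the fusion method or the exact feature construction.

```python
import numpy as np

def lagged_features(signal, lookback):
    """Row for week t holds signal[t-lookback+1 .. t]; rows start at t = lookback-1."""
    signal = np.asarray(signal, dtype=float)
    rows = [signal[t - lookback + 1 : t + 1] for t in range(lookback - 1, len(signal))]
    return np.array(rows)

def fit_linear(X, y):
    """Ordinary least squares with an intercept (least-squares solution via lstsq)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def pearson(a, b):
    """Pearson correlation between two equal-length series."""
    return float(np.corrcoef(a, b)[0, 1])

def fuse(predictions):
    """Assumed fusion rule: unweighted mean of the individual model predictions."""
    return np.mean(np.vstack(predictions), axis=0)

if __name__ == "__main__":
    # Synthetic stand-ins for two years of weekly data (NOT the thesis data).
    rng = np.random.default_rng(0)
    weeks = np.arange(104)
    rate = 1.0 + 0.5 * np.sin(2 * np.pi * weeks / 52)       # "official" rate
    twitter = rate + rng.normal(0, 0.1, size=weeks.size)    # noisy proxy signal
    google = rate + rng.normal(0, 0.2, size=weeks.size)     # noisier proxy signal

    lookback, split = 4, 52
    preds = []
    for name, sig in [("twitter", twitter), ("google", google)]:
        X = lagged_features(sig, lookback)
        y = rate[lookback - 1:]                  # align targets with feature rows
        n_train = split - (lookback - 1)         # rows belonging to the first year
        coef = fit_linear(X[:n_train], y[:n_train])
        pred = predict_linear(coef, X[n_train:])  # predict the second year
        preds.append(pred)
        print(name, "correlation:", pearson(pred, y[n_train:]))
    print("fused correlation:", pearson(fuse(preds), rate[split:]))
```

Training on the first year and scoring on the second mirrors the abstract's setup of building the model on the previous year's data. A delay model in the abstract's sense could be approximated by shifting `sig` by a fixed number of weeks before building the features, though the exact delay handling used in the thesis is not stated.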

