  • Thesis

台灣新冠肺炎網路新聞報導之大數據分析—以某群聚感染事件為例

A Big Data Analysis of Online Journalism of COVID-19 in Taiwan—Taking a Cluster Infection Event as an Example

Advisor: 陳秀熙

Abstract


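Objective: The COVID-19 pandemic has drawn close attention and extensive media coverage worldwide and in Taiwan. However, few studies in Taiwan have examined how news coverage of this emerging infectious disease develops. This study takes an early COVID-19 cluster infection in Taiwan, the Changhua "unlicensed-taxi (白牌車, "Pak Pai") driver infection" event, as its main subject. It uses professional news themes and text mining to construct the association structure and context of online news coverage of an emerging epidemic event, and applies time-to-event analysis to assess the temporal sequence of reporting.

Methods: The subject is the cluster event surrounding Taiwan's first locally acquired case to die of COVID-19, the "unlicensed-taxi driver infection". Headlines and texts of news reports on this event published between 2020/02/16 and 2020/05/15 were collected from online platforms. A total of 254 news texts were included to build a big-data analysis database for this infection event. Themes were extracted and an association network was constructed by content analysis grounded in media professional expertise; themes were also extracted with a big-data text-mining singular-matrix method (特異矩陣方法). The temporal ordering of the extracted themes was assessed with non-parametric and semi-parametric regression methods and a random forest method, and the results of the different theme-extraction and temporal-sequencing methods were evaluated and compared.

Results: The 254 online news items were reported by 37 media outlets, grouped by media type into five categories: newspapers (129 items), online media (80), television stations (31), magazines (10), and radio (4). Content analysis of the 254 items yielded 59 key themes. The five themes appearing in the most items were: unlicensed-taxi driver, Taiwanese businessman from Zhejiang, first death in Taiwan, source of infection, and antibody testing. With big-data text mining, 6 extracted themes were enough to explain a considerable share of the variation in the wording of the texts. The top 5 of these themes were: community cluster event of the unlicensed-taxi driver, taxi epidemic-prevention SOP, Taiwan's first death and handling of the remains, Taiwanese businessman from Zhejiang as the source of infection, and management of unlicensed taxis. The themes extracted by big-data text mining were highly consistent with those from the content analysis grounded in media professional expertise.

Non-parametric time-to-event analysis of the temporal relationships among themes showed that coverage of most themes about this cluster event ended within 10 days. Cox regression found that unlicensed-taxi driver (HR 0.65), Taiwanese businessman from Zhejiang (HR 0.57), antibody testing (HR 0.57), source of infection (HR 0.47), spreading unverified information (HR 0.47), nucleic acid testing (HR 0.32), and big data (HR 0.38) were all persistent reporting themes. Random survival forest analysis showed that the five themes with the highest relative weight were source of infection, CALL IN, nucleic acid testing, crackdown on unlicensed taxis, and US CDC.

Conclusion: This study used both media professional expertise and big-data text mining to establish, qualitatively and quantitatively, the thematic context of online news coverage of COVID-19, and extended it to time-to-event analysis of themes. The result can serve as a framework and method for evaluating and monitoring media coverage of emerging infectious diseases.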
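The text-mining step described in the methods can be illustrated with a small sketch. It assumes the "singular-matrix method" refers to a singular value decomposition of a term-document matrix, which the abstract does not confirm. The toy corpus, the function names, and the use of power iteration to find only the leading theme are all illustrative assumptions, not the study's procedure or data.

```python
import math

# Hypothetical toy corpus: each "document" is a tokenized headline (not study data).
docs = [
    ["driver", "cluster", "infection"],
    ["driver", "cluster", "community"],
    ["taxi", "sop", "disinfection"],
    ["taxi", "sop", "driver"],
]

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for t in d:
            matrix[index[t]][j] += 1.0
    return vocab, matrix

def top_singular_triplet(a, iterations=200):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = A v
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

vocab, a = term_document_matrix(docs)
sigma, u, v = top_singular_triplet(a)

# Share of the total squared variation (Frobenius norm) captured by the leading theme.
total = sum(x * x for row in a for x in row)
explained = sigma ** 2 / total

# Terms with the largest loadings characterize the leading theme.
top_terms = [t for _, t in sorted(zip(u, vocab), reverse=True)[:3]]
```

In this sketch, `explained` plays the role of the "share of variation explained" mentioned in the results, and `top_terms` is one way a theme might be labeled from its highest-loading words. A full analysis would extract several components, typically with a numerical library rather than hand-written power iteration.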

Parallel Abstract


Objective: The COVID-19 pandemic has drawn extensive media attention worldwide. However, few studies have focused on how media reporting on this emerging infectious disease (EID) has evolved in Taiwan. We aimed to explore qualitative and quantitative aspects of media reporting on the COVID-19 outbreak using news on the first cluster infection, which was linked to a driver in Changhua, Taiwan. Building on this exploration, we constructed a framework for elucidating the context of media reporting on EIDs using the network of qualitative themes and their temporal sequences.

Methods: The study material was news on the first COVID-19 cluster infection in Changhua, Taiwan, first reported on February 16, 2020. A web-based search was performed for relevant news reported between February 16 and May 15, 2020, and a digital database of reporting on this cluster infection was constructed from the results. Themes were first abstracted from the collected news by content analysis grounded in media professional expertise. A text mining approach was also used to extract themes. The network among themes was established from the results of association analysis. For the temporal sequences of the themes, time-to-event analysis was applied using conventional non-parametric and semi-parametric methods and the machine learning approach of random survival forest. The consistency between the qualitative results and the quantitative results derived from the conventional and machine learning approaches was assessed.

Results: A total of 254 news texts were retrieved from 37 media outlets of five types: newspaper (129 items), web-based media (80 items), television (31 items), magazine (10 items), and radio broadcast (4 items). The collected news was categorized into 59 themes by content analysis. The top five themes were Pak Pai driver (白牌車司機), businessman from Zhejiang (浙江台商), first fatal case in Taiwan (台灣首例死亡), index case (感染源), and serological test for antibody (抗體檢測). The text mining results show that news reporting on the cluster event can be captured by 6 themes. The top five themes abstracted by text mining were cluster event associated with the Pak Pai driver (白牌車司機社區群聚事件), procedures for preventing disease transmission for taxi drivers (計程車防疫SOP), first fatal case in Taiwan and the handling of the remains (台灣首例死亡、遺體處理), businessman from Zhejiang as the index case for the cluster event (浙江台商感染源), and management of Pak Pai (白牌計程車管理). The themes abstracted by the two approaches were consistent.

The non-parametric time-to-event analysis shows that most reporting on the event appeared in the first ten days. The hazard ratios for the persistently reported themes were estimated as 0.65 for Pak Pai driver (白牌車司機), 0.57 for businessman from Zhejiang (浙江台商), 0.57 for serological test for antibody (抗體檢測), 0.47 for index case (感染源), 0.47 for spreading false news (散布未查證訊息), 0.32 for PCR test (核酸檢測), and 0.38 for big data (大數據). The top five themes by weight in the random survival forest analysis were index case (感染源), CALL IN, PCR test (核酸檢測), ban on Pak Pai (取締白牌車), and US CDC (美國CDC).

Conclusion: Using conventional and machine learning approaches to big data analytics, we constructed the qualitative themes and quantitative temporal sequences, together with the network among themes, for news reported on a COVID-19 cluster infection in Changhua, Taiwan. The proposed approach, based on big data analytics, provides a means of elucidating and monitoring the context of reporting on EIDs.
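The non-parametric time-to-event step can be illustrated with a minimal Kaplan-Meier sketch. The abstract does not name the estimator or define the event, so this assumes the event is a theme's last appearance in the news and that themes still being reported at the end of follow-up are censored. The durations below are hypothetical, not values from the study.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival)] at each distinct event time.

    durations: days from a theme's first report to its last report
               (or to the end of follow-up if censored)
    observed:  True if the theme stopped being reported, False if censored
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

def median_duration(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical durations (days) for seven themes; the last two are censored.
durations = [2, 3, 3, 5, 7, 30, 60]
observed = [True, True, True, True, True, False, False]

curve = kaplan_meier(durations, observed)
median = median_duration(curve)
```

Under this reading, a hazard ratio below 1 in the semi-parametric (Cox) model would mean a lower rate of a theme ending, which matches the abstract's description of such themes as persistently reported.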
