Application of the Machine Learning Method in PM2.5 Prediction: A Case Study of Taipei Area

Advisor: 林靖愉
Co-advisor: 郭育良 (Yue-Liang Guo)
The full text will be available for download on 2026/03/23.

Abstract


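Background and objectives: Fine particulate matter (PM2.5) refers to particles with an aerodynamic diameter of less than 2.5 micrometers. Its toxicity varies with its chemical composition and with the substances adsorbed on it, and the particles are small enough to pass through the alveoli into the bloodstream. For the elderly, young children, and people with cardiopulmonary disease in particular, both short-term and long-term exposure to PM2.5 carries a potential health risk. Taiwan's Environmental Protection Administration (EPA) completed its nationwide air quality monitoring network in 1993 to monitor air quality and protect public health. As public concern about air quality has grown in recent years, a growing number of studies have attempted to forecast air pollution. This study compares the predictive performance of different machine learning models for air quality forecasting.

Materials and methods: This study covers the general air quality monitoring stations operated by the EPA in the Taipei area, mainly in New Taipei City and Taipei City. We collected data for 2018 and 2019 on air pollutants, including PM2.5, and on related meteorological variables, and used the preceding 8 hours of historical data to estimate the PM2.5 concentration three hours ahead. The models were trained on the 2018 data and validated on the 2019 data to evaluate their performance. Conventional linear regression served as the baseline against which machine learning and deep learning models were compared. We compared predictive performance across stations for a single model and across models, and assessed whether adding information from neighboring stations improved the predictions of each model.

Results: The dataset comprises 227,760 hourly records from 13 stations over the two years, with 23 variables. The mean PM2.5 concentration across stations was 15.23 μg/m³ (standard deviation 10.15 μg/m³). Of the three models, XGBoost predicted best, followed by LSTM; both outperformed linear regression on average. Among stations, Tucheng and Cailiao had the highest R-squared, and Shilin and Wanhua the lowest. After variables from neighboring stations were added, predictions for Tucheng, Shilin, and Wanhua all improved relative to the models without those variables. The final model reached an R-squared of 0.64 for the whole of 2019.

Conclusion: For forecasts three hours ahead, XGBoost predicted better than the neural network and linear regression, and adding neighboring stations also improved model accuracy.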
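The windowing scheme described in the methods (eight hours of history used to predict the concentration three hours ahead, with 2018 for training and 2019 for validation) might be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: the function name `make_windows`, the array layout, the choice of column 0 as PM2.5, and the synthetic series are all hypothetical, and the abstract does not describe the actual preprocessing.

```python
import numpy as np

def make_windows(values, target, lookback=8, horizon=3):
    """Build supervised samples from one station's hourly series.

    values: array of shape (T, F), F features per hour (assumed layout).
    target: array of shape (T,), the hourly PM2.5 concentration.
    Each sample uses hours t-lookback+1 .. t as input and the PM2.5
    value at hour t+horizon as the label.
    """
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y, t_index = [], [], []
    for t in range(lookback - 1, len(target) - horizon):
        X.append(values[t - lookback + 1 : t + 1].ravel())  # flatten 8 h x F
        y.append(target[t + horizon])
        t_index.append(t + horizon)  # hour index of the predicted value
    return np.array(X), np.array(y), np.array(t_index)

# Synthetic stand-in for two years of hourly data at one station
# (random numbers, NOT real measurements).
rng = np.random.default_rng(0)
hours_per_year = 8760
T = 2 * hours_per_year
features = rng.normal(size=(T, 23))  # 23 variables, as in the abstract
pm25 = features[:, 0]                # assumption: column 0 holds PM2.5

X, y, t_idx = make_windows(features, pm25)

# Year-based split on the hour being predicted: first year trains,
# second year validates.
train = t_idx < hours_per_year
X_train, y_train = X[train], y[train]
X_valid, y_valid = X[~train], y[~train]
print(X.shape, X_train.shape, X_valid.shape)
```

In this sketch the split is made on the predicted hour, so the first few validation samples draw their input window from the last hours of the training year. Whether the thesis handled the year boundary this way is not stated.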

Parallel Abstract


Background: Air pollution is a growing concern, and researchers have documented adverse health effects caused by air pollutants. Among these pollutants, fine particulate matter (PM2.5), whose aerodynamic diameter is less than 2.5 μm, is of particular concern. For sensitive groups especially, both short-term and long-term exposure to PM2.5 may pose serious hazards. Although the Taiwan Environmental Protection Administration (EPA) has built an air quality monitoring network to track PM2.5 concentrations and the government has revised its pollutant standards, an accurate and prompt early warning system is still urgently needed.

Methods: We evaluated several models for predicting PM2.5 concentrations in the Taipei area. We collected data for Taipei City and New Taipei City from 2018 to 2019 from the EPA open data platform and, after a series of data preprocessing steps, applied three kinds of models: linear regression, machine learning, and deep learning. Depending on the requirements of each model, the dataset was arranged in either a time-series-oriented or a feature-oriented form. We compared performance across stations and across models, and tested whether geographical predictors drawn from nearby stations improved the predictions. Prediction performance was evaluated using root mean square error (RMSE), mean absolute error (MAE), and R-squared.

Results: A total of 227,760 hourly records from 13 stations were collected, and 23 variables were used to train the models. Across all stations, the XGBoost model outperformed the LSTM model, which in turn outperformed the linear regression model. Averaged over the three models, the Tucheng and Cailiao stations achieved the best R-squared values (0.6043 and 0.6042, respectively). When the influence of nearby stations was also considered, most models improved their predictions. The best model reached an R-squared of 0.64.

Conclusion: A model trained on 2018 data for a single station in the Taipei area can reach an R-squared of 0.64 with XGBoost, which outperformed the LSTM model, followed by the linear regression model. Additional training features from nearby stations also benefit the predictions.
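The three evaluation measures named above (RMSE, MAE, and R-squared) can be computed as in the short sketch below. The function name `regression_metrics` and the four sample values are illustrative assumptions. R-squared is taken here as the coefficient of determination, 1 - SS_res / SS_tot; the abstract does not say which definition the thesis used.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R-squared for paired observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    mae = float(np.mean(np.abs(residual)))
    ss_res = float(np.sum(residual ** 2))                   # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total variation
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R-squared
    return {"rmse": rmse, "mae": mae, "r2": r2}

# Made-up concentrations in ug/m3, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are in the same units as the concentrations, while R-squared is unitless, which is why the abstract can report a single R-squared value as the headline figure across stations.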

