Distributed Spatio-Temporal Analysis of Air Pollution Data Using R Packages

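In recent years, R has become a popular tool for big data analysis thanks to its mature packages for data analysis and visualization, including packages for air pollution analysis. Air pollution is drawing growing global attention because of its considerable impact on the environment and human health. With the rapid development of the Internet of Things and the improving accuracy of the geographical information collected by sensors, large volumes of air pollution data are being produced. Because of the inherent characteristics of its memory design, a single-machine environment makes it difficult to analyze such data effectively and reliably. In this work, we build a distributed computing environment based on RHadoop and SparkR so that air pollution analysis and visualization can be carried out more reliably and effectively. We first collect air pollution data in Taichung with sensors called EdiGreen AirBox. We then apply the inverse distance weighting (IDW) method to convert the sensor data into a density map. Finally, the experimental results report the accuracy of short-term PM2.5 predictions made with the ARIMA model, together with a verification of the prediction accuracy using MAPE.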

Keywords

Air pollution analysis; RHadoop; SparkR; inverse distance weighting (IDW)

Parallel Abstract

Recently, R has become a popular tool for big data analysis thanks to its mature packages for data analysis and visualization, including packages for air pollution analysis. Air pollution is of increasing global concern because it greatly affects the environment and human health. With the rapid development of the IoT and the increasing accuracy of the geographical information collected by sensors, a huge amount of air pollution data has been generated. It is therefore difficult to analyze air pollution data effectively and reliably in a single-machine environment, owing to the inherent characteristics of its memory design. In this work, we construct a distributed computing environment based on both RHadoop and SparkR to perform air pollution analysis and visualization with R more reliably and effectively. We first use sensors called EdiGreen AirBox to collect air pollution data in Taichung. Then, we adopt the Inverse Distance Weighting (IDW) method to transform the sensor data into a density map. Finally, the experimental results show the accuracy of short-term PM2.5 predictions made with the ARIMA model. The prediction accuracy is also verified using the mean absolute percentage error (MAPE).
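The IDW interpolation and MAPE check named in the abstract can be illustrated with a minimal sketch. This is a plain Python illustration under stated assumptions, not the authors' R implementation: the function names `idw_interpolate` and `mape`, the power parameter of 2, and the sample sensor coordinates and readings are all hypothetical.

```python
import math


def idw_interpolate(sensors, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) sensor readings.

    Each reading is weighted by 1 / distance**power, so nearby sensors
    dominate the estimate. power=2.0 is an assumed, commonly used default.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in sensors:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a sensor: use its reading.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same length and that no actual
    value is zero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical readings: (x, y, PM2.5 value) for two made-up sensors.
sample_sensors = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_estimate = idw_interpolate(sample_sensors, 1.0, 0.0)
forecast_error = mape([100.0, 200.0], [110.0, 180.0])
```

Evaluating `idw_interpolate` over a regular grid of points would give the kind of density map the abstract describes, and `mape` could be applied to observed versus forecast PM2.5 values from whichever forecasting model is used.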

Parallel Keywords

Air pollution data analysis; IDW; RHadoop; SparkR; R package
