An Automated Outlier Detection System for Hourly Rainfall Data Quality Control

Advisor: 鄭克聲

Abstract


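Over the past decade, data quality assurance has drawn increasing attention in hydrology. Only with good-quality rainfall data can risk analysis and decision management for hydrological applications yield reliable results. Taiwan's Central Weather Bureau operates an automated rain gauge network of more than 600 stations that provides real-time rainfall observations every day. Occasionally, the rainfall observed at one station is markedly higher or lower than the observations at nearby stations. Because rainfall at neighboring stations tends to be highly correlated, such a discrepancy may indicate that the observations contain anomalies. To control the quality of rainfall data, these anomalies must be identified. So far, however, there have been no clear criteria for identifying them effectively.

In this study, we use statistical methods to build an automated outlier detection system for hourly rainfall. First, we divide the collected hourly rainfall data into four groups according to the rainy seasons of the four common rainfall types in Taiwan. Next, we apply K-means clustering to group the rain gauge stations under study by geographical location and rainfall characteristics. We then perform principal component analysis on each cluster for each rainfall type, compute the first few principal components, and construct an index that expresses how anomalous the rainfall data are.

Once a station's rainfall observation meets our index-based definition of an anomaly, the station where the anomaly may have occurred can be identified immediately. Finally, we built the automated outlier detection system and present it as an online interactive web page. The aim of this study is to establish a reliable outlier detection system for hourly rainfall observations so that stations with possible anomalies can be screened out effectively, achieving quality control of hourly rainfall data.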

Parallel Abstract


Data quality assurance has received increasing attention in hydrology over the last decade. Only high-quality data can ensure that data-driven risk analysis and decision-making in hydrological applications yield reliable results. In Taiwan, the Central Weather Bureau manages an automated rain gauge network of over 600 stations to obtain real-time precipitation observations. Occasionally, the rainfall observed at one station is markedly higher or lower than that at nearby stations. Because rainfall observations at neighboring stations are often highly correlated, such discrepancies suggest the presence of anomalies. To obtain reliable results from hourly rainfall data, these anomalies should be identified in advance. However, definite criteria for identifying anomalies effectively are lacking. In this study, we established an automated anomaly detection system for precipitation observations. First, we categorized the data into four groups according to the four fundamental storm types in Taiwan (frontal rain, Meiyu, convective storms, and typhoons). Second, we adopted K-means clustering to classify all rain gauge stations of interest by their geographical location and rainfall characteristics. For each cluster, principal component analysis was conducted to obtain the first few principal components, which were used to construct an index representing the extent of anomalies. Once the criteria are determined, identifying anomalies is straightforward. Finally, we implemented the detection system and presented it as an online interactive web page. The result is a dependable anomaly detection system that effectively screens out possible anomalies for hourly rainfall data quality control.
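
The abstract does not give the exact form of the anomaly index, so the following is only a rough sketch of how the clustering and principal component steps could fit together. Everything specific here is an assumption made for illustration: the function names `kmeans` and `pca_anomaly_index`, the farthest-first initialization, the use of station coordinates alone as clustering features, the choice of one retained component, and the index itself (the absolute standardized residual left after projecting onto the leading components). The data are synthetic, not CWB observations.

```python
import numpy as np


def kmeans(features, k, n_iter=50):
    """Lloyd's k-means with farthest-first initialization; one label per row."""
    centers = [features[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        nearest = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centers], axis=0
        )
        centers.append(features[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels


def pca_anomaly_index(rain, n_components=1):
    """rain has shape (hours, stations) for ONE cluster and ONE storm type.

    Returns the absolute standardized residual after removing the leading
    principal components (an assumed index, not the thesis's definition).
    """
    mean = rain.mean(axis=0)
    std = rain.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant series
    z = (rain - mean) / std
    # Rows of vt are the principal directions of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    basis = vt[:n_components]
    reconstruction = z @ basis.T @ basis
    return np.abs(z - reconstruction)


# Hypothetical station features: (x, y) coordinates of six stations.
station_xy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                       [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
labels = kmeans(station_xy, k=2)

# Synthetic hourly rainfall for one cluster of five correlated stations.
rng = np.random.default_rng(0)
common = rng.gamma(shape=2.0, scale=2.0, size=200)
gains = np.array([1.0, 0.8, 1.2, 0.9, 1.1])
rain = common[:, None] * gains + rng.normal(0.0, 0.1, size=(200, 5))
rain[10, 2] += 50.0  # injected anomaly at hour 10, station 2

index = pca_anomaly_index(rain, n_components=1)
hour, station = np.unravel_index(index.argmax(), index.shape)
print(labels, hour, station)
```

Because neighboring stations share most of their variance, the leading component captures the common rainfall signal, and a reading that departs from its neighbors shows up as a large residual. In practice a threshold on the index would have to be chosen per cluster and storm type; the abstract leaves that criterion unspecified.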
