
迴歸模型偵測歧異點之統計方法於氣溫資料校驗的探討

Evaluation of Outlier Detection Algorithms in Linear Regression for Temperature Validation

Abstract


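Outlier detection is an important part of data quality control, and the validation of meteorological data has a major impact on the accurate construction of future forecasting systems and on applications in related industries. This paper takes up outlier detection within a regression framework, in which the response variable is the quantity being validated and the explanatory variable is its reference value, and studies methods commonly used in the Frequentist and Bayesian schools to assess their suitability for temperature validation. The methods include residual methods, the difference in fits (DFFITS), Cook's distance, and, from the Bayesian school, the predictive distribution discordancy test and the comparison of random-error probabilities before and after fitting the regression model. The Frequentist studentized residuals and studentized deleted residuals effectively detect outlying data caused by systematic error. DFFITS and Cook's distance, because they additionally take into account information in the explanatory variable, tend to flag data points produced by extreme weather, even though such points are merely rare and do not depart from the regression line. The Bayesian methods can combine information from the data set currently under examination and from historical data sets, but the trends of the two data sets must be taken into account in order to select outliers more appropriately. This study offers statisticians an accessible introduction to the application of statistics in meteorological data validation, and offers meteorologists a reference for choosing suitable statistical models for validating meteorological data.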

Parallel Abstract


Data verification is a critical step in ensuring that data reflect factual information. Meteorological data validation, especially the detection of erroneous data points, has a major impact on forecast accuracy and on applications in related industries. In practical temperature validation, a linear regression model is generally adopted, with the observation being verified as the response and its reference value as the explanatory variable. In this study, statistical methods for outlier detection under a regression model, comprising four Frequentist algorithms and two Bayesian approaches, are evaluated using simulation and real data analysis. Among the Frequentist approaches, DFFITS and Cook's distance are less appropriate than studentized residuals and studentized deleted residuals, because they readily flag data points that result from extreme weather rather than from erroneous observations. The Bayesian predictive discordancy test and the comparison of random-error probabilities can combine the information in the data set under examination with that in historical data sets, but the trends of the two data sets must be considered in order to identify outliers more appropriately. This study gives statisticians an accessible view of how statistics is applied in meteorological data validation, and gives meteorologists a reference for selecting appropriate statistical models for validating meteorological data.
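The abstract names the four Frequentist diagnostics but does not give their formulas. As a rough illustration only, the sketch below computes them for a simple linear regression using the standard textbook definitions, which are assumed here and not taken from the paper. The function name `regression_diagnostics`, the dictionary keys, and the toy temperatures are all hypothetical.

```python
import numpy as np


def regression_diagnostics(x, y):
    """Outlier diagnostics for the simple regression y = b0 + b1 * x.

    Assumption: textbook definitions, not the paper's own implementation.
    x holds reference values, y holds the observations being validated.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])   # design matrix with intercept
    p = X.shape[1]                          # number of coefficients (2)

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    mse = resid @ resid / (n - p)

    # Studentized (internally studentized) residuals.
    studentized = resid / np.sqrt(mse * (1.0 - leverage))
    # Studentized deleted residuals, obtained without refitting n times.
    deleted = studentized * np.sqrt((n - p - 1) / (n - p - studentized**2))
    # DFFITS and Cook's distance both weight the residual by leverage.
    dffits = deleted * np.sqrt(leverage / (1.0 - leverage))
    cooks = studentized**2 * leverage / (p * (1.0 - leverage))

    return {
        "leverage": leverage,
        "studentized": studentized,
        "studentized_deleted": deleted,
        "dffits": dffits,
        "cooks_distance": cooks,
    }


if __name__ == "__main__":
    # Hypothetical data: 30 ordinary pairs, one observation shifted off
    # the line (index 30), one extreme but consistent pair (index 31).
    ref = np.linspace(15.0, 25.0, 30)
    obs = ref + 0.3 * np.sin(np.arange(30))
    ref = np.append(ref, [20.0, 40.0])
    obs = np.append(obs, [26.0, 40.0])

    diag = regression_diagnostics(ref, obs)
    for name, values in diag.items():
        print(name, np.round(values[-2:], 3))
```

Because DFFITS and Cook's distance multiply the residual by a term in the leverage, they depend on how far the reference value lies from the rest of the data, which is the property the abstract points to when it describes these two indicators as sensitive to extreme weather. Any cutoffs for flagging a point would have to come from the paper itself or from the practitioner.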

