A Hybrid Prediction Algorithm for Traffic Data

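Many types of data can be regarded as time series. Time series prediction is therefore applied in a wide range of domains, such as investment and traffic prediction; traffic status prediction can be used for congestion avoidance and travel planning. The problem we address is predicting traffic conditions by means of time series prediction. The time series prediction problem is defined as follows: given a query time and time series data, predict the value at the query time. Usually, the query time is a future time. In this paper, we propose a hybrid prediction algorithm that uses both regression-based and clustering-based prediction methods. Regression-based prediction gives more accurate results when the query time is not too far from the current time, whereas clustering-based prediction gives more accurate results when the query time is farther away. We observe that time series contain some similar shapes or trends. To capture these shapes, we use the concept of clustering. From the resulting clusters, we can further discover their sequential relationships. Therefore, if the query time is far from the current time, we use these sequential relationships among clusters to predict the cluster that is likely to appear, and then predict the data value at the query time from that cluster. Note that the hybrid algorithm, which combines the two methods above, uses a threshold to decide which method to use. If the time difference between the query time and the current time is smaller than the threshold, the hybrid prediction algorithm uses regression-based prediction; otherwise, it uses clustering-based prediction. To validate the proposed method, we conducted extensive experiments on real data. The experimental results show that the proposed method is both accurate and practical.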

Keywords

data mining; time series data; prediction

Parallel Abstract

Many types of data can be regarded as time series data. Time series prediction is therefore applied in a wide range of domains, such as investment and traffic prediction. Traffic status prediction can be used for congestion avoidance and travel planning. We address the problem of predicting traffic status by means of time series prediction. The time series prediction problem is defined as follows: given a query time and time series data, predict the data value at the query time. Usually, the query time is a future time. In this paper, we propose a hybrid prediction algorithm that exploits both regression-based and clustering-based prediction methods. Specifically, regression-based prediction is more accurate when the query time is not too far from the current time, whereas clustering-based prediction is more accurate when the query time is farther away. We observe that time series data contain some similar shapes or trends. To capture the similar shapes hidden in the data, we use clustering. From the resulting clusters, we can further discover their sequential relationships. If the query time is far from the current time, we use these sequential relationships to predict the cluster that is likely to appear, and the data value at the query time is then obtained from that cluster. The hybrid algorithm combines the two methods using a single prediction length threshold that decides which method to use. If the time difference between the query time and the current time is smaller than the threshold, the hybrid algorithm uses regression-based prediction; otherwise, it uses clustering-based prediction. To validate the proposed methods, we carried out a set of experiments on real data sets to compare their accuracy. The experimental results show that the proposed methods are both accurate and practical.
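
The threshold rule described in the abstract can be illustrated with a small sketch. The abstract does not specify the regression model, the clustering method, or how the sequential relationships are represented, so everything below is an assumption made for illustration: a least-squares line over a recent window stands in for regression-based prediction, a greedy "leader" clustering of fixed-length segments stands in for the clustering step, and first-order transition counts between cluster labels stand in for the sequential relationships. All function and parameter names (`regression_predict`, `leader_cluster`, `clustering_predict`, `hybrid_predict`, `window`, `seg_len`, `radius`) are hypothetical.

```python
from collections import Counter, defaultdict


def regression_predict(series, steps_ahead, window=6):
    """Assumed stand-in: fit a least-squares line to the last `window`
    points and extrapolate `steps_ahead` steps past the last point."""
    recent = series[-window:]
    n = len(recent)
    mean_x = (n - 1) / 2
    mean_y = sum(recent) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(recent))
    slope = sxy / sxx if sxx else 0.0
    return mean_y + slope * ((n - 1 + steps_ahead) - mean_x)


def leader_cluster(segments, radius):
    """Assumed stand-in: greedy clustering where a segment joins the first
    cluster whose centroid lies within `radius` (Euclidean distance)."""
    centroids, members, labels = [], [], []
    for seg in segments:
        for cid, centre in enumerate(centroids):
            dist = sum((a - b) ** 2 for a, b in zip(seg, centre)) ** 0.5
            if dist <= radius:
                break
        else:
            # No existing cluster is close enough: open a new one.
            cid = len(centroids)
            centroids.append(list(seg))
            members.append([])
        members[cid].append(seg)
        # Recompute the centroid as the pointwise mean of its members.
        centroids[cid] = [sum(col) / len(col) for col in zip(*members[cid])]
        labels.append(cid)
    return centroids, labels


def clustering_predict(series, steps_ahead, seg_len, radius):
    """Follow the most frequent cluster-to-cluster transitions up to the
    segment containing the query time, then read the centroid value."""
    n_full = len(series) // seg_len
    segments = [series[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]
    centroids, labels = leader_cluster(segments, radius)
    transitions = defaultdict(Counter)
    for prev, nxt in zip(labels, labels[1:]):
        transitions[prev][nxt] += 1
    # Locate the query point: which segment it falls in, and where inside it.
    target = len(series) - 1 + steps_ahead
    seg_index, offset = divmod(target, seg_len)
    cluster = labels[-1]
    for _ in range(seg_index - (n_full - 1)):
        if transitions[cluster]:
            cluster = transitions[cluster].most_common(1)[0][0]
    return centroids[cluster][offset]


def hybrid_predict(series, steps_ahead, threshold, seg_len, radius):
    """Threshold rule from the abstract: near queries use regression,
    far queries use the cluster-sequence predictor."""
    if steps_ahead < threshold:
        return regression_predict(series, steps_ahead)
    return clustering_predict(series, steps_ahead, seg_len, radius)


# Toy data: two alternating segment shapes, repeated three times.
toy = ([10, 20, 30, 20] + [50, 60, 70, 60]) * 3
near = hybrid_predict(toy, 1, threshold=3, seg_len=4, radius=5.0)
far = hybrid_predict(toy, 6, threshold=3, seg_len=4, radius=5.0)
```

On the toy series, `near` is produced by the regression branch and `far` by the clustering branch. In practice the threshold, segment length, and clustering radius would have to be tuned on real traffic data; the values here are arbitrary.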

Parallel Keywords

data mining; time series; prediction

