A time series is a data set generated over time. Artificial neural networks (ANNs) are now used in many fields, such as finance and tourism, to predict the future trends of time series, so constructing a fast and highly accurate prediction model has become an important issue. Neural networks can still make predictions when the data distribution is uncertain. Among them, the Long Short-Term Memory (LSTM) network has memory units that retain and carry forward key information from earlier parts of a sequence, and it can also reduce network training time.

The sample used in this thesis is the CATS (Competition on Artificial Time Series) data set proposed for the 2004 time series competition. Recent studies still use the CATS data set to verify prediction accuracy, but report a prediction mean square error (MSE) of 170. This research aims to improve prediction accuracy by constructing LSTM networks and to verify the error on the CATS data. The CATS data set is divided into 5 intervals of 1000 data points each. Under the competition rules, 20 data points are removed from each interval; the remaining 4900 data points are used to train and validate the LSTM network, and the network is then used to predict the removed data points.

This study constructs single-layer and double-layer LSTM network architectures to predict the 100 removed data points, using the LSTM layers to find the characteristics and future trends in the data in order to improve prediction accuracy. The results show that, with fixed random variables, the MSE of the 100 data points predicted by the double-layer LSTM network is 4.34, lower than the 59.49 obtained with the single-layer LSTM network. Using a double-layer LSTM network therefore increases prediction accuracy, but it has twice as many parameters to compute as the single-layer network and takes two and a half times as long to run. In the future, we expect to combine several neural network structures to predict massive time series data with good accuracy.
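
The data layout and error measure described above can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the 20 removed points are the last 20 of each 1000-point interval (the abstract does not say which points are removed), it uses a synthetic sine series as a stand-in for the real CATS data, and it scores a naive last-value baseline rather than an LSTM. The names `split_cats`, `mean_squared_error`, and `last_value_baseline` are hypothetical helpers introduced only for this illustration.

```python
import math

N_BLOCKS = 5       # number of intervals (from the abstract)
BLOCK_LEN = 1000   # data points per interval (from the abstract)
GAP_LEN = 20       # data points removed per interval (from the abstract)


def split_cats(series, block_len=BLOCK_LEN, gap_len=GAP_LEN):
    """Split a series into known and removed (index, value) pairs.

    Assumption: the removed points are the last `gap_len` points of
    each block; the abstract does not state their position.
    """
    known, missing = [], []
    for i, value in enumerate(series):
        position_in_block = i % block_len
        if position_in_block >= block_len - gap_len:
            missing.append((i, value))
        else:
            known.append((i, value))
    return known, missing


def mean_squared_error(actual, predicted):
    """Average of the squared differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def last_value_baseline(series, missing, block_len=BLOCK_LEN, gap_len=GAP_LEN):
    """Predict every removed point with the last known value of its block.

    This is a naive stand-in for a forecasting model, not an LSTM.
    """
    predictions = []
    for i, _ in missing:
        block_start = (i // block_len) * block_len
        last_known_index = block_start + block_len - gap_len - 1
        predictions.append(series[last_known_index])
    return predictions


# Synthetic stand-in series (NOT the CATS data), 5 x 1000 points.
series = [100.0 * math.sin(0.01 * i) for i in range(N_BLOCKS * BLOCK_LEN)]

known, missing = split_cats(series)
print(len(known), len(missing))

actual = [value for _, value in missing]
predicted = last_value_baseline(series, missing)
print(mean_squared_error(actual, predicted))
```

Replacing `last_value_baseline` with the output of a trained model gives the same kind of MSE over the 100 removed points that the abstract reports for the single-layer and double-layer networks.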