集成學習於比特幣短期價格走勢預測

Ensemble Learning for Short-term Bitcoin Price Trend Prediction

Advisor: 呂育道

Abstract


In recent years, the cryptocurrency market has grown rapidly, and Bitcoin, as its representative asset, has attracted considerable attention from investors and researchers. As market volatility and uncertainty increase, accurately predicting Bitcoin price trends has become increasingly important. However, most existing research focuses on data from traditional financial markets and typically relies on daily trading data for prediction, which may not provide timely information for formulating trading strategies in the rapidly changing cryptocurrency market.

This thesis focuses on using an ensemble learning strategy to improve the accuracy of single models in predicting short-term Bitcoin price trends. Through stacking, the ensemble model combines the strengths of multiple single models; we verify that the ensemble model improves on each single model's accuracy and precision, and we also examine its recall and F1 score. As baselines, this thesis uses two random prediction models, one based on the proportion of upward closing-price movements in the training set and the other on that proportion in the test set; using the test-set proportion represents seeing partial future information about price movements. Experimental results show that, after rolling window validation over multiple window sizes, every single model outperforms both random prediction models in accuracy and precision, indicating that the single models predict price movements better than random selection. After combining the single models, the ensemble model achieves the best accuracy and the best precision under the largest number of window sizes, and it also attains both the highest average accuracy and the highest average precision: its average accuracy exceeds that of the LSTM, the best single model by average accuracy, and its average precision is about 1.7% higher than the LSTM's; its average precision is nearly identical to that of the SVM, the best single model by average precision, while its average accuracy is about 1.2% higher than the SVM's. The ensemble model's accuracy and precision demonstrate that ensemble learning can improve single models' accuracy in short-term Bitcoin price trend prediction, yielding a model that performs best in accuracy and precision at the same time.

To observe model performance more comprehensively, we also compute recall and the F1 score. Recall reflects how many of all truly positive samples the model can identify. The F1 score provides a single metric for evaluating a model's overall performance on positive predictions. Because recall only reflects the fraction of all positives that the model predicts correctly, while the F1 score considers both recall and precision, these two metrics capture aspects of performance beyond accuracy.

On recall and F1 score the ensemble model does not achieve the highest performance; it ranks third in both average recall and average F1 score, about 15% below the LSTM, which has the highest average recall, and about 6% below the LSTM, which also has the highest average F1 score. Among the four single models, the gap between the models with the highest and lowest average recall and average F1 score is large, so after integrating the predictions of all single models, the ensemble model is pulled on these two metrics by both the strong and the weak models, and its performance falls between the highest and the lowest.
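
As a rough illustration of the stacking strategy and the two random prediction baselines described above, the sketch below uses scikit-learn; the choice of base learners, the hyperparameters, and the helper names are illustrative assumptions rather than the thesis's actual configuration (in particular, the thesis's single models include an LSTM, which is omitted here).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def build_stacking_model():
    """Stacking ensemble: base learners' predictions feed a meta-learner."""
    base_learners = [
        ("svm", SVC(probability=True)),                     # one of the single models
        ("rf", RandomForestClassifier(n_estimators=200)),   # placeholder base learner
    ]
    # A logistic-regression meta-learner combines the base-learner outputs.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression())

def random_baseline(up_ratio, n_samples, seed=0):
    """Random prediction model: predict 'up' with probability up_ratio.

    up_ratio is the proportion of upward closes taken from either the
    training set or the test set; the test-set ratio corresponds to the
    baseline that sees partial future information."""
    rng = np.random.default_rng(seed)
    return (rng.random(n_samples) < up_ratio).astype(int)
```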

Abstract (English)


In recent years, the cryptocurrency market has experienced significant growth, with Bitcoin as its representative asset attracting considerable attention from investors and researchers. As market volatility and uncertainty increase, accurately predicting Bitcoin price trends becomes increasingly important. However, most existing research has focused on data from traditional financial trading markets, often relying on daily trading data for prediction, which may not provide timely trading information for the rapidly changing cryptocurrency market. This thesis focuses on enhancing the performance of single models in predicting short-term Bitcoin price trends by using an ensemble learning strategy. Through stacking, the ensemble model integrates the advantages of multiple single models to improve prediction accuracy and precision. This thesis also examines the effects of ensemble learning on recall and F1 score. Experimental results demonstrate that every single model outperforms two baseline random prediction models: one that uses the proportion of upward movements in the training set as its prediction probability and another that uses the proportion in the testing set. The ensemble model achieves the highest accuracy and precision across more window sizes than any single model, and it also achieves the highest average accuracy and average precision, so ensemble learning improves prediction accuracy and precision. The ensemble model does not achieve the highest average recall or average F1 score; it ranks third in both. Its average recall is about 15% lower than that of the LSTM, which achieves the highest average recall, and its average F1 score is about 6% lower than that of the LSTM, which also achieves the highest average F1 score.
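
The rolling window validation and the averaged metrics summarized above can be sketched roughly as follows; the window sizes, step, and function name are illustrative assumptions, not the settings used in the thesis.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def rolling_window_scores(model, X, y, train_size=2000, test_size=500):
    """Fit on each training window, evaluate on the window that follows,
    then slide both windows forward by test_size and repeat."""
    records = []
    start = 0
    while start + train_size + test_size <= len(X):
        tr = slice(start, start + train_size)
        te = slice(start + train_size, start + train_size + test_size)
        model.fit(X[tr], y[tr])
        pred = model.predict(X[te])
        records.append({
            "accuracy":  accuracy_score(y[te], pred),
            "precision": precision_score(y[te], pred),  # TP / (TP + FP)
            "recall":    recall_score(y[te], pred),     # TP / (TP + FN)
            "f1":        f1_score(y[te], pred),         # harmonic mean of precision and recall
        })
        start += test_size
    # Average each metric over all windows, as in the reported results.
    return {name: float(np.mean([r[name] for r in records])) for name in records[0]}
```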
