
An Analysis on Forecasting Stock Returns Using Machine Learning Models

Advisor: 楊睿中

Abstract


In this paper, we search for the best prediction model for the stock market using different machine learning algorithms and various types of technical indicators, and we use the prediction results to select corresponding investment strategies. To avoid the inference bias caused by data snooping, we apply the stepwise SPA test (Step-SPA) proposed by Hsu, Hsu, and Kuan (2008) to test the significance of the strategies' returns. The empirical results for the S&P 500 index in 2019 show that no investment strategy has a significantly positive return at the 5% significance level. When the significance level is relaxed to 10%, exactly one investment strategy has a significantly positive return: a random forest in which the number of variables randomly sampled as candidates at each split is selected by K-fold cross-validation. The model generates a predicted rate of return; a buy signal is issued when the forecast exceeds 0.001 and a sell signal when it falls below −0.001, and each trading signal is then unconditionally extended for 2 additional days.
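The signal rule described above (buy when the predicted return exceeds 0.001, sell when it falls below −0.001, then unconditionally carry each signal forward for 2 extra days) can be sketched as follows. This is a minimal illustration, not the thesis code: the function name, the array representation of positions (+1 buy, −1 sell, 0 neutral), and the exact overlap semantics of the 2-day extension are assumptions for the sketch.

```python
import numpy as np

def trading_signals(pred_returns, threshold=0.001, extend=2):
    """Map predicted returns to trading positions.

    A predicted return above +threshold issues a buy (+1), below
    -threshold a sell (-1), otherwise no signal (0). Each nonzero
    signal is then unconditionally repeated for `extend` more days
    (later signals overwrite earlier extensions -- an assumption,
    since the abstract does not specify how overlaps are resolved).
    """
    pred = np.asarray(pred_returns, dtype=float)
    # Raw threshold signals: +1 buy, -1 sell, 0 neutral.
    raw = np.where(pred > threshold, 1, np.where(pred < -threshold, -1, 0))
    pos = raw.copy()
    for t, s in enumerate(raw):
        if s != 0:
            # Carry the day-t signal forward for `extend` extra days.
            pos[t + 1 : t + 1 + extend] = s
    return pos
```

For example, a forecast sequence `[0.002, 0, 0, -0.003, 0, 0]` yields a buy on day 0 held through days 1 and 2, then a sell on day 3 held through days 4 and 5.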

References


Allaire, J. and F. Chollet (2021) "keras: R Interface to 'Keras'," R package version 2.4.0.
Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984) Classification and Regression Trees, Chapman and Hall/Wadsworth, New York.
Breiman, L. (2001) "Random Forests," Machine Learning, 45, 5-32.
Efron, B. (1979) "Bootstrap Methods: Another Look at the Jackknife," The Annals of Statistics, 7, 1-26.
Friedman, J. H. (2001) "Greedy Function Approximation: A Gradient Boosting Machine," The Annals of Statistics, 29, 1189-1232.
