Predicting TSMC Stock Prices with Autoencoders and Principal Component Analysis Combined with Decision Trees

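Technology is advancing rapidly. When AlphaGo defeated the world Go champion in 2016, artificial intelligence (AI) once again became a topic of wide discussion. If an AI can learn on its own without human influence, its decisions will be free of bias and emotion and can therefore be more objective. Machine learning and deep learning, both of which grew out of AI, have since swept the world. TSMC accounts for 21.29% of the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX), so this study forecasts its stock price. The study covers the 10 years from 2009 to 2018. The data are split into a training sample (2009 through 2017) and a test sample (January to December 2018). The methods are the CART decision tree from machine learning, the autoencoder from deep learning, and principal component analysis (PCA). The input variables are four technical indicators widely used by ordinary investors: the moving average, the stochastic oscillator (KD), the relative strength index (RSI), and the moving average convergence divergence (MACD). They are combined with the net buying and selling of the three major institutional investors. The study examines how dimensionality reduction affects prediction accuracy. Three methods were compared: CART, AE-CART, and PCA-CART. Among these, PCA-CART with 7 variables achieved the best accuracy, 77.73%, showing that appropriate dimensionality reduction helps improve accuracy. The autoencoder was used in two configurations, AE-CART and PCA-AE-CART. Each configuration was tested with three activation functions (ReLU, Tanh, and Sigmoid) to compare their prediction accuracy. PCA-AE-CART with Tanh achieved the best accuracy, 66.80%. Applying PCA for dimensionality reduction before the autoencoder clearly improved accuracy compared with combining the autoencoder and decision tree alone.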
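The four technical indicators listed above each have several standard variants. The abstract does not state which periods or smoothing rules the study used. The following Python sketch therefore assumes common defaults:

- an n-day simple moving average;
- RSI from simple averages of gains and losses over 14 price changes;
- the Taiwan-style KD (9-day RSV, with K and D smoothed by 1/3 and seeded at 50);
- MACD as EMA(12) minus EMA(26), with a 9-period signal line.

The function names are hypothetical, and the parameters are illustrative rather than the study's settings.

```python
from typing import List, Optional, Tuple

def sma(prices: List[float], n: int) -> List[Optional[float]]:
    """Simple moving average; None until n prices are available."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        out.append(None if i + 1 < n else sum(prices[i + 1 - n:i + 1]) / n)
    return out

def ema(values: List[float], n: int) -> List[float]:
    """Exponential moving average seeded with the first value, alpha = 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def rsi(prices: List[float], n: int = 14) -> List[Optional[float]]:
    """RSI from simple averages of gains and losses over the last n changes (assumed variant)."""
    out: List[Optional[float]] = [None] * len(prices)
    for i in range(n, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - n + 1, i + 1)]
        gain = sum(c for c in changes if c > 0) / n
        loss = sum(-c for c in changes if c < 0) / n
        if loss == 0:
            out[i] = 100.0 if gain > 0 else 50.0  # all gains -> 100; flat window -> neutral 50
        else:
            out[i] = 100.0 - 100.0 / (1 + gain / loss)
    return out

def kd(high: List[float], low: List[float], close: List[float],
       n: int = 9) -> Tuple[List[float], List[float]]:
    """Stochastic KD: RSV over up to n days; K and D smoothed by 1/3, both starting at 50."""
    k_prev, d_prev = 50.0, 50.0
    ks: List[float] = []
    ds: List[float] = []
    for i in range(len(close)):
        start = max(0, i - n + 1)  # use the shorter window available at the start of the series
        lo, hi = min(low[start:i + 1]), max(high[start:i + 1])
        rsv = 50.0 if hi == lo else 100.0 * (close[i] - lo) / (hi - lo)
        k_prev = (2 / 3) * k_prev + (1 / 3) * rsv
        d_prev = (2 / 3) * d_prev + (1 / 3) * k_prev
        ks.append(k_prev)
        ds.append(d_prev)
    return ks, ds

def macd(prices: List[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float], List[float]]:
    """MACD line = EMA(fast) - EMA(slow); signal = EMA of the MACD line; histogram = their gap."""
    line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    sig = ema(line, signal)
    hist = [m - s for m, s in zip(line, sig)]
    return line, sig, hist
```

In practice these series, together with the institutional net buy/sell figures, would be aligned by trading day to form the feature matrix.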

Keywords

Decision tree; autoencoder; principal component analysis; stock price prediction

English Abstract

Technology is advancing rapidly. AlphaGo defeated the world Go champion in 2016, and artificial intelligence (AI) once again attracted wide interest. If AI can learn by itself without human influence, it will not be biased by emotions and can make more objective decisions. Machine learning and deep learning, both of which grew out of AI, have swept the world. TSMC accounts for 21.29% of the Taiwan Stock Exchange Capitalization Weighted Stock Index, so this study predicts its stock price using three different methods. The data cover the 10 years from 2009 to 2018. They are divided into training samples (2009 to 2017) and testing samples (2018 only). The CART decision tree, the autoencoder, and principal component analysis are applied, and their prediction performance is evaluated with a confusion matrix. The input variables are four technical indicators commonly used by the general public: the Moving Average, the stochastic KD, the Relative Strength Index, and the Moving Average Convergence and Divergence. These are combined with the net buying or net selling of institutional investors. Among the three methods (the CART decision tree, AE-CART, and PCA-CART), PCA-CART with 7 variables achieves the best prediction accuracy, 77.73%. This shows that proper dimensionality reduction can improve accuracy. The autoencoder is used in two methods, AE-CART and PCA-AE-CART. Each method is tested with three activation functions (ReLU, Tanh, and Sigmoid) to compare their prediction accuracy. PCA-AE-CART with Tanh achieves the best accuracy among these, 66.80%. Applying principal component analysis for dimensionality reduction before the autoencoder improves accuracy noticeably compared with combining the autoencoder and decision tree alone.
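The PCA-CART pipeline summarized above can be sketched as follows. The sketch makes several assumptions that the abstract does not confirm:

- The target is a binary up/down label. The abstract does not say exactly what is classified.
- Features are standardized with training-set statistics before PCA.
- "7 variables" is read as keeping 7 principal components.

The projection is learned on the training years only and then applied unchanged to the test year. Accuracy is read off a 2x2 confusion matrix as correct predictions divided by all predictions. The data here are random placeholders, not the study's data, and the function names are hypothetical.

```python
import numpy as np

def pca_fit(x_train: np.ndarray, k: int):
    """Learn standardization and the top-k principal axes from training rows only."""
    mean = x_train.mean(axis=0)
    scale = x_train.std(axis=0)
    scale[scale == 0] = 1.0                    # avoid dividing by zero for constant columns
    z = (x_train - mean) / scale               # assumption: features are standardized first
    cov = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort by explained variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals[:k] / eigvals.sum()
    return mean, scale, eigvecs[:, :k], explained_ratio

def pca_transform(x, mean, scale, components):
    """Project rows of x onto the principal axes learned from the training data."""
    return ((x - mean) / scale) @ components

def confusion_and_accuracy(y_true, y_pred):
    """2x2 confusion matrix (rows = actual, cols = predicted; 0 = down, 1 = up) and accuracy."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm, np.trace(cm) / cm.sum()

# Synthetic placeholder data with 10 features (e.g. indicator values plus institutional net buy/sell).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 10))
x_test = rng.normal(size=(50, 10))

# k = 7 assumes that "7 variables" means 7 principal components.
mean, scale, comps, ratio = pca_fit(x_train, k=7)
z_train = pca_transform(x_train, mean, scale, comps)
z_test = pca_transform(x_test, mean, scale, comps)

cm, acc = confusion_and_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
print(cm.tolist(), acc)  # [[1, 0], [1, 2]] 0.75
```

The reduced features `z_train` and `z_test` would then be passed to a CART classifier. For example, scikit-learn's `DecisionTreeClassifier` is documented as using an optimized version of CART. Fitting PCA on the test year as well would leak future information into training, which is why the sketch reuses the training-set projection.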