可擴展式基於文字分析之股票趨勢預測系統

Scalable System for Textual Analysis based Stock Market Prediction

Advisor: 蘇雅韻

Abstract


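Stock market trend prediction is a popular research topic whose goal is to anticipate whether stock prices will rise or fall. For short-term investors, news is an important reference for predicting whether a stock price will rise or fall. In recent years, the rise of social networks has made more timely textual information a candidate as another factor in stock price prediction. However, the volume of social network data keeps growing beyond what a traditional text-processing prediction system can handle. Meanwhile, the maturity of open source software has made computing platforms for large-scale data easier to set up. Based on these ideas, the main topic of this study is to build a scalable stock market trend prediction system that processes Chinese text, using a large number of Chinese news articles as a first step toward predicting stock price trends. Such a system speeds up the building of prediction models and the validation of their performance. In addition, with the cloud computing services that have emerged in recent years, the platform can be deployed on the cloud more easily and promptly: the system is packaged as an image, and when needed, resources are leased from a cloud provider and the platform service is started from that image. Another issue that arises with the cloud is how to make full use of existing resources and, when demand exceeds the system's capacity, lease additional cloud resources to maintain quality of service. The results show that, in the system we built from open source software, the Jieba open source Chinese word segmentation project, after the modifications made in this study, improved its parallel processing capability by 80% when processing 2.3 GB of news text on four cores. However, the Mahout project, which is responsible for the machine learning part, did not show a performance improvement.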

English Abstract


Stock market prediction is the problem of forecasting market trends. For short-term investment, news is one of the most important factors influencing stock prices. Based on this idea, our goal is to build a scalable stock market prediction system that processes Chinese news articles to produce a prediction model. With this system, we can speed up model training and take more training sources into account, e.g., posts from China's microblog service, Sina Weibo. Also, with the emergence of cloud computing, a scalable system can lease more resources from the cloud to handle a growing workload. Our solution builds the system from mature open source projects: Hadoop for parallel computing, Mahout for scalable machine learning, and Jieba for Chinese text segmentation. We provide a basic algorithm for stock trend prediction, build the software stack, collect news published in Taiwan from March 2009 to May 2014, and run experiments to evaluate the scalability of the system. The results show that, in this application, the Jieba Chinese text segmentation tool scales well with multiprocessing: it is 80 percent faster with four parallel processes than in sequential mode. However, Mahout does not show a significant speedup in this scenario.
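
The multiprocessing result reported for segmentation can be pictured with a minimal sketch of document-level parallelism. This is not the thesis's modified Jieba: `naive_segment` and `segment_corpus` are hypothetical names introduced here, and the placeholder segmenter simply splits text into characters so the sketch runs without extra packages. Assuming Jieba is installed, its `jieba.lcut` function could be passed in as the segmenter instead.

```python
from multiprocessing import Pool
from typing import Callable, List, Sequence


def naive_segment(text: str) -> List[str]:
    """Placeholder segmenter: one token per non-whitespace character.

    Hypothetical stand-in for a real Chinese word segmenter; it only keeps
    the sketch self-contained and does no dictionary lookup.
    """
    return [ch for ch in text if not ch.isspace()]


def segment_corpus(
    docs: Sequence[str],
    segmenter: Callable[[str], List[str]] = naive_segment,
    processes: int = 1,
) -> List[List[str]]:
    """Segment every document, optionally spreading the work over processes."""
    if processes <= 1:
        # Sequential mode: the baseline a parallel run would be compared with.
        return [segmenter(doc) for doc in docs]
    # Hand documents to the workers in batches; map() preserves input order.
    chunksize = max(1, len(docs) // (processes * 4))
    with Pool(processes=processes) as pool:
        return pool.map(segmenter, docs, chunksize=chunksize)
```

A call such as `segment_corpus(news_articles, processes=4)` would then mirror the four-process setting mentioned above. On platforms that start workers by spawning a fresh interpreter, that call should sit under an `if __name__ == "__main__":` guard. Any speedup depends on the corpus size and the cost of the segmenter, so this sketch makes no claim about reproducing the reported figure.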

