
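Predicting Stock Price Movement Trends from News Articles: Combining Sentiment Analysis, Topic Models, and Fuzzy Support Vector Machines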

Sentiment and Topic Analysis on Financial News for Stock Movement Prediction by Using Fuzzy Support Vector Machine

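Abstract


Successfully predicting the direction of stock price movements has obvious benefits. According to the efficient market hypothesis, the value of a company's stock is determined by all currently available information. When analysts, investors, and institutional traders assess the current stock price, news plays an important role in the valuation process. Indeed, financial news carries information about a company's fundamentals as well as qualitative information that shapes the expectations of market participants. In the era of big data, the number of online news articles keeps growing. Faced with such a large volume of text, more and more institutions rely on the high-speed processing power of modern computers for text mining and machine learning in order to build more accurate stock trend prediction models. Using the unstructured data in articles is the most challenging research direction and is the focus of this study. In this thesis, we extract latent topic models and sentiment information from news articles. In addition, we develop a fuzzy support vector machine that fuses the rich information contained in online news articles to predict whether stock prices will rise or fall. We consider fuzzy theory well suited to this study because text is itself fuzzy (for example, high/low, large/small), and because an ambiguous boundary exists between rising and falling trends (for example, a rise of 0.01% and a rise of 1% both belong to the rise class, but clearly to different degrees). The highest prediction accuracy in this study was 87% for food stocks, 71% for semiconductor stocks, and 69% for computer peripheral stocks. By comparison, a traditional support vector machine that predicts price trends from keywords achieves an accuracy of only a little over 50% (close to random guessing), so the proposed method clearly outperforms the traditional support vector machine prediction model.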

Parallel Abstract


Purpose: In the Big Data era, the number of news articles has grown tremendously. Faced with such a large volume of textual data, more and more institutions rely on the processing power of modern computers for text mining and machine learning to predict the stock market more accurately. Extracting the fundamental information contained in unstructured text is the most challenging aspect of this research and is therefore the goal of this work.

Design/methodology/approach: In this study, we extracted hidden topic models and sentiment information from news articles. We also developed a fuzzy support vector machine to merge the abundant information in online news, which can be used to forecast the trend of stock prices. Fuzzy set theory is well suited to this study because text is inherently fuzzy (for example, high/low and big/small) and because the boundary between the rise and fall categories is ambiguous. For example, a rise of 10% and a rise of 1% both belong to the rise category but differ in degree.

Findings: The highest forecast accuracy was 87% for food-related stocks, 71% for semiconductor-related stocks, and 69% for computer-peripheral-related stocks. By comparison, a traditional support vector machine achieved a forecast accuracy of just over 50% on stock price trends (close to random guessing), so the proposed method performs significantly better than the traditional support vector machine forecasting model.

Research limitations/implications: This study focused only on accurately classifying stock movement based on hidden topic and sentiment features. In future work, we plan to investigate more complex semantic features.

Practical implications: Successfully predicting the direction of stock price movement has obvious advantages. According to the Efficient Market Hypothesis, the price of a stock is determined by all information available at the moment. Financial news carries information about a firm's fundamentals as well as qualitative information that influences the expectations of market participants. This study employs sentiment and topic analysis of financial news to predict stock movement, which can help analysts, investors, and institutional traders evaluate current stock prices effectively.

Originality/value: To the best of our knowledge, this study is the first attempt to apply a fuzzy support vector machine and hidden topic/semantic features to the prediction of stock movement in Taiwan.
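The idea that two rises of different size belong to the rise category "to different degrees" can be illustrated with a small sketch. The abstract does not give the membership function used in the study, so everything below is an assumption for illustration only: the function names `fuzzy_membership` and `weighted_hinge_loss`, the linear ramp, and the `floor` and `saturation` parameters are all hypothetical. The second function shows the general form of the penalty term in a fuzzy support vector machine objective, where each sample's hinge loss is scaled by its membership.

```python
from typing import List, Tuple


def fuzzy_membership(
    returns: List[float],
    floor: float = 0.1,         # assumed minimum membership, so no sample is ignored
    saturation: float = 0.01,   # assumed |return| (1%) at which membership reaches 1.0
) -> List[Tuple[int, float]]:
    """Map each daily return to (label, membership).

    label is +1 (rise) for a non-negative return, otherwise -1 (fall);
    treating a zero return as a rise is an arbitrary choice for this sketch.
    membership grows linearly with |return| and is clipped to [floor, 1.0].
    """
    labelled = []
    for r in returns:
        label = 1 if r >= 0 else -1
        membership = max(floor, min(1.0, abs(r) / saturation))
        labelled.append((label, membership))
    return labelled


def weighted_hinge_loss(
    scores: List[float],
    labelled: List[Tuple[int, float]],
    C: float = 1.0,
) -> float:
    """Membership-weighted hinge penalty: C * sum_i s_i * max(0, 1 - y_i * f(x_i))."""
    return C * sum(
        s * max(0.0, 1.0 - y * f) for f, (y, s) in zip(scores, labelled)
    )


# Returns of +0.01%, +1%, -2%, and +0.5%, written as fractions.
samples = fuzzy_membership([0.0001, 0.01, -0.02, 0.005])
# Hypothetical classifier scores f(x_i) for the same four samples.
loss = weighted_hinge_loss([0.5, 2.0, 1.0, -1.0], samples)
```

Under these assumptions, a 0.01% rise receives only the floor membership while a 1% rise receives full membership, so misclassifying the near-zero move costs far less than misclassifying the clear one. In practice, memberships of this kind can be supplied as per-sample weights to an SVM trainer; scikit-learn's `SVC.fit`, for instance, accepts a `sample_weight` argument.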

