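Sentiment classification of stock market news identifies whether a stock news article carries positive or negative sentiment. Because this technique can help investors make decisions in the stock market, it has become an emerging technique for stock trend prediction. In this study, we use the emotion words in an article, together with the differing intensity of those emotion words, as features for classifying the sentiment of stock news articles. To obtain these emotion words and their weights effectively, this study develops a Contextual Entropy Model (CE). The model takes a set of seed words, generated from a collection of stock news articles annotated with sentiment in advance, as the basis for expansion, and uses the intensity it computes to automatically expand emotion words that are highly similar to the seed words. The contextual entropy model uses an entropy measure to compare how two words are distributed in articles, so it can detect words similar to the seed words and add them to the set. Experimental results show that using the expansion method to compute word weights, and thereby find more useful emotion words, improves classification performance. Moreover, in addition to the emotion words themselves, including intensity as a feature further improves the accuracy of sentiment classification for stock news articles. Compared with other expansion methods, the proposed Contextual Entropy Model (CE) outperformed the pointwise mutual information (PMI) expansion method proposed in previous research.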
Sentiment classification of stock news is the task of identifying positive and negative stock news articles. It has become an emerging technique for stock trend prediction because it can facilitate investors' decision making in the stock market. In this paper, we propose the use of both emotion words and their intensity as features to classify the sentiment of stock news articles. To acquire emotion words and their intensity, this study develops a contextual entropy model to expand a set of seed words generated from a small corpus of stock news articles annotated with sentiment. The contextual entropy model calculates the similarity between two words by comparing their context distributions using an entropy measure, so that it can discover words similar to the seed words for expansion. Experimental results show that the expansion method can discover more useful emotion words with intensity, thus improving the classification performance. In addition, incorporating the intensity further improves the performance. In comparison with other expansion methods, the proposed contextual entropy model outperformed the pointwise mutual information (PMI)-based expansion method proposed in a previous study.
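To make the idea of comparing context distributions concrete, the following is a minimal sketch, not the model as defined in this work. The abstract does not state the exact entropy measure, window size, or smoothing, so the sketch assumes a symmetric Kullback-Leibler divergence over add-one-smoothed context counts within a two-token window. All function names (context_distribution, context_distance, expand_seeds) and the toy sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def context_distribution(word, sentences, vocab, window=2, alpha=1.0):
    """Smoothed distribution over words seen within `window` tokens of `word`.

    Assumption: add-alpha smoothing and a fixed window; the source does not
    specify either choice.
    """
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for distributions on the same keys."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def context_distance(w1, w2, sentences, vocab):
    """Symmetric KL divergence between two context distributions.

    Assumption: this stands in for the unspecified entropy measure.
    Lower values mean more similar contexts.
    """
    p = context_distribution(w1, sentences, vocab)
    q = context_distribution(w2, sentences, vocab)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def expand_seeds(seeds, candidates, sentences, top_k=1):
    """Return the `top_k` candidates whose contexts are closest to any seed word."""
    vocab = sorted({t for s in sentences for t in s})
    scored = sorted(
        (min(context_distance(c, s, sentences, vocab) for s in seeds), c)
        for c in candidates
    )
    return [c for _, c in scored[:top_k]]


# Toy corpus (invented for illustration, not data from the study).
sentences = [
    "shares surge after strong earnings".split(),
    "shares soar after strong earnings".split(),
    "shares plunge after weak earnings".split(),
]
print(expand_seeds(["surge"], ["soar", "plunge"], sentences))
```

In this toy corpus, "soar" occurs in exactly the same contexts as the seed "surge", so its distance is zero and it is ranked ahead of "plunge". A fuller version could convert the distance into a similarity score and use it as the intensity weight of each expanded word, but how the study derives intensity is not given in the abstract.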