  • Degree thesis


A Stock Recommender System based on Text Mining and Machine Learning

Advisor: 陳灯能


As information technology advances, the volume of data carried on the Internet keeps growing. This large and still-growing body of data contains implicit knowledge, but that knowledge is surrounded by useless noise. Natural-language text makes up the largest share of the data, yet its unstructured nature makes conventional mining difficult. This study therefore explores how text mining techniques can be applied to natural-language data. After weighing timeliness, data volume, and the reliability and simplicity of validation, we chose stock trend prediction from financial news text as the research topic.

In the experiments, a self-written retrieval module (an HTML parser) collects financial news from 鉅亨網 (cnYES) as the corpus, and the word segmentation system provided by Academia Sinica (CKIP) supports Chinese natural language processing and bag-of-words construction. Prediction accuracy is validated against the same-day rise or fall of the actual related stock sector, and WEKA runs the required machine learning algorithms. The experiments suggested that a document's own validity could be taken into account when selecting and evaluating features. We therefore propose a context-oriented feature assignment method: each document's degree of influence on the same-day price fluctuation is taken as its contribution, feature weights are assigned to each document according to that contribution, and the feasibility of the approach is examined through sampling experiments with different combinations.
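The contribution-based feature weighting described above can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption for illustration only: the function names `contribution` and `weighted_bag_of_words` are hypothetical, the contribution is assumed to be the absolute same-day price change scaled by the largest absolute change in the sample, and the tokens are assumed to be already segmented (for example by CKIP). The thesis itself ran its learning algorithms in WEKA, not in Python.

```python
from collections import Counter


def contribution(change_pct, max_abs_change):
    """Assumed contribution: size of the day's move relative to the largest move seen."""
    if max_abs_change == 0:
        return 0.0
    return abs(change_pct) / max_abs_change


def weighted_bag_of_words(docs):
    """Build contribution-weighted bag-of-words features.

    docs: list of (tokens, change_pct) pairs, where tokens is a list of
    already-segmented words and change_pct is the same-day price change
    of the related stock sector.

    Returns a list of (features, label) pairs. features maps each term to
    count * contribution; label is "up" for a positive change and "down"
    otherwise (a zero change is labeled "down" in this sketch).
    """
    max_abs = max((abs(change) for _, change in docs), default=0.0)
    rows = []
    for tokens, change in docs:
        weight = contribution(change, max_abs)
        counts = Counter(tokens)
        features = {term: n * weight for term, n in counts.items()}
        label = "up" if change > 0 else "down"
        rows.append((features, label))
    return rows


# Hypothetical sample: two news items with made-up price changes.
sample = [
    (["earnings", "beat", "earnings"], 2.0),
    (["lawsuit", "recall"], -4.0),
]
rows = weighted_bag_of_words(sample)
```

In this sketch, a document published on a day with a larger price move contributes proportionally larger term weights, so a downstream classifier sees its terms as stronger evidence. Other definitions of contribution would fit the same structure.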




