Translated Titles

Using Contextual Multi-Armed Bandit Algorithms for Recommending Investment in Stock Market



Key Words

Contextual Multi-Armed Bandit Problem ; Personalized Recommendation System ; Stock Recommendation System ; Contextual Bandit Problem ; Linear Upper Confidence Bound ; LinUCB ; Stock Recommendation ; Contextual Multi-Armed Bandit




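Chinese Abstract (English translation)

The contextual bandit problem is often used to model online recommendation applications such as article, music, and video recommender systems. Linear upper confidence bound (LinUCB) is one of the current algorithms for solving the contextual bandit problem; it relies mainly on linear regression and keeps learning from the feedback it receives from the environment to update its internal model. However, we observe that the contextual bandit problem is very rarely applied to stock recommendation in the stock investment market. Most studies recommend for the purpose of investment profit, rather than recommending stocks that match investors' investment attributes based on their own risk attributes and the characteristics of the investment targets. We propose a contextual bandit model to simulate a personalized recommendation system that recommends stocks to users. The contextual multi-armed bandit model identifies an investor's investment attributes from past investment records, then recommends a group of stocks according to these attributes. The stock groups are the result of a classification that combines fundamental analysis of company finances with technical analysis of stock price movements. Once the recommended group is decided, all stocks are ranked by their similarity to that group and then recommended. Our empirical data come from an online dataset of simulated stock-market investments, and the experimental results show that our proposed method performs better than existing methods in the stock recommendation domain.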

English Abstract

The contextual multi-armed bandit (CMAB) problem is commonly used to model online recommendation of articles, music, movies, and similar items. One leading algorithm for contextual bandits is LinUCB, which updates its internal linear regression models using partial feedback from the environment. However, we observe that CMAB is rarely applied to stock recommendation, and that most existing stock recommendations aim at profit while ignoring the investor’s own features (risk tolerance, investment characteristics, and others). We propose a personalized stock recommendation system based on a contextual multi-armed bandit algorithm. We take the investor’s investment records as user features and recommend an “arm”, which is a type of stock defined by two kinds of analysis: technical and fundamental. For the chosen arm, we rank the stocks by their similarity to the arm. Our experiment is based on a dataset of simulated investments collected from an online website, and the results show that our method outperforms other algorithms.
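
The two steps described in the abstract, choosing an arm with LinUCB and then ranking stocks by similarity to that arm, can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not specify the feature encoding, the reward definition, the exploration weight, or the similarity measure, so the names `LinUCBArm`, `choose_arm`, `rank_stocks`, and `alpha`, the use of cosine similarity, and the Sherman-Morrison update of the inverse matrix are all assumptions made here for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class LinUCBArm:
    """One arm of disjoint LinUCB: a ridge-regression model plus a confidence bonus."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        # Inverse of A = I + sum of x x^T over the contexts observed for this arm.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x over the observed contexts

    def ucb(self, x):
        theta = mat_vec(self.A_inv, self.b)  # ridge-regression weight estimate
        bonus = math.sqrt(dot(x, mat_vec(self.A_inv, x)))
        return dot(theta, x) + self.alpha * bonus

    def update(self, x, reward):
        # Partial feedback: only the arm that was actually recommended is updated.
        # Sherman-Morrison rank-one update of A_inv (A_inv is symmetric).
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x):
    """Return the index of the arm with the highest upper confidence bound."""
    return max(range(len(arms)), key=lambda i: arms[i].ucb(x))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def rank_stocks(arm_profile, stock_features):
    """Rank stock indices by cosine similarity to the chosen arm's profile."""
    return sorted(
        range(len(stock_features)),
        key=lambda i: cosine(arm_profile, stock_features[i]),
        reverse=True,
    )
```

In use, `x` would be a feature vector derived from the investor's records, `choose_arm(arms, x)` would pick the stock type to recommend, and `rank_stocks` would order individual stocks within that type; after the investor responds, `update` is called on the chosen arm only.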

Topic Category

College of Management > Graduate Institute of Information Management
Social Sciences > Management