
可避免風險之強化學習演算法

Risk-Avoiding Reinforcement Learning

Advisor: 林守德

Abstract


Conventional reinforcement learning algorithms aim to maximize the expected cumulative discounted reward while ignoring the distribution of returns, so the return can be highly unstable and may include outcomes with severe losses. This thesis defines risk as the expected loss and, based on this definition, designs a risk-avoiding reinforcement learning algorithm that improves on conventional reinforcement learning. Experimental results show that a reinforcement learning agent that avoids expected loss also reduces other kinds of risk, such as the standard deviation of return, the maximal loss, the probability of loss, and the chance that the realized return falls below the expected return. Avoiding expected loss also lowers the default risk of ordinary firms and financial institutions and the risk of margin calls in the stock market, a notion of risk that the existing literature has not handled effectively. We design a decomposable reinforcement learning algorithm to reduce the expected loss effectively. The framework consists of two subagents and one arbiter: the subagents separately learn the expected loss and the expected return, and the arbiter evaluates the risk and return of each candidate action and then takes the best decision. The experiments cover two settings: a grid world and simulated trading on the Taiwan Electronic Stock Index. In the grid world we show the expected return and risk obtained by agents with different degrees of risk aversion, and how a separate loss-sensitivity coefficient helps the agent raise its expected return under a given level of risk. In the simulated stock trading we compare against variance-penalized and risk-sensitive reinforcement learning algorithms; the results show that, for a given rate of return, the agent that avoids expected loss effectively reduces the other risk measures.

Keywords

Reinforcement Learning, Risk, Artificial Intelligence

English Abstract


Traditional reinforcement learning agents focus on maximizing the expected cumulative reward and ignore the distribution of returns. However, for some tasks people prefer actions that may yield less return but are more likely to avoid disaster. This thesis proposes to define risk as the expected loss and accordingly designs a risk-avoiding reinforcement learning agent. Our experiments show that such a risk-avoiding agent can also reduce other types of risk, such as the variance of return, the maximal loss, and the probability of fatal errors. Risk defined in terms of loss captures the credit risk faced by banks as well as the losses arising in stock margin trading, which the previous literature can hardly cope with effectively. We design a Q-decomposed reinforcement learning system to handle the trade-off between expected loss and expected return. The framework consists of two subagents and one arbiter: the subagents learn the expected loss and the expected return individually, and the arbiter evaluates the combined return and loss of each action and takes the best one. We perform two experiments: the grid world and simulated trades on the Taiwan Electronic Stock Index. In the grid world, we evaluate the expected return and the expected loss of agents with different levels of risk aversion. In the stock-trading experiment, we compare the risk-avoiding agent with variance-penalized and risk-sensitive agents. The results show that our risk-avoiding agent can not only reduce the expected loss but also cut down other kinds of risk.
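
The decomposed architecture described in the abstract (two subagents plus an arbiter) can be illustrated with a small sketch. The Python snippet below is only an illustrative sketch under assumed details, not the thesis's actual implementation: the class name RiskAvoidingAgent, the tables q_return and q_loss, the risk-aversion weight w, and the tabular SARSA-style update are all hypothetical choices made for illustration.

    # Illustrative sketch of a decomposed risk-avoiding agent (tabular setting).
    # One subagent estimates expected return, the other expected loss; the
    # arbiter combines them with a risk-aversion weight w and picks an action.
    import random
    from collections import defaultdict

    class RiskAvoidingAgent:
        def __init__(self, actions, alpha=0.1, gamma=0.95, w=1.0, epsilon=0.1):
            self.actions = actions
            self.alpha, self.gamma = alpha, gamma
            self.w = w                           # weight on the loss estimate
            self.epsilon = epsilon
            self.q_return = defaultdict(float)   # subagent 1: expected return
            self.q_loss = defaultdict(float)     # subagent 2: expected loss (<= 0)

        def score(self, state, action):
            # Arbiter: combined value = expected return + w * expected loss.
            return self.q_return[(state, action)] + self.w * self.q_loss[(state, action)]

        def act(self, state):
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.score(state, a))

        def update(self, state, action, reward, next_state):
            # Both subagents bootstrap from the action the arbiter would
            # pick in the next state, so they stay consistent with the
            # arbiter's combined policy.
            next_action = max(self.actions, key=lambda a: self.score(next_state, a))
            loss = min(reward, 0.0)              # only negative rewards count as loss
            for q, target in ((self.q_return, reward), (self.q_loss, loss)):
                key, next_key = (state, action), (next_state, next_action)
                q[key] += self.alpha * (target + self.gamma * q[next_key] - q[key])

In this sketch a larger w makes the arbiter weigh the loss estimate more heavily, which corresponds to the different degrees of risk aversion evaluated in the grid-world experiment.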

