
Risk-based Reward Shaping Reinforcement Learning for Optimal Trading Execution

Advisor: 許永真

Abstract


Optimal trading execution, the question of how to carry out trading signals within the financial trading pipeline, has been shown to strongly influence the profitability of a trading strategy. In recent years, with the rise of electronic exchanges, many studies have applied data-driven methods such as reinforcement learning (RL) to this problem and achieved better performance than traditional financial methods. However, these methods do not fully account for the trade-off between risk and return, so the trained agent pursues profit alone. This leads to a misleading measure of agent performance and a lack of diversity in execution strategies. In this thesis, we propose two risk-based reward shaping methods to address these problems. The first normalizes the reward by market volatility; our results show that this more realistic feedback makes the overall strategy more profitable and more robust, and the same shaping can be applied to other RL-based financial trading tasks. The second replaces the standard-deviation risk measure with the executed inventory ratio, yielding a denser reward that is easier for the agent to learn. It is combined with a multi-objective Markov decision process (MOMDP) framework so that the policy considers profit and risk simultaneously. Under this design, our results show a better interpretation of the trade-off between risk and return than previous works. Overall, with these two methods we can first train a stronger agent and then diversify its execution strategies through the risk reward and the MOMDP framework, giving traders a flexible way to execute trading signals for different investors and financial assets.
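The two reward shapings described above can be illustrated with a minimal sketch. The thesis's exact formulas are not given here, so the window of returns, the scalarization weight `w_profit`, and the function names are all assumptions for illustration only:

```python
import numpy as np

def volatility_normalized_reward(pnl, recent_returns, eps=1e-8):
    """First shaping (sketch): normalize the step reward by recent market
    volatility, so quiet and turbulent markets give comparable feedback.
    `pnl` is the step profit; `recent_returns` is a window of recent
    market returns (window choice is an assumption)."""
    sigma = np.std(recent_returns)
    return pnl / (sigma + eps)  # eps avoids division by zero in flat markets

def risk_profit_reward(pnl, executed, total, w_profit=0.5):
    """Second shaping (sketch): a two-objective, MOMDP-style reward.
    The risk term is the executed inventory ratio (dense at every step)
    rather than a standard deviation; a linear scalarization weight
    `w_profit` trades profit against risk (weighting is an assumption)."""
    risk_term = executed / total  # fraction of the parent order filled
    return w_profit * pnl + (1.0 - w_profit) * risk_term
```

Varying `w_profit` yields the family of execution policies the abstract alludes to: a risk-averse client's orders can be executed with a low weight, an aggressive one's with a high weight, without retraining from scratch under a single fixed objective.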

