This study applies deep reinforcement learning to option hedging under transaction costs, learning hedging strategies from simulated market data. We train the proximal policy optimization (PPO) algorithm separately on market data generated by the Black-Scholes model and by the Heston model. The Leland (1985) hedging strategy serves as a benchmark, and we compare the profit and loss distributions after hedging. The results show that under the Black-Scholes model, PPO learns a strategy close to Leland’s. Under the Heston model, the mean of PPO’s profit and loss distribution is closer to zero than Leland’s, but its standard deviation is slightly larger.