Handling Reinforcement Learning Problems with Logical Dependencies

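Reinforcement learning can now solve many sequential decision problems well, but there is still no very effective approach for problems with long-delayed rewards or sparse rewards. This study focuses on the state before each step of an entire episode, the action taken at that step, and the change in state before and after the action. It adds a logical reward to an existing reinforcement learning algorithm and, by increasing the logical reward of key steps, makes a new attempt at solving delayed-reward problems. The method mainly analyzes the state changes within a single episode to obtain a new table of state-action pairs, and adds the logical reward values in the table to the reward given by the game environment when iterating the reinforcement learning algorithm, so that the agent learns the logical relations. In addition, extracting states yields the key information that gives rise to the logical relations. Because this information is the same across similar environments, results trained in one environment can be applied to another similar environment, achieving a transfer learning effect.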

Keywords

Reinforcement learning; Sparse reward; Delayed reward; Logical relation

Parallel Abstract

Reinforcement learning is a good way to solve many sequential decision problems, but for problems with delayed or sparse rewards there is still no effective solution. This study focuses on three things: (1) the state at each step of the entire episode, (2) the action taken at that step, and (3) the change in the state before and after the action. It adds a logical reward to the reinforcement learning algorithm and attempts to solve problems with delayed or sparse rewards by adding logical reward to key steps. The method mainly analyzes the changes between states in a single episode, generates a new state-action table, then looks up the logical reward value in the table and adds it to the reward given by the game environment, so that the agent can handle the logic of that environment. In addition, extracting states makes it possible to obtain the key information that causes the logical relation. Because this information is the same in similar environments, the results trained in one environment can be applied to another similar environment.
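As a rough illustration of the state-action table idea, the following minimal Python sketch builds a table from one episode and adds the looked-up value to the environment reward. The abstract does not say how logical reward values are assigned, so the rule used here (a fixed bonus for every state-changing step in an episode with positive total reward) is purely an assumption, and the names `build_logical_reward_table`, `shaped_reward`, and `bonus` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

# One step of an episode: (state, action, next_state, environment_reward).
Transition = Tuple[Hashable, Hashable, Hashable, float]
RewardTable = Dict[Tuple[Hashable, Hashable], float]


def build_logical_reward_table(episode: List[Transition],
                               bonus: float = 0.1) -> RewardTable:
    """Build a state-action table of logical rewards from a single episode.

    ASSUMPTION (not from the source): a step counts as a "key step" if it
    changed the state and the episode's total environment reward was positive.
    """
    table: RewardTable = {}
    total_reward = sum(reward for _, _, _, reward in episode)
    if total_reward <= 0:
        return table  # nothing to credit in an unsuccessful episode
    for state, action, next_state, _ in episode:
        if next_state != state:
            table[(state, action)] = bonus
    return table


def shaped_reward(table: RewardTable, state: Hashable, action: Hashable,
                  env_reward: float) -> float:
    """Add the logical reward from the table to the environment reward."""
    return env_reward + table.get((state, action), 0.0)


if __name__ == "__main__":
    # Toy episode: a key must be picked up before the door can be opened.
    episode = [
        ("start", "wait", "start", 0.0),
        ("start", "pick_key", "has_key", 0.0),
        ("has_key", "open_door", "goal", 1.0),
    ]
    table = build_logical_reward_table(episode)
    for state, action, _, reward in episode:
        print(state, action, shaped_reward(table, state, action, reward))
```

In this toy episode the sparse environment reward arrives only at the door, while the shaped reward also credits picking up the key, which is the kind of intermediate logical step the abstract aims to reinforce. The shaped value would then replace the raw reward in whatever update rule the underlying algorithm uses.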

Parallel Keywords

Reinforcement learning; Sparse reward; Delayed reward; Logical relation

