
Handling Reinforcement Learning Problems with Logical Dependencies (original title: 有邏輯依賴關係的強化學習問題處理)

Solving Reinforcement Learning Problems in Logical Relation

Advisor: 孫春在

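Abstract


Reinforcement learning can now solve many sequential decision problems well, but there is still no very effective approach for problems with long-delayed rewards or sparse rewards. This study looks at the state before each step of a whole game episode, the action performed at that step, and the change in state before and after the action. It adds a logical reward to an existing reinforcement learning algorithm and, by raising the logical reward of key steps, makes a new attempt at solving delayed-reward problems. The method mainly analyzes state changes within a single episode to obtain a new table mapping states to actions, then adds the logical reward values in the table to the reward given by the game environment when iterating the reinforcement learning algorithm, so that the agent grasps the logical relations. In addition, state extraction yields the key information that creates the logical relations. This information is the same in similar environments, so results trained in one environment can be applied to another similar one, achieving the effect of transfer learning.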

Parallel Abstract


Reinforcement learning already handles many sequential decision problems well, but problems with delayed or sparse rewards still lack an effective solution. This study focuses on three things: (1) the state before each step of an episode, (2) the action taken at that step, and (3) the change in state before and after the action. It adds a logical reward to an existing reinforcement learning algorithm and, by increasing the logical reward of key steps, attempts to solve problems with delayed or sparse rewards. The method analyzes the state changes within a single episode, generates a new state-action table, and adds the logical reward values in that table to the reward given by the game environment, so that the agent can handle the logic of that environment. In addition, extracting states yields the key information that gives rise to a logical relation. Because this information is the same across similar environments, results trained in one environment can be applied to another similar environment.
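
The table-based reward adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not say how key steps are identified or how large the logical reward is, so everything below is an assumption made for illustration: the names `build_logical_reward_table` and `shaped_reward`, the fixed `bonus` value, the rule that a step counts as "key" when its action changes the state in an episode with positive return, and the toy key-and-door episode. It is not the thesis's actual algorithm.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
# One step of an episode: (state, action, next_state, env_reward)
Step = Tuple[State, Action, State, float]
RewardTable = Dict[Tuple[State, Action], float]


def build_logical_reward_table(episode: List[Step], bonus: float = 0.5) -> RewardTable:
    """Build a (state, action) -> logical reward table from one episode.

    Assumed rule (not from the thesis): if the episode earned a positive
    return, every step whose action changed the state receives `bonus`.
    """
    table: RewardTable = {}
    episode_return = sum(reward for _, _, _, reward in episode)
    if episode_return <= 0:
        return table  # nothing to credit in an unsuccessful episode
    for state, action, next_state, _ in episode:
        if next_state != state:
            table[(state, action)] = bonus
    return table


def shaped_reward(table: RewardTable, state: State, action: Action,
                  env_reward: float) -> float:
    """Add the table's logical reward (0 if absent) to the environment reward."""
    return env_reward + table.get((state, action), 0.0)


# Hypothetical key-and-door episode; a state is (has_key, door_open).
episode: List[Step] = [
    ((False, False), "wait", (False, False), 0.0),
    ((False, False), "pick_key", (True, False), 0.0),
    ((True, False), "open_door", (True, True), 1.0),
]
table = build_logical_reward_table(episode)
```

In this sketch the environment gives no reward for picking up the key, but `shaped_reward(table, (False, False), "pick_key", 0.0)` includes the bonus, so a learning algorithm fed the shaped reward receives a signal at the prerequisite step instead of only at the end of the episode.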

