  • Degree thesis

Learning to Solve the Rubik's Cube with an Intensifying Reward Mechanism

Solving Rubik's Cube by Policy Gradient Based Reinforcement Learning

Advisor: 林永隆

Abstract


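A reinforcement learning system provides a mechanism for an agent to interact with its environment, and policy gradient methods aim to take good actions as often as possible. We propose applying a linear policy gradient method together with an intensifying reward-and-penalty mechanism in a reinforcement learning system, so that good actions receive higher probability. Experimental results show that this method, using a neural network model, can solve some Rubik's Cube problems but still cannot solve all of them.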

Parallel Abstract


Reinforcement learning provides a mechanism for training an agent to interact with its environment, and policy gradient methods make the right actions more probable. We propose using a linear policy gradient method in a deep neural network-based reinforcement learning system. The proposed method employs an intensifying reward function to increase the probabilities of the right actions when solving Rubik's Cube problems. Experiments show that the proposed neural network learned to solve some Rubik's Cube states. For more difficult initial states, the network still cannot always give a correct suggestion.
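To illustrate what "making the right actions more probable" means in a policy gradient method, the following is a minimal sketch of a single REINFORCE-style update for a linear softmax policy. It is not the thesis's implementation: the function names, the two-feature state, the three actions, the learning rate, and the scalar `reward` are all assumptions chosen for illustration, and the intensifying reward function described in the abstract is not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, features):
    """Linear softmax policy: one weight vector (row of theta) per action."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in theta]
    return softmax(logits)


def reinforce_step(theta, features, action, reward, lr=0.1):
    """Return theta after one update: theta += lr * reward * grad log pi(action | state).

    For a linear softmax policy, the gradient of log pi(action | state) with
    respect to the weights of action b is (1[b == action] - pi(b)) * features.
    """
    probs = action_probs(theta, features)
    new_theta = []
    for b, row in enumerate(theta):
        indicator = 1.0 if b == action else 0.0
        coeff = lr * reward * (indicator - probs[b])
        new_theta.append([w + coeff * f for w, f in zip(row, features)])
    return new_theta


# Toy usage (hypothetical sizes): 3 actions, 2 state features.
theta = [[0.0, 0.0] for _ in range(3)]
state = [1.0, 0.5]
before = action_probs(theta, state)[1]
theta = reinforce_step(theta, state, action=1, reward=1.0)
after = action_probs(theta, state)[1]
print(before, after)
```

With a positive reward the probability of the chosen action rises, and with a negative reward it falls. A reward function that grows for better outcomes would scale this step accordingly.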

