Learning to Solve the Rubik's Cube with an Intensifying Reward Mechanism (應用強化獎勵機制學習解魔術方塊)

Yi-Ching Chen

Thesis

Solving Rubik's Cube by Policy Gradient Based Reinforcement Learning

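Yi-Ching Chen (陳怡靜)

Advisor: 林永隆

National Tsing Hua University / College of Electrical Engineering and Computer Science / Department of Computer Science / Master's thesis (2018)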

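Abstract (Chinese, translated)

A reinforcement learning system provides a mechanism for an agent to interact with its environment, and policy gradient methods aim to make good actions as likely as possible. We propose applying a linear policy gradient method together with an intensifying reward and penalty mechanism in a reinforcement learning system, so that good actions receive higher probability. Experimental results show that, with a neural network model, this method can solve some Rubik's Cube problems, but it still cannot solve all of them.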

Keywords

Reinforcement learning; Rubik's Cube; policy gradient

Abstract (English)

Reinforcement learning provides a mechanism for training an agent to interact with its environment. Policy gradient methods make the right actions more probable. We propose using a linear policy gradient method in a deep neural network-based reinforcement learning system. The proposed method employs an intensifying reward function to increase the probabilities of the right actions for solving the Rubik's Cube. Experiments show that the proposed neural network learned to solve some Rubik's Cube states. For more difficult initial states, however, the network still cannot always give a correct suggestion.
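To illustrate the statement that policy gradient makes the right actions more probable, here is a minimal sketch of a single REINFORCE-style update for a linear softmax policy. It is not the thesis's implementation: the abstract does not specify the network, the state encoding, or the form of the intensifying reward function, so the feature vector, the four placeholder actions, the learning rate, and the plain scalar reward below are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(weights, features):
    """Linear softmax policy: one row of weights per action (assumed form)."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


def reinforce_update(weights, features, action, reward, lr=0.1):
    """One policy gradient step: W += lr * reward * grad log pi(action | features).

    For a linear softmax policy, the gradient of log pi with respect to the
    row of action a is (1[a == action] - pi(a)) * features.
    """
    probs = action_probs(weights, features)
    new_weights = []
    for a, row in enumerate(weights):
        indicator = 1.0 if a == action else 0.0
        coeff = lr * reward * (indicator - probs[a])
        new_weights.append([w + coeff * x for w, x in zip(row, features)])
    return new_weights


# Hypothetical toy setup: 3 state features and 4 actions standing in for cube moves.
features = [1.0, 0.0, 1.0]
weights = [[0.0] * 3 for _ in range(4)]

before = action_probs(weights, features)
weights = reinforce_update(weights, features, action=2, reward=1.0)
after = action_probs(weights, features)

print("before:", [round(p, 3) for p in before])
print("after: ", [round(p, 3) for p in after])
```

A positive reward raises the probability of the rewarded action and lowers the others; a negative reward does the opposite. An intensifying reward scheme would presumably scale the `reward` argument, but how the thesis does so is not stated in the abstract.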

Keywords (English)

Reinforcement Learning; Rubik's Cube; Policy Gradient
