Discover Monte Carlo Algorithm on Spin Ice Model Using Reinforcement Learning

Advisor: 高英哲

Abstract


Reinforcement learning, with its outstanding capability for exploration and decision-making in dynamic environments, has become a fast-growing field of machine learning research. Inspired by theories of psychological learning, its framework contains a machine agent equipped with an improvable policy. The agent acts on the environment according to its current policy and refines that policy based on the feedback it receives, reaching its goal through repeated trial and correction.

In this thesis, we use reinforcement learning to let the agent invent, by itself, a Monte Carlo algorithm for the spin ice model. Spin ice is a frustrated magnetic system whose low-energy configurations are subject to strong topological constraints. In physics, the loop Monte Carlo algorithm is known to update such a system efficiently without violating its local topological constraint.

However, efficient update algorithms tend to be problem-dependent, and a new system typically requires a newly designed algorithm. We therefore develop a reinforcement-learning framework that models the Monte Carlo transition operator with a deep neural network and generalizes the Markov chain to a Markov decision process, so that the machine agent can create efficient update policies through its interaction with the physical system. We believe this algorithm can serve as a general framework for the search of Monte Carlo update schemes.
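To make the loop update concrete, here is a minimal Python sketch of a short-loop move on square ice (the six-vertex representation on a periodic lattice). It works at infinite temperature, so Boltzmann weights and acceptance steps are omitted, and the encoding (the `h`/`v` arrow arrays, `outgoing`, `loop_update`) is an illustrative choice of ours rather than the implementation used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 8  # linear size of the periodic square lattice

# Square-ice (six-vertex) representation: h[x, y] = +1 if the arrow on the
# horizontal edge from vertex (x, y) to ((x+1) % L, y) points right, -1 if left;
# v[x, y] = +1 if the arrow from (x, y) to (x, (y+1) % L) points up, -1 if down.
# The all-right / all-up state satisfies the two-in-two-out ice rule everywhere.
h = np.ones((L, L), dtype=int)
v = np.ones((L, L), dtype=int)

def outgoing(x, y):
    """Arrows leaving vertex (x, y), as (edge id, destination vertex) pairs."""
    out = []
    xm, ym = (x - 1) % L, (y - 1) % L
    if h[x, y] == +1:
        out.append((('h', x, y), ((x + 1) % L, y)))
    if h[xm, y] == -1:
        out.append((('h', xm, y), (xm, y)))
    if v[x, y] == +1:
        out.append((('v', x, y), (x, (y + 1) % L)))
    if v[x, ym] == -1:
        out.append((('v', x, ym), (x, ym)))
    return out

def loop_update():
    """Walk along outgoing arrows until some vertex repeats, then reverse
    every arrow on the closed directed loop that was found."""
    x, y = int(rng.integers(L)), int(rng.integers(L))
    path = []              # edges traversed so far, in order
    seen = {(x, y): 0}     # vertex -> index of its first departure in `path`
    while True:
        edge, (x, y) = outgoing(x, y)[rng.integers(2)]
        path.append(edge)
        if (x, y) in seen:
            for kind, ex, ey in path[seen[(x, y)]:]:  # the closed cycle
                if kind == 'h':
                    h[ex, ey] *= -1
                else:
                    v[ex, ey] *= -1
            return
        seen[(x, y)] = len(path)

for _ in range(1000):
    loop_update()
# Sanity check: every vertex still has exactly two outgoing arrows.
assert all(len(outgoing(x, y)) == 2 for x in range(L) for y in range(L))
```

Because a directed cycle contributes exactly one incoming and one outgoing arrow to every vertex it passes through, reversing the whole cycle preserves the two-in-two-out ice rule at every site, which is the constraint-preserving property the abstract refers to.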

Abstract (English)


Reinforcement learning is a fast-growing research field owing to its outstanding exploration capability in dynamic environments. Inspired by psychological learning theories, the reinforcement learning framework contains a software agent with improvable policies that takes actions on the environment and attempts to achieve a goal according to the rewards it receives. A policy is a stochastic rule that governs the decision-making process of the agent and is updated based on the response of the environment. In this work, we apply the reinforcement learning framework to the spin ice model. Spin ice is a frustrated magnetic system with strong topological constraints on its low-energy configurations. In the physics community, it is well known that the loop Monte Carlo algorithm can update the system efficiently without breaking its local constraint. However, from a broader perspective, global update schemes can be problem-dependent and require customized algorithm design. We therefore exploit a reinforcement learning method that parameterizes the transition operator with neural networks. By extending the Markov chain to a Markov decision process, the algorithm can adaptively search for a global update policy through its interactions with the physical model. It may serve as a general framework for the search of update patterns.
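The step from a Markov chain to a Markov decision process can be sketched as follows. This is a REINFORCE-style skeleton in PyTorch; the `TransitionPolicy` network, the two-way action space, the `ToyWalkEnv` stand-in environment, and its reward are all hypothetical placeholders, since the abstract does not specify the thesis's actual architecture, state encoding, or reward design:

```python
import torch
import torch.nn as nn

class TransitionPolicy(nn.Module):
    """A learnable Monte Carlo transition operator: maps a feature vector
    describing the current state to a distribution over candidate moves."""
    def __init__(self, n_features, n_actions=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

class ToyWalkEnv:
    """Stand-in MDP so the skeleton runs end to end: random feature vectors
    for states, a fixed horizon, and reward 1 for finishing an episode."""
    def __init__(self, n_features=16, horizon=10):
        self.n_features, self.horizon = n_features, horizon

    def reset(self):
        self.t = 0
        return torch.randn(self.n_features)

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return torch.randn(self.n_features), (1.0 if done else 0.0), done

def reinforce_episode(policy, env, optimizer, gamma=0.99):
    """Roll out one episode of the MDP and take one REINFORCE step:
    ascend the policy gradient weighted by discounted returns."""
    state, done = env.reset(), False
    log_probs, rewards = [], []
    while not done:
        dist = policy(state)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(int(action))
        rewards.append(reward)
    g, returns = 0.0, []
    for r in reversed(rewards):          # discounted return-to-go
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

env = ToyWalkEnv()
policy = TransitionPolicy(env.n_features)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(50):                      # a few illustrative training episodes
    reinforce_episode(policy, env, optimizer)
```

In a real setting, `ToyWalkEnv` would be replaced by the spin-ice system itself: the state would encode the current configuration and walker position, the actions would extend the loop, and the reward would favor constraint-preserving, efficient global updates.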
