Title

強化學習用於發明自旋冰模型上的蒙地卡羅演算法

Translated Titles

Discover Monte Carlo Algorithm on Spin Ice Model Using Reinforcement Learning

DOI

10.6342/NTU201802242

Authors

趙愷文

Key Words

Reinforcement Learning ; Deep Learning ; Monte Carlo Algorithm ; Spin Ice Model

PublicationName

Degree Theses of the Graduate Institute of Physics, National Taiwan University

Volume or Term/Year and Month of Publication

2018

Academic Degree Category

Master's

Advisor

高英哲

Content Language

English

Chinese Abstract

Reinforcement learning, with its outstanding ability to explore and make decisions in dynamic environments, has become a fast-growing research field within machine learning. Inspired by psychological theories of learning, its framework contains a machine agent equipped with an improvable policy. The agent uses its current policy to act on the environment and, based on the feedback it receives, further refines that policy, reaching its goal through repeated trial and correction. In this thesis, we use reinforcement learning to let an agent invent, on its own, a Monte Carlo algorithm for the spin ice model. Spin ice is a frustrated magnetic system with strong topological constraints on its low-energy configurations. In physics, the loop Monte Carlo algorithm is known to update such a system efficiently without breaking its local topological constraints. However, efficient update algorithms are often problem dependent, and a new algorithm must be designed whenever a new system is encountered. We therefore develop a reinforcement-learning-based framework that models the Monte Carlo transition operator with a deep neural network and generalizes the Markov chain to a Markov decision process, so that the machine agent can create efficient update policies through its interactions with the physical system. We believe this algorithm can serve as a general framework for the search of Monte Carlo update schemes.
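For orientation only (standard Markov chain Monte Carlo theory, with notation chosen here rather than taken from the thesis): whatever proposal distribution q(x' | x) the learned transition operator realizes, the chain keeps sampling the Boltzmann distribution \pi(x) \propto e^{-\beta E(x)} as long as the acceptance probability A enforces detailed balance, for example through the Metropolis-Hastings rule:

\pi(x)\, q(x' \mid x)\, A(x \to x') \;=\; \pi(x')\, q(x \mid x')\, A(x' \to x),
\qquad
A(x \to x') \;=\; \min\!\left(1,\ \frac{\pi(x')\, q(x \mid x')}{\pi(x)\, q(x' \mid x)}\right).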

English Abstract

Reinforcement learning is a fast-growing research field due to its outstanding exploration capability in dynamic environments. Inspired by psychological learning theories, the reinforcement learning framework contains a software agent with improvable policies that takes actions on the environment and attempts to achieve its goal according to the rewards it receives. A policy is a stochastic rule which governs the decision-making process of the agent and is updated based on the response of the environment. In this work, we apply the reinforcement learning framework to the spin ice model. Spin ice is a frustrated magnetic system with a strong topological constraint on its low-energy configurations. In the physics community, it is well known that the loop Monte Carlo algorithm can update the system efficiently without breaking this local constraint. However, from a broader perspective, such global update schemes can be problem dependent and require customized algorithm design. Therefore, we develop a reinforcement learning method that parameterizes the transition operator with neural networks. By extending the Markov chain to a Markov decision process, the algorithm can adaptively search for a global update policy through its interactions with the physical model. It may serve as a general framework for the search of update patterns.
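As an illustration only, the sketch below shows how a state-dependent, trainable proposal can be combined with the Metropolis-Hastings rule and a REINFORCE-style policy-gradient update. It uses a tiny 2D Ising lattice and a linear softmax policy in place of the spin ice model and the deep network described above; the reward (1 for an accepted move) and all names and hyperparameters are hypothetical stand-ins, not the scheme used in the thesis.

# Minimal sketch (assumptions noted above): a learned single-spin-flip proposal
# trained with REINFORCE on a small 2D Ising lattice. The linear softmax policy
# stands in for the deep neural network; the reward is a crude proxy for an
# "efficient" update. Illustrative only, not the algorithm of the thesis.
import numpy as np

rng = np.random.default_rng(0)
L, beta, lr = 4, 0.6, 0.05          # lattice size, inverse temperature, learning rate
n_sites = L * L
W = np.zeros((n_sites, n_sites))    # policy weights: flattened spins -> site logits
spins = rng.choice([-1.0, 1.0], size=(L, L))

def site_probs(s, W):
    """Softmax over lattice sites; logits are a linear map of the flattened spins."""
    logits = W @ s.ravel()
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

accepted = 0
for step in range(2000):
    feat = spins.ravel().copy()     # state features at proposal time
    p = site_probs(spins, W)
    k = rng.choice(n_sites, p=p)    # policy proposes which spin to flip
    i, j = divmod(k, L)
    dE = 2.0 * spins[i, j] * (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
    # Metropolis-Hastings acceptance with the proposal-ratio correction, needed
    # because the learned proposal is state dependent (detailed balance above).
    flipped = spins.copy()
    flipped[i, j] *= -1
    q_rev = site_probs(flipped, W)[k]
    A = min(1.0, (q_rev / p[k]) * np.exp(-beta * dE))
    accept = rng.random() < A
    if accept:
        spins = flipped
        accepted += 1
    # REINFORCE: reward 1 for an accepted move; grad log pi(k|s) = (e_k - p) feat^T.
    reward = float(accept)
    grad_logits = -p
    grad_logits[k] += 1.0
    W += lr * reward * np.outer(grad_logits, feat)

print("acceptance rate:", accepted / 2000)

A richer action space (for instance loop-like moves that respect the ice rules) and a more carefully shaped reward would be needed to reproduce the kind of global updates the abstract refers to; the single-spin-flip action space here is only the simplest possible stand-in.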

Topic Category

Basic and Applied Sciences > Physics
College of Science > Graduate Institute of Physics
  55. [55] O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A. S. Vezhnevets, M. Yeo, A. Makhzani, H. Küttler, J. Agapiou, J. Schrittwieser, et al., “Starcraft ii: a new challenge for reinforcement learning,” arXiv preprint arXiv:1708.04782, 2017.