Applying Reinforcement Learning to the Sliding-Coin Puzzle

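Reinforcement learning is a machine learning method whose concept resembles the way humans learn new things. A software agent takes actions in an environment and receives reward feedback for each action; to obtain the maximum reward, the agent must learn which action is appropriate in a given situation. AlphaGo is an artificial intelligence program that plays Go. Since AlphaGo defeated a professional Go player, it has become a milestone in artificial intelligence. The core technique of AlphaGo is deep reinforcement learning, which combines deep learning and reinforcement learning, and research of this type has produced remarkable results in both theory and application. In this thesis, we design a deep reinforcement learning approach to the sliding-coin puzzle, defined as follows. Given a row of coins whose initial state contains n nickels and n pennies, with all nickels to the left of all pennies and n ≥ 3, the player solves the puzzle by moving k adjacent coins at a time until the nickels and pennies alternate in a single row. In each move, the player may slide k adjacent coins to a new position, but may not rotate them to change their original relative order. Our approach uses search to evaluate the legal positions for a move and a deep neural network to decide which position to move, helping the agent find a solution in a reasonable amount of time. The agent aims to find an optimal solution. With this approach we can search a larger space, and it offers some insight into using deep reinforcement learning to solve combinatorial optimization problems.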
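The puzzle rules above can be made concrete with a small state model. The Python sketch below illustrates the rules under assumptions that are not stated in the thesis: a state is a string of 'N' (nickel), 'P' (penny) and '.' (empty cell); empty cells at the two ends are ignored, so shifted copies of a row count as one state; and a lifted block may land only in an existing gap or directly beyond either end of the row. The helper names canonical, is_goal and legal_moves, and the example value k = 2, are hypothetical.

```python
# Sketch of a sliding-coin puzzle state model. Assumptions (not from the
# thesis): 'N' = nickel, 'P' = penny, '.' = empty cell; empty cells at the
# two ends are dropped; a lifted block may land in an existing gap or
# directly beyond either end of the row.


def canonical(row: str) -> str:
    """Drop empty cells at both ends so shifted copies count as one state."""
    return row.strip(".")


def is_goal(row: str) -> bool:
    """True if the coins form one gap-free row in which N and P alternate."""
    coins = canonical(row)
    if "." in coins:
        return False
    return all(a != b for a, b in zip(coins, coins[1:]))


def legal_moves(row: str, k: int) -> list:
    """All distinct states reached by sliding k adjacent coins, order kept."""
    start = canonical(row)
    padded = "." * k + start + "." * k  # room to land just beyond either end
    found = []
    for i in range(len(padded) - k + 1):
        block = padded[i:i + k]
        if "." in block:
            continue  # the k coins must occupy consecutive cells
        rest = padded[:i] + "." * k + padded[i + k:]  # lift the block
        for j in range(len(rest) - k + 1):
            if rest[j:j + k] != "." * k:
                continue  # every target cell must be empty
            # Drop the block back in its original order (no rotation).
            new = canonical(rest[:j] + block + rest[j + k:])
            if new != start and new not in found:
                found.append(new)
    return found


start = "NNN" + "PPP"  # n = 3: all nickels left of all pennies
print(is_goal(start))                       # False
print(is_goal("PNPNPN"))                    # True
print("NPPPNN" in legal_moves(start, k=2))  # True
```

Enumerating every legal successor like this is the kind of step a search component performs before a learned policy chooses among the candidates; the landing-zone restriction is only one possible way to keep the move set finite.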

Keywords

Reinforcement Learning; Sliding-Coin Puzzle

Parallel Abstract

Reinforcement Learning is a type of Machine Learning whose concept is similar to how a human learns to perform a new task. It allows a software agent to take actions in an environment and receive reward feedback for them. To maximize reward, the agent needs to learn which action is suitable in a particular situation. AlphaGo is an artificial intelligence program that plays the board game Go. Since AlphaGo defeated a human professional player, it has become a milestone in artificial intelligence. The core technique of AlphaGo is Deep Reinforcement Learning, which combines Deep Learning and Reinforcement Learning. This field of research has achieved notable success in both theory and application. In this thesis, we design a Deep Reinforcement Learning approach to solve the sliding-coin puzzle. The puzzle starts from a line of n nickels and n pennies with all nickels arranged to the left of all pennies, where n ≥ 3. The player tries to solve the puzzle by rearranging the coins so that nickels and pennies alternate in the line. In each move, the player is allowed to slide k adjacent coins to new positions without rotating them, so the k coins keep their relative order. Our approach uses a search technique to evaluate the positions that are legal to move and uses deep neural networks to decide which positions to move, helping the agent find a solution in a reasonable amount of time. The agent aims to find an optimal solution. With this approach, we can search a larger space. It sheds some light on solving combinatorial optimization problems with Deep Reinforcement Learning.
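The abstract states that the agent pursues an optimal solution. For small n, one way to obtain a reference optimum to compare an agent against is plain breadth-first search, which returns a shortest move sequence. This is a swapped-in exhaustive baseline, not the thesis's deep reinforcement learning method, and it repeats an assumed state model so the block runs on its own: strings over 'N', 'P' and '.', ends stripped, and a block of k coins landing, order kept, in a gap or just beyond either end of the row. The names shortest_solution, canonical, is_goal and legal_moves, and the example values n = 3 and k = 2, are hypothetical.

```python
from collections import deque
from typing import List, Optional

# Hypothetical exhaustive baseline, not the thesis's method: breadth-first
# search over an assumed state model ('N', 'P', '.'; ends stripped; a block
# of k coins lands, order kept, in a gap or just beyond either end).


def canonical(row: str) -> str:
    return row.strip(".")


def is_goal(row: str) -> bool:
    coins = canonical(row)
    return "." not in coins and all(a != b for a, b in zip(coins, coins[1:]))


def legal_moves(row: str, k: int) -> List[str]:
    start = canonical(row)
    padded = "." * k + start + "." * k
    found = []
    for i in range(len(padded) - k + 1):
        block = padded[i:i + k]
        if "." in block:
            continue  # only k coins in consecutive cells may move
        rest = padded[:i] + "." * k + padded[i + k:]
        for j in range(len(rest) - k + 1):
            if rest[j:j + k] == "." * k:  # all target cells empty
                new = canonical(rest[:j] + block + rest[j + k:])
                if new != start and new not in found:
                    found.append(new)
    return found


def shortest_solution(n: int, k: int, max_depth: int = 10) -> Optional[List[str]]:
    """Return the states along a shortest solution, or None within max_depth."""
    start = "N" * n + "P" * n
    if is_goal(start):
        return [start]
    parent = {start: None}  # also serves as the visited set
    frontier = deque([(start, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in legal_moves(state, k):
            if nxt in parent:
                continue
            parent[nxt] = state
            if is_goal(nxt):  # BFS: the first goal generated is at minimum depth
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((nxt, depth + 1))
    return None


path = shortest_solution(n=3, k=2)
print(len(path) - 1)  # 3
print(" -> ".join(path))
```

The returned list starts at NNNPPP and ends at an alternating row, so its length minus one could serve as a yardstick for an agent's solutions on the same instance. Because the number of reachable states grows quickly with n, exhaustive search of this kind is likely practical only for small instances.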

Parallel Keywords

Deep Reinforcement Learning; One-Dimensional Sliding-Coin Puzzle
