
利用他家資訊模組來改良麻將程式

Using Other Players' Information Models to Improve Mahjong Program

Advisor: Shun-Shii Lin (林順喜)

Abstract


In recent years, with advances in hardware and algorithms, computers have surpassed humans in many perfect-information games. Unlike Go, Othello, and Shogi, Mahjong is a multi-player, probabilistic, imperfect-information game, and many research problems remain open. This thesis briefly introduces the rules of Mahjong and surveys the techniques used in previous Mahjong programs. We adopt part of the approach of the paper "Let's Play Mahjong!", explaining how we evaluate the state of a hand and how we decide whether to meld, and we use these components to strengthen our Mahjong program. For offense, this thesis uses a probability-oriented policy to choose which tile to discard. For defense, it records the other players' discards together with the whole board state, analyzes them, and converts this information into numerical values. Finally, the offensive and defensive evaluations are combined to select the best discard. The defense strategy is not based purely on statistics or simulation; instead, starting from the principle that a player discards tiles so as to optimize his own hand, we derive a defense model from theory found in other papers. When interpreting the experimental data, we revise the statistical methodology so that the effect of luck does not outweigh the actual difference in strength.

English Abstract


In recent years, with the development of equipment and techniques, computers have beaten top human players in many perfect-information games. Unlike Go, Othello, and Shogi, Mahjong is a multi-player, probabilistic, imperfect-information game that still has many open research issues. This thesis briefly introduces the rules of Mahjong and reviews previously developed Mahjong programs. We adopt some of the ideas in the paper titled "Let's Play Mahjong!": we explain how to judge the state of a hand, how to decide whether to chow a tile, and how these ideas improve our program. On offense, we use a probability-oriented policy for tile selection. On defense, we observe the tiles discarded by other players and analyze the information they reveal; we then combine these statistics with the offensive policy to choose the best tile to discard. The defense strategy does not simply choose a discard by simulation or statistics. Instead, we defend by discarding the tile that best preserves the player's own hand, and, following concepts from other papers, we build a defense model. A Mahjong program can sometimes obtain a score that is not proportional to its strength, so in this thesis we focus on the program's win and loss rates in order to reduce the element of luck.
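The probability-oriented discard policy described above can be illustrated with a toy sketch. This is not the thesis's actual implementation: the one-suit hand, the greedy `meld_count` heuristic, and `best_discard` are all illustrative assumptions. The idea it demonstrates is simply that each candidate discard is scored by the expected value of the hand over every possible next draw from the wall, and the discard with the highest expectation is chosen.

```python
from collections import Counter

def meld_count(tiles):
    """Greedily count complete melds (triplets or runs of three) in a
    one-suit toy hand of tiles numbered 1-9; a crude stand-in for a
    real hand evaluator such as a shanten calculator."""
    counts = Counter(tiles)
    melds = 0
    for t in sorted(counts):
        # Take triplets first ...
        while counts[t] >= 3:
            counts[t] -= 3
            melds += 1
        # ... then runs t, t+1, t+2 (missing keys count as zero).
        while counts[t] and counts[t + 1] and counts[t + 2]:
            counts[t] -= 1
            counts[t + 1] -= 1
            counts[t + 2] -= 1
            melds += 1
    return melds

def best_discard(hand, wall):
    """Probability-oriented discard: for each candidate discard, average
    the meld count over every possible next draw from the wall, and
    return the discard with the highest expected value."""
    best, best_ev = None, -1.0
    for tile in set(hand):
        rest = list(hand)
        rest.remove(tile)
        ev = sum(meld_count(rest + [d]) for d in wall) / len(wall)
        if ev > best_ev:
            best, best_ev = tile, ev
    return best, best_ev

# With hand 123 56 9 and a wall of 4s and 7s, discarding the isolated 9
# keeps both ways to complete the second run.
print(best_discard([1, 2, 3, 5, 6, 9], [4, 4, 7, 7]))
```

A real program would replace `meld_count` with a proper shanten/ukeire evaluation and weight each draw by the number of unseen copies of that tile, but the expectation-over-draws structure is the same.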

