Strategy Improvements for the Bidding and Exchanging Stages of a Honeymoon Bridge Program

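Current research on incomplete-information games still has many difficulties to overcome, and the sheer number of possible states is one of them. This study investigates the game of Honeymoon Bridge in order to deepen the understanding of incomplete-information games and to find methods for handling the explosive growth in the number of states. Honeymoon Bridge is a three-stage game, and the nature of the game changes in each stage. By exploiting the characteristics of Honeymoon Bridge, this study achieves a timely full search of the single-level endgame database in the exchanging stage. A completely new Honeymoon Bridge program was written using a bitboard representation, which greatly improved the program's performance and raised the speed at which the program reads the endgame database to 30 million searches per second. The study uses information from the playing stage in place of a card-strength table built from human experience, and uses sampling search to judge the quality of each available action. In this way the program's decisions no longer rely on human experience in the playing stage, so the program can find good moves that lie outside human experience, which greatly improves its ability in the exchanging stage. After the strategies for the bidding stage and the exchanging stage were adjusted, the overall playing strength of the Honeymoon Bridge program improved noticeably: it achieves a good win rate against human players and maintains a win rate of more than 60% against the previous programs.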

Keywords

Computer games; Bitboard; Endgame database; Honeymoon Bridge; Incomplete-information games

Parallel Abstract

Research on incomplete-information games still faces many open difficulties, one of which is the very large number of possible states. This study uses the game of Honeymoon Bridge to deepen the understanding of incomplete-information games and to find ways of coping with this explosive growth in the number of states. Honeymoon Bridge is a three-stage game whose nature changes from stage to stage. By exploiting the characteristics of Honeymoon Bridge, this study achieves a timely full search of the single-level endgame database in the exchanging stage. We wrote a new Honeymoon Bridge program based on a bitboard representation, which greatly improved the program's performance and raised the speed of reading the endgame database to 30 million searches per second. The study uses information from the playing stage in place of a card score table built from human experience, and uses sampling search to judge the quality of each available action. This frees the program's decisions from human experience in the playing stage, allows it to find good moves beyond human experience, and greatly improves its ability in the exchanging stage. After the strategies for the bidding and exchanging stages were adjusted, the overall strength of the Honeymoon Bridge program improved considerably: it achieves a good win rate against human players and a win rate of more than 60% against the previous programs.

Parallel Keywords

Computer games; Bitboard; Endgame database; Honeymoon Bridge; Incomplete information games
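The abstract states that the program is built on a bitboard representation but does not describe its layout. As a rough illustration of the general idea only, the following sketch stores a set of cards as one 52-bit integer. The bit layout (`suit * 13 + rank`), the card notation such as `SA`, and all function names are assumptions made for this example and are not taken from the thesis.

```python
from typing import List, Optional

# Assumed layout (not from the thesis): bit index = suit * 13 + rank,
# with suits ordered C, D, H, S and ranks ordered 2 ... A.
SUITS = "CDHS"
RANKS = "23456789TJQKA"
SUIT_MASK = [0x1FFF << (13 * s) for s in range(4)]  # 13 bits per suit


def card_bit(card: str) -> int:
    """Return the single-bit mask for a card written as suit + rank, e.g. 'SA'."""
    suit = SUITS.index(card[0])
    rank = RANKS.index(card[1])
    return 1 << (suit * 13 + rank)


def hand_from_cards(cards: List[str]) -> int:
    """Pack a list of cards into one integer bitboard."""
    hand = 0
    for card in cards:
        hand |= card_bit(card)
    return hand


def count_cards(mask: int) -> int:
    """Number of cards in the set (population count)."""
    return bin(mask).count("1")


def legal_plays(hand: int, led_suit: Optional[int]) -> int:
    """Cards that may be played: follow the led suit if possible, else any card."""
    if led_suit is None:  # this player leads the trick
        return hand
    same_suit = hand & SUIT_MASK[led_suit]
    return same_suit if same_suit else hand


def cards_of(mask: int) -> List[str]:
    """Unpack a bitboard into card names, lowest bit first."""
    cards = []
    while mask:
        low = mask & -mask            # isolate the lowest set bit
        index = low.bit_length() - 1  # position of that bit
        cards.append(SUITS[index // 13] + RANKS[index % 13])
        mask ^= low                   # clear it and continue
    return cards


hand = hand_from_cards(["SA", "H7", "H2", "C9"])
print(cards_of(legal_plays(hand, SUITS.index("H"))))
```

With this kind of encoding, set operations such as "cards in the led suit" reduce to a single bitwise AND, which is the usual reason bitboards speed up move generation and table lookups.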

