Learning Strategies for the Board Game Risk

Christian Pitera

Introduction:

For this project, I plan to create an application that calculates heuristics for game states in the board game Risk. The end goal is an application that can effectively learn strategies for the game. Very little work has gone into artificial intelligence for Risk, and the work that has been done relies either on human-made strategies [1] or on human/machine-learned hybrids [2]. No work has gone into letting the computer learn heuristics and moves entirely on its own.

Methods:

This project requires a form of reinforcement learning. Since there is no way to directly measure the utility of every individual move made during a game, the project will use temporal difference learning to estimate the utility of each board state [3]. Another problem is that the state space of Risk is intractably large; because army counts can grow without bound, the number of possible states is, in theory, unlimited, so evaluating each state as a whole is useless. Instead, each state will be reduced to a set of parameters, and a function will be learned that maps those parameters to the value estimated by the temporal difference method. Because the game is complex and the parameters interact with one another, simple regression methods are unlikely to work well. A neural network will therefore be used to discover the connections between the parameters and to learn the mapping from parameters to the utility of the game state. At every decision point, moves will be chosen with a greedy algorithm, i.e., a one-ply search over the resulting states.
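
To make the approach concrete, the sketch below shows how these pieces could fit together in Python: a small neural network maps a hand-crafted feature vector to an estimated utility, a semi-gradient TD(0) update nudges that estimate toward the reward plus the discounted value of the next state, and moves are chosen with a greedy one-ply search. The feature count, network size, and the Risk-specific helpers (extract_features, legal_moves, apply_move) are placeholder assumptions for illustration, not an existing implementation.

import numpy as np

# Minimal sketch: TD(0) with a one-hidden-layer value network over hand-crafted
# board features. All names and sizes below are illustrative assumptions.
ALPHA = 0.01    # learning rate
GAMMA = 0.95    # discount factor
N_FEATURES = 6  # e.g. armies owned, territories owned, continents held, ...

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(N_FEATURES, 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1))
b2 = np.zeros(1)

def value(features):
    """Forward pass: estimated utility of a board state from its features."""
    h = np.tanh(features @ W1 + b1)
    return (h @ W2 + b2)[0]

def td_update(features, next_features, reward):
    """Semi-gradient TD(0): move V(s) toward reward + GAMMA * V(s')."""
    global W1, b1, W2, b2
    h = np.tanh(features @ W1 + b1)
    v = (h @ W2 + b2)[0]
    target = reward + GAMMA * value(next_features)  # target treated as a constant
    grad_v = v - target                             # derivative of 0.5 * (v - target)**2
    grad_h = grad_v * W2[:, 0] * (1.0 - h ** 2)     # backprop through the tanh layer
    W2 -= ALPHA * grad_v * h[:, None]
    b2 -= ALPHA * grad_v
    W1 -= ALPHA * np.outer(features, grad_h)
    b1 -= ALPHA * grad_h

def choose_move(state, legal_moves, apply_move, extract_features):
    """Greedy one-ply search: pick the move whose resulting state scores best."""
    return max(legal_moves(state),
               key=lambda m: value(extract_features(apply_move(state, m))))

The update follows the usual semi-gradient convention: the target reward + GAMMA * V(s') is held fixed while the squared TD error is backpropagated through the network, which is what lets the utility estimates bootstrap off one another.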

Data Gathering:

The data for this project will be generated through self-play. However, since a computer playing itself randomly could produce games that seemingly never end, the first few games will be played by a human against the computer. Once the computer has learned even a basic strategy, it should be possible to generate tens of thousands of games. The next question is which states to use as training data for the neural network. Using every state generated after every choice would not be effective: because the game state changes very little after, for example, placing a single army, many near-duplicate states would be evaluated, biasing the data toward longer turns. Instead, states will only be saved at the end of each turn. Each player's turns will be saved separately, though they will be analyzed together. I will also experiment with various standardization schemes for the data to help the neural network converge.
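
As a rough sketch of this collection scheme (the TurnLogger name and the per-player record layout are assumptions for illustration), end-of-turn feature vectors could be logged separately for each player, then pooled and z-score standardized before training:

import numpy as np

class TurnLogger:
    """Collects one feature vector per player per completed turn."""

    def __init__(self):
        self.records = {}  # player_id -> list of end-of-turn feature vectors

    def end_of_turn(self, player_id, features):
        """Record the board features only once, when the player ends a turn."""
        self.records.setdefault(player_id, []).append(
            np.asarray(features, dtype=float))

    def training_matrix(self):
        """Pool every player's end-of-turn states into one standardized matrix."""
        rows = [row for player_rows in self.records.values() for row in player_rows]
        X = np.vstack(rows)
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        std[std == 0.0] = 1.0  # avoid division by zero for constant features
        return (X - mean) / std, mean, std

Keeping the mean and standard deviation returned here would also allow the same standardization to be applied to states encountered during play, so the network always sees inputs on the scale it was trained on.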

Milestone:

The main goal for the milestone is a program that can beat a human at the game. Since the initial training will be based on games against a human, a computer that can beat a human by the milestone will demonstrate that it has learned something and that this method of teaching strategies to a computer is viable. If this goal is reached by the milestone, I will see how much further the program can be taken.

References:

1. http://www.unimaas.nl/games/files/bsc/Hahn_Bsc-paper.pdf

2. http://www.cs.cornell.edu/boom/2001sp/choi/473repo.html

3. http://www.scholarpedia.org/article/Temporal_Difference_Learning