Reinforcement Learning in the Tic-Tac-Toe Game
Group Members: Peng Ding and Tao Mao
1. What Is Tic-Tac-Toe?
Tic-tac-toe is traditionally a popular children's game: on a 3-by-3 board, two players alternately place one piece at a time, and a player wins by getting three of his or her own pieces in a row, whether horizontally, vertically, or diagonally.
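To make the winning condition concrete, here is a minimal win-check sketch in Python (the board encoding with 0 = empty, 1 = first player, 2 = second player, and the names WIN_LINES and winner are our own illustrative assumptions, not part of the proposal):

    # A board is a tuple of 9 cells, indexed 0-8, row by row.
    # Cell values (our own convention): 0 = empty, 1 = first player, 2 = second player.
    WIN_LINES = [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
        (0, 4, 8), (2, 4, 6),              # diagonals
    ]

    def winner(board):
        """Return 1 or 2 if that player has three in a row, else 0."""
        for a, b, c in WIN_LINES:
            if board[a] != 0 and board[a] == board[b] == board[c]:
                return board[a]
        return 0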
2. Goals of the Project
We will apply reinforcement learning, specifically temporal-difference learning, to evaluate each state's numerical value, based on which an agent chooses its next move. Strictly speaking, the "state" here is an "afterstate", defined as the state immediately after the agent's move.

3. Methods

3.1 Representation of State Space
Since this representation admits many impossible states (boards that cannot arise in legal play), as shown in Figure 2, we can use a hash table to memorize the feasible states instead of keeping a very large array with many unused entries.
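As one way to realize this, the sketch below stores values only for boards that are actually reached, using a Python dictionary as the hash table (the base-3 key encoding and names such as state_key are our own assumptions for illustration, not fixed by the proposal):

    def state_key(board):
        """Encode a 9-cell board (cell values 0/1/2) as a base-3 integer in [0, 3^9)."""
        key = 0
        for cell in board:
            key = key * 3 + cell
        return key

    # Value table: only feasible (actually visited) states ever get an entry,
    # so the bulk of the 3^9 = 19,683 possible encodings costs no memory.
    values = {}

    def get_value(board, default=0.5):
        return values.get(state_key(board), default)

    def set_value(board, v):
        values[state_key(board)] = v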
3.2 Introduction to Reinforcement Learning [1]

3.3 Temporal-Difference (TD) Method
We will use the temporal-difference method, a reinforcement learning technique, to approximate state values by updating the values of visited states after each training game:

    V(s) ← V(s) + α[V(s') − V(s)],

where s is the current state, s' is the next state, V(s) is the state value of state s, and α is the step-size (learning rate) parameter.

4. Datasets
For simple games such as tic-tac-toe, two computer agents can play against each other and learn game strategies from the simulated games. This training method is called self-play; among its advantages, an agent learns general strategies rather than strategies tied to a fixed opponent. However, self-play can have a slow convergence rate, especially in the early learning stage [2]. Thus, we will also consider using human-computer games, or games of our computer agent against other existing agents, to obtain sophisticated game strategies if the training results from the above datasets are not robust enough. (A sketch combining this self-play scheme with the TD update of Section 3.3 is given after Section 5.)

5. Timeline
(1) Game framework including human-computer interface and basic machine learning core;
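To tie Sections 3.3 and 4 together, here is a minimal self-play TD sketch. The outcome values (1 for a win, 0.5 for a draw, 0 for a loss) and the idea of backing values up after each game follow the tic-tac-toe example in [1], but the step size, epsilon-greedy exploration, and all concrete names and parameters here are our own illustrative assumptions. It reuses winner() and state_key() from the sketches above.

    import random

    ALPHA = 0.1      # step-size parameter (learning rate) in the TD update
    EPSILON = 0.1    # exploration rate for epsilon-greedy move selection

    def legal_moves(board):
        return [i for i in range(9) if board[i] == 0]

    def play_move(board, move, player):
        cells = list(board)
        cells[move] = player
        return tuple(cells)

    def choose_afterstate(board, player, values):
        # Epsilon-greedy over afterstates: usually pick the move whose
        # resulting state has the highest learned value, sometimes explore.
        moves = legal_moves(board)
        if random.random() < EPSILON:
            return play_move(board, random.choice(moves), player)
        return max((play_move(board, m, player) for m in moves),
                   key=lambda s: values.get(state_key(s), 0.5))

    def td_update(values, episode, outcome):
        # episode: this player's afterstate keys in order of play;
        # outcome: 1.0 win, 0.5 draw, 0.0 loss, used as the value of the final state.
        # Applies V(s) <- V(s) + alpha * (V(s') - V(s)) backward through the game.
        next_value = outcome
        for key in reversed(episode):
            v = values.get(key, 0.5)
            values[key] = v + ALPHA * (next_value - v)
            next_value = values[key]

    def self_play_game(values):
        board = (0,) * 9
        episodes = {1: [], 2: []}   # each agent's afterstates, kept separately
        player = 1
        while True:
            board = choose_afterstate(board, player, values)
            episodes[player].append(state_key(board))
            w = winner(board)
            if w != 0 or not legal_moves(board):
                for p in (1, 2):
                    td_update(values, episodes[p],
                              1.0 if w == p else (0.0 if w != 0 else 0.5))
                return
            player = 3 - player

    # Training: run many self-play games to populate the value table.
    values = {}
    for _ in range(50_000):
        self_play_game(values)

Note that the two agents can share one table here: an afterstate reached by the first player always contains one more of its pieces than the opponent's, so the two agents' afterstate keys never collide.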
[1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998, pp. 10-15, 156.