Reinforcement learning provides a method for learning in games. In this assignment you will implement a simple reinforcement learning mechanism to learn a maze game involving a starting point, a goal, obstacles, and an opponent that is trying to catch you.
In this code (chase_maze.zip) a simple maze game is implemented in Java. The code, once compiled, takes two command line arguments, one for the number of training games and one for the number of test games.
Currently the code displays the maze on the screen and prompts the user (you) to supply an action via the keyboard. The maze shown uses X for obstacles, P for the player, O for the opponent who is chasing you, and G for the goal. Actions are 0 (no action), 1 (left), 2 (right), 3 (up), 4 (down).
Once you understand the game, you should replace the code indicated in the comments with code that learns a solution via Q-learning.
Your state should incorporate the locations of both the player and the opponent. You should represent your Q function as a table with one entry for each possible (state, action) pair.
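As a rough illustration of such a tabular representation, here is a minimal sketch in Java. The class name, maze size, learning parameters, and the encoding of positions as cell indices are all assumptions for illustration, not part of the provided chase_maze code; you will need to adapt them to the actual maze dimensions and hooks in the starter code.

```java
import java.util.Random;

// Hypothetical sketch: a Q-table indexed by (player cell, opponent cell, action),
// with epsilon-greedy action selection and the standard Q-learning update.
// NUM_CELLS, ALPHA, GAMMA, and EPSILON are assumed values, not from the assignment code.
public class QLearner {
    static final int NUM_CELLS = 100;   // assumed 10x10 maze, flattened to cell indices
    static final int NUM_ACTIONS = 5;   // 0 none, 1 left, 2 right, 3 up, 4 down
    static final double ALPHA = 0.1;    // learning rate
    static final double GAMMA = 0.9;    // discount factor
    static final double EPSILON = 0.1;  // exploration rate

    // Q[playerCell][opponentCell][action], initialized to 0 by Java
    double[][][] q = new double[NUM_CELLS][NUM_CELLS][NUM_ACTIONS];
    Random rng = new Random();

    // Epsilon-greedy: explore with probability EPSILON, otherwise pick the best action.
    int chooseAction(int player, int opp) {
        if (rng.nextDouble() < EPSILON) return rng.nextInt(NUM_ACTIONS);
        int best = 0;
        for (int a = 1; a < NUM_ACTIONS; a++)
            if (q[player][opp][a] > q[player][opp][best]) best = a;
        return best;
    }

    // Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int player, int opp, int action, double reward, int player2, int opp2) {
        double maxNext = q[player2][opp2][0];
        for (int a = 1; a < NUM_ACTIONS; a++)
            maxNext = Math.max(maxNext, q[player2][opp2][a]);
        q[player][opp][action] += ALPHA * (reward + GAMMA * maxNext - q[player][opp][action]);
    }

    public static void main(String[] args) {
        QLearner ql = new QLearner();
        ql.update(0, 5, 2, 1.0, 1, 5);     // one sample backup from a reward of 1.0
        System.out.println(ql.q[0][5][2]); // 0.1 * (1.0 + 0.9*0 - 0) = 0.1
    }
}
```

During training you would call `chooseAction`, apply the move in the game, observe the reward and the new player/opponent positions, and then call `update`.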
Once you have your code learning, you should generate a learning curve. To create a learning curve, train your system over a range of training-game counts (for example 500, 1000, 1500, 2000, and 2500 games) and then test each of the resulting learners on a number of test games (for example, 500 games). Then plot the average performance against the number of training games. You should repeat this whole procedure at least five times to get averages for each point.
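The experiment loop above might be organized along these lines. This is only a skeleton: `trainAndTest` is a hypothetical stub standing in for "train a fresh learner on nTrain games and return its performance over nTest test games"; you would replace it with calls into the actual game code.

```java
// Hypothetical driver for producing learning-curve data points.
// trainAndTest is a placeholder (assumed, not from the assignment code).
public class LearningCurve {
    // Stub: should train a fresh learner on nTrain games and return average
    // test performance over nTest games (e.g. fraction of games won).
    static double trainAndTest(int nTrain, int nTest) {
        return 0.0; // placeholder: wire this to the real game and learner
    }

    public static void main(String[] args) {
        int[] trainSizes = {500, 1000, 1500, 2000, 2500};
        int testGames = 500;
        int repeats = 5; // repeat each point to smooth out randomness

        for (int nTrain : trainSizes) {
            double sum = 0.0;
            for (int r = 0; r < repeats; r++)
                sum += trainAndTest(nTrain, testGames);
            // one (x, y) point per line: training games vs. average performance
            System.out.println(nTrain + " " + (sum / repeats));
        }
    }
}
```

The printed pairs can be redirected to a file and plotted with any tool (gnuplot, a spreadsheet, etc.).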
Turn in a commented copy of your code, a summary of the results your learner achieved playing the game, and a discussion of your results.