Reinforcement learning provides a method for learning to play games. The provided code (Main.java) implements a simple maze game in Java (for a C++ version, see main.cpp). Once compiled, the program takes two command-line arguments: one for the number of training games and one for the number of test games.
Currently the code prints the maze to the screen and asks the user (you) to enter an action from the keyboard. The maze uses X for obstacles, P for the player, O for the opponent who is chasing you, and G for the goal. The actions are 0 (no action), 1 (left), 2 (right), 3 (up), and 4 (down).
Once you understand the game, replace the code indicated by the comments in the source with code that learns a solution using SARSA learning.
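As a reminder of what the SARSA step looks like, here is a minimal sketch of the standard update Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)). The class name SarsaUpdate, the method signature, and the values of ALPHA and GAMMA are assumptions made for illustration; they are not taken from Main.java, and you will need to adapt the idea to the structure of the provided code.

```java
/** Minimal SARSA update sketch; all names and constants here are illustrative assumptions. */
final class SarsaUpdate {
    static final double ALPHA = 0.1;  // learning rate (assumed value, tune as needed)
    static final double GAMMA = 0.9;  // discount factor (assumed value, tune as needed)

    /**
     * Applies Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).
     * aNext is the action actually chosen in sNext by the current policy,
     * which is what makes this SARSA rather than Q-learning.
     * If the game ended on this step, pass terminal = true so the
     * bootstrap term Q(s',a') is treated as zero.
     */
    static void update(double[][] q, int s, int a, double reward,
                       int sNext, int aNext, boolean terminal) {
        double target = terminal ? reward : reward + GAMMA * q[sNext][aNext];
        q[s][a] += ALPHA * (target - q[s][a]);
    }
}
```

Note that the next action must be selected (for example, epsilon-greedily) before the update is applied, and that same action is then the one executed on the following step.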
Your state should incorporate the locations of both the player and the opponent. Represent your Q function as a table with one entry for each combination of possible state and possible action.
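One possible way to lay out such a table is sketched below: the player and opponent positions are each flattened to a cell number, and the pair is combined into a single state index. The class StateIndexer and the grid dimensions passed to it are hypothetical; use the actual maze size from the provided code.

```java
/** Sketch of a state index for a tabular Q function; the grid size is an assumed parameter. */
final class StateIndexer {
    static final int NUM_ACTIONS = 5;  // actions 0..4, as listed in the assignment

    private final int cols;
    private final int cells;           // number of grid cells (rows * cols)

    StateIndexer(int rows, int cols) {
        this.cols = cols;
        this.cells = rows * cols;
    }

    /** One state per (player cell, opponent cell) pair. */
    int numStates() {
        return cells * cells;
    }

    /** Maps the two positions to a unique index in [0, numStates()). */
    int encode(int playerRow, int playerCol, int oppRow, int oppCol) {
        int playerCell = playerRow * cols + playerCol;
        int oppCell = oppRow * cols + oppCol;
        return playerCell * cells + oppCell;
    }

    /** Q table indexed as q[state][action], initialised to zero. */
    double[][] newQTable() {
        return new double[numStates()][NUM_ACTIONS];
    }
}
```

This layout also allocates entries for positions that can never occur (for example, the player standing on an obstacle). That wastes some memory but keeps the indexing simple.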
Once your code is learning, generate a learning curve. To do so, train your system for a range of training-game counts (for example 500, 1000, 1500, 2000, and 2500 games) and then test each of the resulting learners on a number of test games (for example, 500 games). Then plot the average test performance against the number of training games.
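Each point on the curve is just the mean score over the test games for one training size. The helper below is a small sketch of that bookkeeping; the class LearningCurve is hypothetical, and the per-game score (here assumed to be something like 1 for reaching the goal and 0 otherwise) is an assumption, since the performance measure is left to you.

```java
import java.util.Locale;

/** Sketch of learning-curve bookkeeping; the scoring scheme is an assumption. */
final class LearningCurve {

    /** Mean of the per-game test scores; returns 0 for an empty array. */
    static double average(double[] scores) {
        if (scores.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    /** One tab-separated row: number of training games, then average test score. */
    static String row(int trainingGames, double[] testScores) {
        return String.format(Locale.ROOT, "%d\t%.3f", trainingGames, average(testScores));
    }
}
```

Printing one such row per training size gives a two-column file that can be loaded directly into a plotting tool.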
You should hand in a documented copy of your code. You must also submit your code electronically. To do this, create a tar file of all of your code and submit it to the class webdrop by going to https://webdrop.d.umn.edu/, logging in, and picking the webdrop for 8751.
Make sure to provide a good general description of your code. In addition, hand in the output and a writeup for all of your testing. Try to present the testing in a way that shows how effective your system is.