CS 5541
Program 4 - Reinforcement Learning in a Maze
Due December 16, 2010 (No LATE Programs!)
60 points


Reinforcement learning provides a method for learning in games. In this program you will implement a simple reinforcement learning mechanism to learn in a simple maze game involving a starting point, a goal, obstacles, and an opponent that is trying to catch you.

Some Implementation Details

In this code (prog4.lisp) a simple maze game is implemented in Lisp. The code implements two routines of interest:

To see how the game works, load the initial code with (load "prog4.lisp") and then type (play-games 5 T nil) to play 5 games interactively. The code prints the maze to the screen and asks the user (you) to choose an action. The maze shown has Xs for obstacles, a p for the player, an O for the opponent who is chasing you, and a G for the goal. Actions are 1 (left), 2 (up), 3 (right), and 4 (down). The code as written queries the user about the move to make. You will replace this with code that has a computer learner play the game. The relevant routines you need to implement are:

You will need to create a description of the board (the state of the board). Since the obstacles are all fixed and the goal is always in the same location, the only thing that changes about the board is the position of the player and the opponent. You will need to come up with a mechanism to assign a unique state number to each possible combination of positions of the player and opponent (I suggest you write a function to do this).
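One possible way to number the states is sketched below. It is only an illustration, not the required solution: the 5 x 5 board size, the 0-based (row, col) coordinates, and the names *rows*, *cols*, cell-index, state-number, and number-of-states are all assumptions that do not come from prog4.lisp. The idea is to turn each position into a single cell index and then combine the two indices like the digits of a two-digit number.

```lisp
;; Hypothetical board size; replace with the real maze dimensions in prog4.lisp.
(defparameter *rows* 5)
(defparameter *cols* 5)

(defun cell-index (row col)
  "Map a 0-based (ROW, COL) cell to one index in [0, *rows* x *cols*)."
  (+ (* row *cols*) col))

(defun state-number (player-row player-col opp-row opp-col)
  "Return a unique state number for one player/opponent placement.
The player's cell index is the high 'digit' and the opponent's is the low one."
  (+ (* (cell-index player-row player-col) (* *rows* *cols*))
     (cell-index opp-row opp-col)))

(defun number-of-states ()
  "Total number of state numbers, i.e. the number of rows the tables need."
  (expt (* *rows* *cols*) 2))
```

This numbering also counts placements that can never occur (for example, a player standing on an obstacle). That wastes some table rows but keeps the function simple.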

You will then need to create a Q table and a Visits table. The Q table needs a row for each possible state number and 4 columns, one for each action. The initial values of this table should all be 0.0. The Visits table counts how many times you have tried each state/action combination. It also has a row for each possible state number and 4 columns, one for each action. The initial values of this table should all be 0.
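The two tables can be built as two-dimensional arrays, as in the sketch below. The handout does not say which update rule to use, so the q-update function is an assumption: it shows a standard Q-learning step whose learning rate is 1 divided by the visit count, which is one common way to make use of a Visits table. The names make-q-table, make-visits-table, max-q, and q-update, the discount factor 0.9, and the convention of passing NIL as the next state when the game has ended are all hypothetical.

```lisp
(defparameter *num-actions* 4)   ; 1 left, 2 up, 3 right, 4 down

(defun make-q-table (num-states)
  "Q table: one row per state, one column per action, all 0.0."
  (make-array (list num-states *num-actions*) :initial-element 0.0))

(defun make-visits-table (num-states)
  "Visits table: one row per state, one column per action, all 0."
  (make-array (list num-states *num-actions*) :initial-element 0))

(defun max-q (q state)
  "Largest Q value over the four actions available in STATE."
  (loop for a from 0 below *num-actions*
        maximize (aref q state a)))

(defun q-update (q visits state action reward next-state &optional (gamma 0.9))
  "One assumed Q-learning step for taking ACTION (1-4) in STATE.
NEXT-STATE is NIL when the game ended, so no future value is added."
  (let* ((col (1- action))                      ; actions 1-4 -> columns 0-3
         (n (incf (aref visits state col)))     ; count this try
         (alpha (/ 1.0 n))                      ; learning rate shrinks with visits
         (future (if next-state (max-q q next-state) 0.0))
         (target (+ reward (* gamma future))))
    (incf (aref q state col)
          (* alpha (- target (aref q state col))))))
```

With this rate, each Q entry is the running average of the targets seen so far for that state/action pair. A fixed learning rate is an equally reasonable choice.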


Once your code is learning, use the run-experiment routine to generate five learning curves (call it five times). This routine trains your model on 100 games and then plays 100 games with learning turned off, then trains on 100 more games (for a total of 200 training games) and again plays 100 games without learning, and so on at each of the training totals listed below. It should produce output something like this:

Training up to 100
Training up to 200
Training up to 300
Training up to 400
Training up to 500
Training up to 600
Training up to 700
Training up to 800
Training up to 900
Training up to 1000
Training up to 1250
Training up to 1500
Training up to 2000
Training up to 2500
Training up to 3000
 100:   21 Wins,  78 Losses,   1 Draws
 200:   36 Wins,  42 Losses,  22 Draws
 300:   40 Wins,  25 Losses,  35 Draws
 400:   41 Wins,  22 Losses,  37 Draws
 500:   54 Wins,  22 Losses,  24 Draws
 600:   55 Wins,  21 Losses,  24 Draws
 700:   63 Wins,   6 Losses,  31 Draws
 800:   68 Wins,   7 Losses,  25 Draws
 900:   56 Wins,   6 Losses,  38 Draws
1000:   62 Wins,   8 Losses,  30 Draws
1250:   63 Wins,  12 Losses,  25 Draws
1500:   72 Wins,   5 Losses,  23 Draws
2000:   77 Wins,   2 Losses,  21 Draws
2500:   73 Wins,   0 Losses,  27 Draws
3000:   72 Wins,   3 Losses,  25 Draws

The first part (Training up to 100, etc.) simply shows the progress of the training. The second part shows how many wins, losses, and draws the player gets after that amount of training (so 200: 36 Wins, 42 Losses, 22 Draws indicates that after 200 training games the computer player wins 36 games, loses 42, and draws 22 when testing). Average these results over your five runs and present the result as a graph, with the x axis showing the number of training games and the y axis showing the average number of wins, losses, and draws (three separate lines). Discuss your results in the material you hand in.
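If you want to do the averaging in Lisp rather than by hand, a sketch like the following may help. It assumes you have typed each run in as a list of (wins losses draws) rows, one row per training total, in the same order for every run. The function name average-runs and this data layout are hypothetical and are not part of prog4.lisp.

```lisp
(defun average-runs (runs)
  "RUNS is a list of runs; each run is a list of (wins losses draws) rows,
one row per training total. Returns one row of averages per training total."
  (let ((n (length runs)))
    (apply #'mapcar
           (lambda (&rest rows)                 ; the same training total from every run
             (mapcar (lambda (total) (/ total n))
                     (apply #'mapcar #'+ rows))) ; element-wise sum of the rows
           runs)))
```

The averages come back as exact rationals when the sums do not divide evenly. Wrap each one in (float ...) if you prefer decimals for plotting.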

What To Hand In

Turn in a commented copy of your code, a printout of your five runs of run-experiment, and the graph of your results along with a discussion of these results.