In this lab you will implement the ID3 decision tree algorithm (see the textbook and class notes for more details on this algorithm).
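As a reminder of the quantity ID3 uses to choose a feature at each node, here is a minimal C++ sketch of entropy and information gain for a discrete feature. The function names (entropy, informationGain) and the choice to represent feature values and class labels as strings are assumptions made for illustration only; they are not part of the assignment.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Entropy (in bits) of a list of class labels: -sum over classes of p * log2(p).
double entropy(const std::vector<std::string>& labels) {
    if (labels.empty()) return 0.0;
    std::map<std::string, int> counts;
    for (const std::string& label : labels) ++counts[label];
    double h = 0.0;
    for (const auto& entry : counts) {
        double p = static_cast<double>(entry.second) / labels.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Information gain of splitting on one discrete feature.
// featureValues[i] is the feature value of example i; labels[i] is its class.
double informationGain(const std::vector<std::string>& featureValues,
                       const std::vector<std::string>& labels) {
    assert(featureValues.size() == labels.size());
    if (labels.empty()) return 0.0;
    // Group the class labels by feature value.
    std::map<std::string, std::vector<std::string>> partitions;
    for (std::size_t i = 0; i < labels.size(); ++i)
        partitions[featureValues[i]].push_back(labels[i]);
    // Weighted average entropy of the partitions.
    double remainder = 0.0;
    for (const auto& entry : partitions) {
        double weight = static_cast<double>(entry.second.size()) / labels.size();
        remainder += weight * entropy(entry.second);
    }
    return entropy(labels) - remainder;
}
```

At each node, ID3 would evaluate this gain for every remaining feature and split on the one with the largest value.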
A sample run of your code on a data set should produce something like the following:
Tree for output class
Outlook
  =Rain
    Wind
      =Weak Class=Yes
      =Strong Class=No
  =Overcast Class=Yes
  =Sunny
    Humidity
      =Normal Class=Yes
      =High Class=No
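One way to produce output of this kind is a recursive print over a simple node structure. The sketch below is only an illustration: the Node type, the helper names (makeLeaf, printTree, treeToString), and the two-space indentation are all assumptions, not a required design.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: a leaf stores a class label; an internal node
// stores the feature it tests and one child per feature value.
struct Node {
    bool isLeaf = false;
    std::string label;    // class label (leaves only)
    std::string feature;  // feature tested (internal nodes only)
    std::vector<std::pair<std::string, std::unique_ptr<Node>>> children;
};

std::unique_ptr<Node> makeLeaf(const std::string& label) {
    auto node = std::make_unique<Node>();
    node->isLeaf = true;
    node->label = label;
    return node;
}

// Print the subtree rooted at `node`, indenting two spaces per level.
void printTree(const Node& node, std::ostream& out, int depth = 0) {
    const std::string indent(2 * depth, ' ');
    if (node.isLeaf) {  // a tree that is a single leaf
        out << indent << "Class=" << node.label << '\n';
        return;
    }
    out << indent << node.feature << '\n';
    for (const auto& branch : node.children) {
        assert(branch.second != nullptr);
        out << indent << "  =" << branch.first;
        if (branch.second->isLeaf) {
            out << " Class=" << branch.second->label << '\n';
        } else {
            out << '\n';
            printTree(*branch.second, out, depth + 2);
        }
    }
}

// Convenience wrapper: the same text as a string, e.g. for writing to a file.
std::string treeToString(const Node& root) {
    std::ostringstream out;
    printTree(root, out);
    return out.str();
}
```

Calling printTree(root, std::cout) writes the tree to the screen; passing a std::ofstream instead writes the same text to a file, although a format meant to be read back in may need to be more structured than this.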
You should set up your code so that it is able to save the resulting decision tree in a file. You should then be able to use the resulting decision tree to predict the class for a separate set of data (a test file) and produce a confusion matrix for that data. Your output should look something like this:
Confusion matrix for tree from file XXX on data in file XXX:

               Actual Class
                 0     1
               --------
Predicted  0 |   5     1
Class      1 |   0    10

Accuracy = 93.75%
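The counts in such a matrix, and the accuracy figure, can be computed as in the sketch below. It assumes that classes have already been mapped to integers 0..numClasses-1 and that rows are indexed by predicted class and columns by actual class; the names buildConfusionMatrix and accuracyPercent are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// counts[p][a] = number of examples predicted as class p whose actual class is a.
using ConfusionMatrix = std::vector<std::vector<int>>;

ConfusionMatrix buildConfusionMatrix(const std::vector<int>& predicted,
                                     const std::vector<int>& actual,
                                     int numClasses) {
    assert(predicted.size() == actual.size());
    ConfusionMatrix counts(numClasses, std::vector<int>(numClasses, 0));
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        assert(predicted[i] >= 0 && predicted[i] < numClasses);
        assert(actual[i] >= 0 && actual[i] < numClasses);
        ++counts[predicted[i]][actual[i]];
    }
    return counts;
}

// Accuracy as a percentage: the sum of the diagonal divided by the total count.
double accuracyPercent(const ConfusionMatrix& counts) {
    int correct = 0;
    int total = 0;
    for (std::size_t p = 0; p < counts.size(); ++p) {
        for (std::size_t a = 0; a < counts[p].size(); ++a) {
            total += counts[p][a];
            if (p == a) correct += counts[p][a];
        }
    }
    return total == 0 ? 0.0 : 100.0 * correct / total;
}
```

For the sample matrix, the diagonal holds 5 + 10 = 15 of 16 examples, which gives the 93.75% shown.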
In this program you may assume that there are no unknown data values, but you should allow for the possibility of continuous features (use the approach discussed in the book for selecting split points for such features).
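A common way to pick split points for a continuous feature is to sort the examples by that feature and consider a threshold midway between each pair of adjacent values whose classes differ. The sketch below assumes this is the approach intended; check it against the book before relying on it. The function name candidateThresholds is hypothetical, and examples that share the same value but differ in class are handled only approximately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Return candidate thresholds for one continuous feature: the midpoints
// between adjacent (sorted) values where the class label changes.
std::vector<double> candidateThresholds(const std::vector<double>& values,
                                        const std::vector<std::string>& labels) {
    assert(values.size() == labels.size());
    // Pair each value with its label and sort by value.
    std::vector<std::pair<double, std::string>> sorted;
    for (std::size_t i = 0; i < values.size(); ++i)
        sorted.emplace_back(values[i], labels[i]);
    std::sort(sorted.begin(), sorted.end());

    std::vector<double> thresholds;
    for (std::size_t i = 1; i < sorted.size(); ++i) {
        bool classChanges = sorted[i - 1].second != sorted[i].second;
        bool valueChanges = sorted[i - 1].first < sorted[i].first;
        if (classChanges && valueChanges)
            thresholds.push_back((sorted[i - 1].first + sorted[i].first) / 2.0);
    }
    return thresholds;
}
```

For example, values 40, 48, 60, 72, 80, 90 with classes No, No, Yes, Yes, Yes, No yield the candidates 54 and 85. Each candidate t can then be treated as a two-valued feature (value < t versus value >= t) and scored by information gain like any discrete feature.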
Print out a copy of all of your code files. You should hand in printouts demonstrating how your program works by running it on several data sets, including your own. For the tree produced from your own data set, analyze the resulting tree and assess how accurately you think it captures the concept expressed by your data.
You should also write up a short report (at least one page, no more than three) discussing your design decisions in implementing the ID3 code and how your version of the code works.
You must also submit your code electronically. To do this, go to https://webapps.d.umn.edu/service/webdrop/rmaclin/cs8751-1-s2003/upload.cgi and follow the directions for uploading a file (you can do this multiple times, though it would be helpful if you would tar your files and upload a single archive).
To make your code easier to check and grade please use the following procedure for collecting the code before uploading it:
rmaclin/prog03_cc
Note that the suffix of all C++ code files (not .h files) should be ".cc". Only code files (for example, in C++, only .cc and .h files) should be stored in this directory.
tar cf prog03.tar login/prog03_PLcode