CS 5761 - Introduction to Natural Language Processing 
Project Beta version due by 5pm Thursday, April 29. Submit a tar
file with your system, and a pdf/doc file with your updated
proposal/report via webdrop. Demo in lab at 6pm that same day.
 Objectives 
To produce a beta version of your project, in which you have implemented
the majority of your system's functionality and have updated your
proposal so that it is suitable as the core of your final report.
 System Requirements 
You should turn in a tar file that unpacks into a directory named with your
user-id and the course number. For example, in my case this would be
tpederse-5761. Your tar file should include all of your system code and 
data necessary for running your system and for evaluation. 
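
One way to check the directory naming before you submit is a small script
like the sketch below. It is written in Python purely for illustration (the
course does not prescribe it), and the function names names_under and
unpacks_into are hypothetical helpers, not part of any required interface.
It checks that every entry in the archive sits under a single top-level
directory such as tpederse-5761.

```python
import tarfile


def names_under(names, expected_dir):
    """Return True if every archive entry sits inside expected_dir."""
    cleaned = []
    for name in names:
        # Some archives store entries with a leading "./"; strip it.
        if name.startswith("./"):
            name = name[2:]
        if name not in ("", "."):
            cleaned.append(name)
    if not cleaned:
        return False  # an empty archive cannot be a valid submission
    prefix = expected_dir + "/"
    return all(n == expected_dir or n.startswith(prefix) for n in cleaned)


def unpacks_into(tar_path, expected_dir):
    """Open tar_path and apply names_under to its list of entries."""
    with tarfile.open(tar_path) as archive:
        return names_under(archive.getnames(), expected_dir)
```

For an archive named, say, tpederse-5761.tar, calling
unpacks_into("tpederse-5761.tar", "tpederse-5761") should return True when
the submission is packaged as described above.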
 
Make sure you provide the following in your tar file: 
 
-  A README that describes all the files you are providing, as well
as instructions on how to use them. Make sure that your system is easy to
install, and does not have hard coded path names, etc. that will prevent 
it from unpacking and running. If you have used any additional modules 
from CPAN or other sources, make sure you provide instructions about how 
to download and install those as well. 
-  Your system should include the majority of its functionality. How
that is organized and implemented is up to you, but it should be fairly 
easy for me or another "untrained" user to install and run. Make sure to 
test your system on the csdev platform before you submit, as this is where
we will be running and testing it.
-  An evaluation program that will allow you to score the results of 
your system. In other words, in addition to having a program that 
creates Google Sets or analyzes Voynich text, you should also have a 
program (and associated data) that can be used to analyze your system's 
output. 
-  You may want to use driver scripts written in Perl or a shell
scripting language that tie all the pieces together.
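
As an illustration of what an evaluation program might compute, the sketch
below scores a system-generated set against a hand-built gold-standard set
using precision, recall, and F-measure. This is only one possible scheme,
assumed here for the example; the assignment does not prescribe a metric,
and the function name score is hypothetical. Python is used for
illustration; the same logic carries over to Perl.

```python
def score(predicted, gold):
    """Return (precision, recall, f1) for predicted items against gold items.

    Both arguments are treated as sets, so duplicates and order are ignored.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # items the system got right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if correct == 0:
        return precision, recall, 0.0  # avoid dividing by zero below
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, if the system proposes three items and two of them appear in a
gold set of four, precision is 2/3 and recall is 2/4.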
A useful hint: Have a friend do a test installation of your code to make
sure it can be easily installed and run. You will be surprised how many
things you take for granted in using your own code that are not apparent
to someone else.
 
Anyone who uses your tar file should be able to unpack the code,
look at the README, and then run the system itself within just a
few minutes. There should be very few demands placed on the user in order
to figure out how to run your project code. Part of the grade of your
final system submission will be determined based on whether or not we can
install, run, and evaluate your system quickly. Again, make sure to
test on a csdev machine, as this is the platform we'll use for testing.
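
One common way to meet the requirement of no hard coded path names is to
locate data files relative to the script itself rather than through an
absolute path. The sketch below shows the idea in Python (for illustration
only; the same approach works in Perl or a shell script). The helper
project_path and the file name data/input.txt are hypothetical.

```python
from pathlib import Path


def project_path(*parts, base=None):
    """Build a path inside the project directory.

    base defaults to the directory holding this script, so the result
    stays valid wherever the tar file happens to be unpacked.
    """
    if base is None:
        base = Path(__file__).resolve().parent
    return Path(base).joinpath(*parts)
```

A script would then open project_path("data", "input.txt") instead of
spelling out a path under one particular home directory.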
For the Voynich projects, make sure you provide the transcribed version of 
the manuscript as a part of your project tar file. For the Google Sets 
project, you may assume that I have WordNet already available, so you 
don't need to provide that. 
 
 Proposal/Report Requirements 
This version of your proposal (now morphing into a final report) should 
contain all of the changes that I have mentioned in my comments of April 9 
to the class, as well as any comments that have been made on either your 
initial proposal or the alpha version. You should pay particular attention 
to providing full details of your system's approach, as well as a detailed 
description of how you will do evaluation. 
Finally, remember that you want your final report to be a document that 
someone who was not a part of this class could read and understand. So 
please make sure you provide sufficient background regarding your 
problem, and explain what you have done clearly and without making any 
assumptions that the reader will be familiar with either Google Sets or 
the Voynich Manuscript.