CS 5761 - Introduction to Natural Language Processing

The alpha version of the project is due by 4pm on Friday, April 12. Email a URL where we can find your tar file to patw0006 and tpederse.


The goal is to produce a preliminary version of your project in which you have implemented a simple baseline approach, against which you can judge your progress later in the semester.


You should provide the following in your tar file. Have a friend do a test download of your code to make sure it is working. You will be surprised how many things you take for granted in using your own code that are not apparent to someone else.

Anyone who downloads your tar file should be able to unpack the code, look at the readme, and then run the driver script within just a few minutes. There should be very few demands placed on the user in order to figure out how to run your project code. If we can't run your code within 5 minutes of trying, you won't get credit for this portion of the project. Assume that we are running on a csdev machine.
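As a rough self-check before asking a friend to test the download, the unpacking step can be sketched in Python. This is only an illustration, not a required tool: the file names README and run.sh are assumptions (the assignment does not prescribe names for the readme or the driver script), and the function name missing_after_unpack is made up for this sketch.

```python
import os
import tarfile
import tempfile

# Hypothetical file names; the assignment does not prescribe them.
REQUIRED = ("README", "run.sh")


def missing_after_unpack(tar_path, required=REQUIRED):
    """Unpack tar_path into a fresh temporary directory and return the
    names in `required` that do not appear anywhere in the unpacked tree."""
    with tempfile.TemporaryDirectory() as scratch:
        # Extract into an empty directory so nothing from your own
        # working copy can mask a file that is missing from the archive.
        with tarfile.open(tar_path) as archive:
            archive.extractall(scratch)
        # Collect every file name found at any depth of the unpacked tree.
        found = set()
        for _dirpath, _dirnames, filenames in os.walk(scratch):
            found.update(filenames)
    return [name for name in required if name not in found]
```

An empty list means every expected file was found after a clean unpack. It does not show that the driver script actually runs, so a test download by someone else is still worthwhile.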

For part-of-speech tagging projects, assume that I have the Penn Treebank available. For authorship ID projects, assume that I have a few texts from Project Gutenberg available. These will be of my choosing. For other projects, please provide your data in the tar file, unless that makes your download rather large. In that case, contact me ahead of time and we'll arrange something.

The alpha version counts for 20% of the total project grade. Late submissions (after 4pm Friday April 12) will not be downloaded.

Policies (from syllabus)

All programming assignments and your project will be demonstrated during designated lab sessions. You should also submit an electronic copy of your source code to the TA prior to the designated demo session. (His email address is patw0006@d.umn.edu.) There is no other way to submit your programming assignments or project. Failure to submit AND demo on time will result in a zero.

Any code you submit should be commented. I must be able to understand what your code does simply by reading the comments. This understanding should extend down to the details of your code. So do not simply describe the input and output; also include comments that describe your particular algorithm and coding techniques. Failure to comment to this degree will result in a zero.
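To illustrate the difference between describing only input and output and describing the algorithm itself, here is a small sketch in Python. It is one possible reading of the commenting requirement, not an official standard, and the most-frequent-tag tagger it shows is a generic illustration rather than a prescribed baseline. The function names train_baseline and tag are made up for this example.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_words):
    """Input: a non-empty list of (word, tag) pairs.
    Output: (dict mapping each word to its most frequent tag,
             the single most frequent tag overall)."""
    # Algorithm: one pass over the training pairs, keeping a separate
    # tag counter for every word plus one counter over all tags.
    counts_by_word = defaultdict(Counter)
    all_tags = Counter()
    for word, tag_label in tagged_words:
        counts_by_word[word][tag_label] += 1
        all_tags[tag_label] += 1
    # Counter.most_common(1) returns a one-element list [(tag, count)],
    # so [0][0] picks out the tag with the highest count.
    best_tag = {w: c.most_common(1)[0][0] for w, c in counts_by_word.items()}
    default_tag = all_tags.most_common(1)[0][0]
    return best_tag, default_tag


def tag(words, best_tag, default_tag):
    """Tag each word with its most frequent training tag."""
    # Words never seen in training fall back to the overall most
    # frequent tag, which is the usual treatment in this kind of baseline.
    return [(w, best_tag.get(w, default_tag)) for w in words]
```

The docstrings cover input and output, while the inline comments explain the counting strategy, the fallback for unseen words, and the less obvious most_common indexing.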

All assignments and the project are to be done individually. You are required to write your own code. Unless otherwise specified, you must only turn in code that you personally wrote. The only possible exception to this is if I tell you to use a module that is available in a book or online archive. However, I will clearly indicate when this is permissible. Violations of this policy will result in severe grading penalties and/or failure in the class.