CS 8761 Natural Language Processing - Fall 2002

Assignment 4 - Due Monday Nov 11 noon

This may be revised in response to your questions. Last update Mon Nov 4 5:00 pm

Objectives

To continue your exploration of supervised approaches to word sense disambiguation.

Specification

Fix your Naive Bayesian Classifier from Assignment 3. If you received fewer than 10 points on Assignment 3, your classifier has a problem that requires fixing. Once you have fixed it, rerun the line data as described for Assignment 3 and rewrite your report to reflect your new results. You are then free to go on to the Open Mind exercise described below. If your classifier is fixed and you follow the assignment below, you can receive up to 2 points of extra credit.

Even if you received full credit (10 of 10 points) on Assignment 3, I may still have suggested that you try to fix a few minor points. However, do not spend a great deal of time on that. Basically your classifier was fine. You should focus on the following:

Identify three "good" words in the Open Mind data that we created as a class. (This is found in the file cs8761-umd.full.details in /home/cs/tpederse/CS8761/Open-Mind.) If you would like to earn two points of extra credit you can do this for six words. A "good" word is one that has the following properties:

Once you have identified your words, convert the Open Mind data into the form of the line data so you can use your Assignment 3 classifier. Run your classifier on that data and comment on your results in a report called (as always) experiments.txt. Note that the criteria for "good" words include quite a few subjective terms (reasonable, somewhat, etc.). You will need to comment in your report on how you ended up interpreting these terms.

Experiments and Report

These remain the same for your fixed classifier as in Assignment 3. For the Open Mind words, you can comment on the issues raised above. You have some latitude in the issues you raise in your report so please focus on those that seem most interesting for the words you are dealing with.

Your Open Mind tagging that was finished Nov 1 counts four points towards Assignment 4. Thus, if you did the tagging but don't fix your classifier in any way, you would still get 4 of 10 points.

Submission Guidelines

Submit all of your program files and your report. Make sure to submit your target.config file as well. All should be plain text. Make sure your name, the date, and class information are contained in each file (except for target.config!), and that your source code files are carefully commented.

For the Open Mind words, make sure you submit all of the above, plus the data that you use in your experiments. This should be the data in the "line" format so that I can rerun your experiments without having to convert the data from Open Mind format. However, you should also provide the program/s that you use for converting the data. Please document how I could use those if I wished.

Place all of these files into a directory that is named with your UMD user id. In my case the directory would be called tpederse, for example. Then create a tar file that includes this directory and the files you will submit. Compress that tar file and submit it via the web drop from the class home page. Please note that the deadline will be enforced by automatic means. Any submissions after the deadline will not be graded. The web drop has a limit of 10 MB, so your files should be plain text.
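The packaging step above can be sketched in Python. This is only an illustration, not a required tool: the function names package_submission and within_limit are hypothetical, the archive name (directory name plus .tar.gz) is an assumption, and the 10 MB limit is read here as 10 * 1024 * 1024 bytes, which may not match how the web drop counts it.

```python
import os
import tarfile

# Assumption: the web drop's 10 MB limit is interpreted as 10 MiB.
MAX_BYTES = 10 * 1024 * 1024


def package_submission(user_dir, out_path=None):
    """Create a gzip-compressed tar file that contains the directory itself.

    Returns a (path, size_in_bytes) tuple for the archive that was written.
    """
    user_dir = os.path.normpath(user_dir)
    if not os.path.isdir(user_dir):
        raise NotADirectoryError(user_dir)
    if out_path is None:
        # Hypothetical naming choice: write the archive next to the directory.
        out_path = user_dir + ".tar.gz"
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname keeps the top-level directory (e.g. tpederse/) in the archive.
        tar.add(user_dir, arcname=os.path.basename(user_dir))
    return out_path, os.path.getsize(out_path)


def within_limit(size, limit=MAX_BYTES):
    """Return True if an archive of the given size fits under the limit."""
    return size <= limit
```

For example, package_submission("tpederse") would write tpederse.tar.gz in the current directory, and within_limit can be called on the returned size before uploading. Running tar and gzip from the command line achieves the same result.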

This is an individual assignment. You must write *all* of your code on your own. Do not get code from your colleagues, the Internet, etc. Please do not discuss your interpretations of these results amongst yourselves. This is meant to make you think for yourself and arrive at your own conclusions.

by: Ted Pedersen - tpederse@umn.edu