+++++++++++++++++++++++++++++++++++++++++++++++++

CS8761: Assignment 4 - Report
-----------------------------

Author: Amine Abou-Rjeili
Date: 11/10/2002

Files included in tar file:
---------------------------

select.pl      - Splits the data into TRAIN and TEST sets according to the
                 given ratio.
feat.pl        - Splits each line of the input file into 3 parts (key, sense,
                 sentence) and extracts the features from the sentence part
                 only.
convert.pl     - Converts a data file into feature representation.
nb.pl          - Runs a Naive Bayesian Classifier on data.
                 Changes made as compared to the previous version:
                 1) The probability of the sense that occurred the most times
                    is displayed in the case that there are no features and
                    classification is based on the "most common sense"
                    algorithm.
                 2) The probability is displayed using 15 decimal places
                    instead of the mathematical 'e' format.
run-experiment - Performs the experiments given a set of data. The data is
                 split in a 70-30 TRAIN/TEST ratio and the experiments are
                 carried out according to the constant and variable
                 environments (see description below).
The following files will be output:

  experiment.txt - Report of results
  FEAT           - list of features
  TAGGING.0-1    - tagging detail for window 0 frequency cutoff 1
  TAGGING.0-2    - tagging detail for window 0 frequency cutoff 2
  TAGGING.0-5    - tagging detail for window 0 frequency cutoff 5
  TAGGING.10-1   - tagging detail for window 10 frequency cutoff 1
  TAGGING.10-2   - tagging detail for window 10 frequency cutoff 2
  TAGGING.10-5   - tagging detail for window 10 frequency cutoff 5
  TAGGING.2-1    - tagging detail for window 2 frequency cutoff 1
  TAGGING.2-2    - tagging detail for window 2 frequency cutoff 2
  TAGGING.2-5    - tagging detail for window 2 frequency cutoff 5
  TAGGING.25-1   - tagging detail for window 25 frequency cutoff 1
  TAGGING.25-2   - tagging detail for window 25 frequency cutoff 2
  TAGGING.25-5   - tagging detail for window 25 frequency cutoff 5
  target.config  - config file
  TEST           - test examples
  TEST.FV        - test examples in feature format
  TRAIN          - train examples
  TRAIN.FV       - train examples in feature format

Files added to get statistics and convert the Open Mind data to data
appropriate to the classifier:

count_words.pl - Counts the total number of words that have been tagged in
                 the files given as command line arguments. This script is
                 used to count how many words were tagged in the entire
                 cs8761-umd.full.detailed file.

find_good_words.pl - Calculates statistics from the data in the file given on
                 the command line. This is the cs8761-umd.full.detailed file.
                 The statistics calculated are the criteria specified in the
                 assignment that make a "good" word. In addition, the script
                 filters out any duplicate and bad examples and splits the
                 input file into multiple files, each corresponding to a
                 particular word. Each 'word' file will contain all examples
                 for that particular word. Each example will occur only once.
                 For example, if we assume that the cs8761-umd.full.detailed
                 file contains examples for the words 'edge' 'energy' 'hope',
                 then this script will create 3 files in the current
                 directory called 'edge' 'energy' 'hope', one for each word
                 from the main file. Each file will contain the examples from
                 the detailed file except for the ones that have been
                 filtered out.

                 Examples will be filtered out according to the following
                 rules:
                 1) If an example has been tagged more than once and no sense
                    was chosen at least 2 times, the example is dropped.
                 2) If an example has been tagged more than 2 times and more
                    than 1 sense has an agreement rate of 2 or more, then the
                    sense with the highest agreement rate will be taken. The
                    others will be dropped.
                 3) In case of a tie between senses in terms of agreement
                    rate, the last sense will be picked and the rest dropped.

                 These rules will create a file where each example ID occurs
                 at most one time.

convert_data.pl - Converts the files output by the above script into the
                 format that the classifier understands. This is the same
                 format as the line-data. However, the input format must be
                 the same as that output by the script find_good_words.pl.

For more information about the included scripts see the documentation provided
at the beginning of each script.

EXAMPLE RUN USING PROVIDED SCRIPTS:
-----------------------------------

shell> find_good_words.pl /home/cs/tpederse/CS8761/Open-Mind/cs8761-umd.full.detailed > GOOD_WORDS2

Select words from the list of possible good words and run as follows,
e.g. aspect:

shell> convert_data.pl aspect aspect /home/cs/tpederse/CS8761/Open-Mind/ids-to-sentences

Now we will have a list of files corresponding to the senses of the word
'aspect' in the appropriate format.

Optional:
---------

shell> run-experiment all_senses_file

This will run all the experiments on the provided senses file.
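
The filtering rules listed for find_good_words.pl can be sketched as follows.
This is a minimal Python illustration, not the submitted Perl script. The
function names are hypothetical, "last sense" in rule 3 is assumed to mean the
tied sense that appears last in input order, and an example tagged only once
is assumed to be kept (the report notes this behaviour in its discussion of
the statistics).

```python
from collections import Counter


def resolve_sense(tags):
    """Return the sense kept for one example, or None if it is dropped.

    `tags` lists the senses assigned to a single example ID, in input order.
    """
    if not tags:
        return None
    if len(tags) == 1:
        # Tagged only once: nothing contradicts the tag, so it is kept.
        return tags[0]
    counts = Counter(tags)
    best = max(counts.values())
    if best < 2:
        # Rule 1: tagged more than once, but no sense was chosen twice.
        return None
    # Rule 2: keep the sense with the highest agreement count.
    # Rule 3 (assumed reading): on a tie, keep the tied sense seen last.
    tied = {sense for sense, count in counts.items() if count == best}
    for sense in reversed(tags):
        if sense in tied:
            return sense


def filter_examples(pairs):
    """Group (example_id, sense) pairs and keep at most one sense per ID."""
    by_id = {}
    for example_id, sense in pairs:
        by_id.setdefault(example_id, []).append(sense)
    kept = {}
    for example_id, tags in by_id.items():
        sense = resolve_sense(tags)
        if sense is not None:
            kept[example_id] = sense
    return kept
```

Calling filter_examples on the (ID, sense) pairs read from the detailed file
would give a mapping in which each example ID occurs at most once, which is
the property the rules are meant to guarantee.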
EXPERIMENTS AND ANALYSIS:
-------------------------

The task for this assignment is to use the Naive Bayesian classifier
implemented in assignment 3 and run it using 3 "good" words from the data for
the class obtained from the Open Mind project. A good word is defined as
follows (as taken from the Assignment 4 specifications):

* Has a reasonable rate of agreement among the two taggers.
* Has a somewhat balanced distribution of senses. At the very least avoids
  the case where a single sense dominates the distribution.
* Has a reasonable number of examples.

To help in identifying "good" words, I created the script find_good_words.pl
to print out some statistics and decompose the examples from the main Open
Mind data file into separate files for each word (see description above for
more information).

An output extract from a run of the script on the cs8761-umd.full.detailed
file is as follows:

'Word' 'Agreement Count' 'Example Count' 'Number of senses'
arc 64 3 - 1:19:00:: 1:25:00:: 1:25:01::
Sense Frequency
    1:19:00:: = 7
    1:25:00:: = 9
    1:25:01:: = 48
argument 94 3 - 1:10:00:: 1:10:02:: 1:10:03::
Sense Frequency
    1:10:00:: = 39
    1:10:02:: = 32
    1:10:03:: = 23
art 110 4 - 1:06:00:: 1:10:00:: 1:04:00:: 1:09:00::
Sense Frequency
    1:06:00:: = 31
    1:10:00:: = 19
    1:04:00:: = 35
    1:09:00:: = 25
aspect 100 5 - 1:24:00:: 1:07:02:: 1:09:01:: 1:09:00:: 1:07:01::
Sense Frequency
    1:24:00:: = 3
    1:07:02:: = 23
    1:09:01:: = 15
    1:09:00:: = 55
    1:07:01:: = 4

The output has the following format:

* The first line of each word is as follows:
    word total_number_of_occurences total_number_of_sense - list_of_senses
  The tab character (\t) is used as the field delimiter.
* The following lines contain a list of the senses with a count of how many
  times each sense was encountered in the examples file. Each sense is
  displayed on its own line and is indented with a tab character (\t) at the
  beginning of the line.
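
As an illustration of the format just described, the following Python sketch
reads the statistics back and reports how much of a word's data belongs to its
most frequent sense. It is not one of the submitted scripts. The function
names are hypothetical, and because the exact whitespace of the real output is
not reproduced here, the parser splits on any whitespace rather than on tabs
only.

```python
def parse_word_stats(text):
    """Parse find_good_words.pl-style output into {word: {...}}."""
    stats = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[:2] == ["Sense", "Frequency"]:
            continue
        if len(fields) >= 4 and fields[3] == "-":
            # Word line: word, total examples, number of senses, "-", senses.
            current = fields[0]
            stats[current] = {
                "total": int(fields[1]),
                "n_senses": int(fields[2]),
                "freq": {},
            }
        elif len(fields) == 3 and fields[1] == "=" and current is not None:
            # Sense line: "<sense> = <count>".
            stats[current]["freq"][fields[0]] = int(fields[2])
    return stats


def top_sense_share(entry):
    """Fraction of a word's examples that carry its most frequent sense."""
    freq = entry["freq"]
    return max(freq.values()) / sum(freq.values())


# The 'arc' entry from the extract above, with tabs as assumed delimiters.
SAMPLE = (
    "arc\t64\t3\t-\t1:19:00:: 1:25:00:: 1:25:01::\n"
    "Sense Frequency\n"
    "\t1:19:00:: = 7\n"
    "\t1:25:00:: = 9\n"
    "\t1:25:01:: = 48\n"
)

if __name__ == "__main__":
    arc = parse_word_stats(SAMPLE)["arc"]
    print(arc["total"], top_sense_share(arc))
```

For 'arc' the most frequent sense covers 48 of 64 examples, which is the kind
of dominated distribution the "good word" definition asks to avoid.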

These statistics show the sense distribution for a word together with the
total number of available examples according to the filtering rules specified
above. However, these filtering rules introduce the following pitfall:

* In the case that each example has been tagged ONLY ONCE, all the examples
  will be included, since there is nothing to contradict a particular example
  and so it is assumed to be correct. This is the case for the word 'art' in
  the cs8761-umd.full.detailed data.

Therefore, in addition to looking at these statistics it is recommended to
also check the actual data file. These statistics are meant to help in the
process of identifying good words and are not meant to actually do the
identification.

As mentioned above, this script will also produce a file for each word
encountered in the main data file.

After the find_good_words.pl script had been run, the output was examined to
identify six good words. These are as follows (the statistics are also shown):

* argument 94 3 - 1:10:00:: 1:10:02:: 1:10:03::
  Sense Frequency
      1:10:00:: = 39
      1:10:02:: = 32
      1:10:03:: = 23

* aspect 100 5 - 1:24:00:: 1:07:02:: 1:09:01:: 1:09:00:: 1:07:01::
  Sense Frequency
      1:24:00:: = 3
      1:07:02:: = 23
      1:09:01:: = 15
      1:09:00:: = 55
      1:07:01:: = 4

* edge 106 6 - 1:15:00:: 1:07:00:: 1:06:01:: 1:06:00:: 1:25:00:: 1:07:01::
  Sense Frequency
      1:15:00:: = 19
      1:07:00:: = 38
      1:06:01:: = 6
      1:06:00:: = 15
      1:25:00:: = 15
      1:07:01:: = 13

* energy 148 6 - 1:14:00:: 1:07:00:: 1:19:00:: 1:26:00:: 1:07:02:: 1:07:01::
  Sense Frequency
      1:14:00:: = 6
      1:07:00:: = 51
      1:19:00:: = 67
      1:26:00:: = 12
      1:07:02:: = 6
      1:07:01:: = 6

* hope 111 5 - 1:07:00:: 1:18:00:: 1:12:00:: 1:09:00:: 1:12:01::
  Sense Frequency
      1:07:00:: = 1
      1:18:00:: = 9
      1:12:00:: = 29
      1:09:00:: = 44
      1:12:01:: = 28

* length 111 5 - 1:06:00:: 1:07:00:: 1:07:02:: 1:07:03:: 1:07:01::
  Sense Frequency
      1:06:00:: = 6
      1:07:00:: = 27
      1:07:02:: = 23
      1:07:03:: = 23
      1:07:01:: = 32

I chose these six words because they have a good number of senses
(more than 2) and an acceptably balanced distribution, so there is no sense
that totally dominates the distribution.

For the experiments, I used the 70-30 ratio as in assignment 3. I tried other
ratios and the results were approximately the same, with some ratios having
worse results, so I decided to adopt this ratio scheme. The reasoning behind
trying different ratios is that for all these words the number of examples
available to train and test is relatively small compared to the line-data
examples. So I tried using a higher ratio for training, such as 75-25, 80-20
and 90-10, but the results did not show any improvement and in the case of
90-10 they were actually worse. Therefore the same window sizes and frequency
cutoffs as in assignment 3 were used for the experiments here.

The following results were obtained for each word:

argument-data/experiment.txt
-----------------------------
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.3929 [Total number of correct 11 out of 28]
0   2   Accuracy: 0.3929 [Total number of correct 11 out of 28]
0   5   Accuracy: 0.3929 [Total number of correct 11 out of 28]
2   1   Accuracy: 0.5357 [Total number of correct 15 out of 28]
2   2   Accuracy: 0.4643 [Total number of correct 13 out of 28]
2   5   Accuracy: 0.5357 [Total number of correct 15 out of 28]
10  1   Accuracy: 0.4286 [Total number of correct 12 out of 28]
10  2   Accuracy: 0.5000 [Total number of correct 14 out of 28]
10  5   Accuracy: 0.3929 [Total number of correct 11 out of 28]
25  1   Accuracy: 0.4643 [Total number of correct 13 out of 28]
25  2   Accuracy: 0.5000 [Total number of correct 14 out of 28]
25  5   Accuracy: 0.3571 [Total number of correct 10 out of 28]
---------------------------------------------------------
Running experiments with different TRAIN and TEST each time
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.2500 [Total number of correct 7 out of 28]
0   2   Accuracy: 0.3571 [Total number of correct 10 out of 28]
0   5   Accuracy: 0.5000 [Total number of correct 14 out of 28]
2   1   Accuracy: 0.5714 [Total number of correct 16 out of 28]
2   2   Accuracy: 0.6429 [Total number of correct 18 out of 28]
2   5   Accuracy: 0.5357 [Total number of correct 15 out of 28]
10  1   Accuracy: 0.3571 [Total number of correct 10 out of 28]
10  2   Accuracy: 0.5714 [Total number of correct 16 out of 28]
10  5   Accuracy: 0.4643 [Total number of correct 13 out of 28]
25  1   Accuracy: 0.3571 [Total number of correct 10 out of 28]
25  2   Accuracy: 0.6429 [Total number of correct 18 out of 28]
25  5   Accuracy: 0.5714 [Total number of correct 16 out of 28]

aspect-data/experiment.txt
--------------------------
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.4667 [Total number of correct 14 out of 30]
0   2   Accuracy: 0.4667 [Total number of correct 14 out of 30]
0   5   Accuracy: 0.4667 [Total number of correct 14 out of 30]
2   1   Accuracy: 0.4333 [Total number of correct 13 out of 30]
2   2   Accuracy: 0.3333 [Total number of correct 10 out of 30]
2   5   Accuracy: 0.4000 [Total number of correct 12 out of 30]
10  1   Accuracy: 0.4333 [Total number of correct 13 out of 30]
10  2   Accuracy: 0.3333 [Total number of correct 10 out of 30]
10  5   Accuracy: 0.3000 [Total number of correct 9 out of 30]
25  1   Accuracy: 0.4333 [Total number of correct 13 out of 30]
25  2   Accuracy: 0.4000 [Total number of correct 12 out of 30]
25  5   Accuracy: 0.4333 [Total number of correct 13 out of 30]
---------------------------------------------------------
Running experiments with different TRAIN and TEST each time
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.6667 [Total number of correct 20 out of 30]
0   2   Accuracy: 0.5333 [Total number of correct 16 out of 30]
0   5   Accuracy: 0.5333 [Total number of correct 16 out of 30]
2   1   Accuracy: 0.4000 [Total number of correct 12 out of 30]
2   2   Accuracy: 0.5000 [Total number of correct 15 out of 30]
2   5   Accuracy: 0.4000 [Total number of correct 12 out of 30]
10  1   Accuracy: 0.5000 [Total number of correct 15 out of 30]
10  2   Accuracy: 0.3667 [Total number of correct 11 out of 30]
10  5   Accuracy: 0.3333 [Total number of correct 10 out of 30]
25  1   Accuracy: 0.4333 [Total number of correct 13 out of 30]
25  2   Accuracy: 0.5000 [Total number of correct 15 out of 30]
25  5   Accuracy: 0.3333 [Total number of correct 10 out of 30]

edge-data/experiment.txt
------------------------
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.3871 [Total number of correct 12 out of 31]
0   2   Accuracy: 0.3871 [Total number of correct 12 out of 31]
0   5   Accuracy: 0.3871 [Total number of correct 12 out of 31]
2   1   Accuracy: 0.2903 [Total number of correct 9 out of 31]
2   2   Accuracy: 0.2903 [Total number of correct 9 out of 31]
2   5   Accuracy: 0.3226 [Total number of correct 10 out of 31]
10  1   Accuracy: 0.4516 [Total number of correct 14 out of 31]
10  2   Accuracy: 0.3871 [Total number of correct 12 out of 31]
10  5   Accuracy: 0.3226 [Total number of correct 10 out of 31]
25  1   Accuracy: 0.3871 [Total number of correct 12 out of 31]
25  2   Accuracy: 0.2581 [Total number of correct 8 out of 31]
25  5   Accuracy: 0.2903 [Total number of correct 9 out of 31]
---------------------------------------------------------
Running experiments with different TRAIN and TEST each time
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.3226 [Total number of correct 10 out of 31]
0   2   Accuracy: 0.3548 [Total number of correct 11 out of 31]
0   5   Accuracy: 0.3548 [Total number of correct 11 out of 31]
2   1   Accuracy: 0.3548 [Total number of correct 11 out of 31]
2   2   Accuracy: 0.2903 [Total number of correct 9 out of 31]
2   5   Accuracy: 0.2903 [Total number of correct 9 out of 31]
10  1   Accuracy: 0.2903 [Total number of correct 9 out of 31]
10  2   Accuracy: 0.2581 [Total number of correct 8 out of 31]
10  5   Accuracy: 0.3548 [Total number of correct 11 out of 31]
25  1   Accuracy: 0.3871 [Total number of correct 12 out of 31]
25  2   Accuracy: 0.3548 [Total number of correct 11 out of 31]
25  5   Accuracy: 0.1613 [Total number of correct 5 out of 31]

energy-data/experiment.txt
--------------------------
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.5455 [Total number of correct 24 out of 44]
0   2   Accuracy: 0.5455 [Total number of correct 24 out of 44]
0   5   Accuracy: 0.5455 [Total number of correct 24 out of 44]
2   1   Accuracy: 0.5000 [Total number of correct 22 out of 44]
2   2   Accuracy: 0.4773 [Total number of correct 21 out of 44]
2   5   Accuracy: 0.4091 [Total number of correct 18 out of 44]
10  1   Accuracy: 0.5909 [Total number of correct 26 out of 44]
10  2   Accuracy: 0.5909 [Total number of correct 26 out of 44]
10  5   Accuracy: 0.6364 [Total number of correct 28 out of 44]
25  1   Accuracy: 0.5227 [Total number of correct 23 out of 44]
25  2   Accuracy: 0.5909 [Total number of correct 26 out of 44]
25  5   Accuracy: 0.5455 [Total number of correct 24 out of 44]
---------------------------------------------------------
Running experiments with different TRAIN and TEST each time
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.4545 [Total number of correct 20 out of 44]
0   2   Accuracy: 0.4773 [Total number of correct 21 out of 44]
0   5   Accuracy: 0.4773 [Total number of correct 21 out of 44]
2   1   Accuracy: 0.3636 [Total number of correct 16 out of 44]
2   2   Accuracy: 0.3864 [Total number of correct 17 out of 44]
2   5   Accuracy: 0.2955 [Total number of correct 13 out of 44]
10  1   Accuracy: 0.4091 [Total number of correct 18 out of 44]
10  2   Accuracy: 0.4318 [Total number of correct 19 out of 44]
10  5   Accuracy: 0.3864 [Total number of correct 17 out of 44]
25  1   Accuracy: 0.4318 [Total number of correct 19 out of 44]
25  2   Accuracy: 0.3864 [Total number of correct 17 out of 44]
25  5   Accuracy: 0.5000 [Total number of correct 22 out of 44]

hope-data/experiment.txt
------------------------
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.4545 [Total number of correct 15 out of 33]
0   2   Accuracy: 0.4545 [Total number of correct 15 out of 33]
0   5   Accuracy: 0.4545 [Total number of correct 15 out of 33]
2   1   Accuracy: 0.4545 [Total number of correct 15 out of 33]
2   2   Accuracy: 0.4242 [Total number of correct 14 out of 33]
2   5   Accuracy: 0.3939 [Total number of correct 13 out of 33]
10  1   Accuracy: 0.3636 [Total number of correct 12 out of 33]
10  2   Accuracy: 0.3939 [Total number of correct 13 out of 33]
10  5   Accuracy: 0.3939 [Total number of correct 13 out of 33]
25  1   Accuracy: 0.3636 [Total number of correct 12 out of 33]
25  2   Accuracy: 0.3636 [Total number of correct 12 out of 33]
25  5   Accuracy: 0.3333 [Total number of correct 11 out of 33]
---------------------------------------------------------
Running experiments with different TRAIN and TEST each time
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.3636 [Total number of correct 12 out of 33]
0   2   Accuracy: 0.3636 [Total number of correct 12 out of 33]
0   5   Accuracy: 0.4242 [Total number of correct 14 out of 33]
2   1   Accuracy: 0.2424 [Total number of correct 8 out of 33]
2   2   Accuracy: 0.2424 [Total number of correct 8 out of 33]
2   5   Accuracy: 0.3030 [Total number of correct 10 out of 33]
10  1   Accuracy: 0.3636 [Total number of correct 12 out of 33]
10  2   Accuracy: 0.2727 [Total number of correct 9 out of 33]
10  5   Accuracy: 0.1515 [Total number of correct 5 out of 33]
25  1   Accuracy: 0.4242 [Total number of correct 14 out of 33]
25  2   Accuracy: 0.1818 [Total number of correct 6 out of 33]
25  5   Accuracy: 0.2727 [Total number of correct 9 out of 33]

length-data/experiment.txt
--------------------------
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.2121 [Total number of correct 7 out of 33]
0   2   Accuracy: 0.2121 [Total number of correct 7 out of 33]
0   5   Accuracy: 0.2121 [Total number of correct 7 out of 33]
2   1   Accuracy: 0.3636 [Total number of correct 12 out of 33]
2   2   Accuracy: 0.3030 [Total number of correct 10 out of 33]
2   5   Accuracy: 0.2727 [Total number of correct 9 out of 33]
10  1   Accuracy: 0.3333 [Total number of correct 11 out of 33]
10  2   Accuracy: 0.4242 [Total number of correct 14 out of 33]
10  5   Accuracy: 0.3939 [Total number of correct 13 out of 33]
25  1   Accuracy: 0.3030 [Total number of correct 10 out of 33]
25  2   Accuracy: 0.4242 [Total number of correct 14 out of 33]
25  5   Accuracy: 0.3333 [Total number of correct 11 out of 33]
---------------------------------------------------------
Running experiments with different TRAIN and TEST each time
window size|frequency cutoff|accuracy
0   1   Accuracy: 0.2121 [Total number of correct 7 out of 33]
0   2   Accuracy: 0.2121 [Total number of correct 7 out of 33]
0   5   Accuracy: 0.2121 [Total number of correct 7 out of 33]
2   1   Accuracy: 0.3939 [Total number of correct 13 out of 33]
2   2   Accuracy: 0.2727 [Total number of correct 9 out of 33]
2   5   Accuracy: 0.3636 [Total number of correct 12 out of 33]
10  1   Accuracy: 0.3636 [Total number of correct 12 out of 33]
10  2   Accuracy: 0.4545 [Total number of correct 15 out of 33]
10  5   Accuracy: 0.3030 [Total number of correct 10 out of 33]
25  1   Accuracy: 0.2121 [Total number of correct 7 out of 33]
25  2   Accuracy: 0.2424 [Total number of correct 8 out of 33]
25  5   Accuracy: 0.3636 [Total number of correct 12 out of 33]

Here I must note that the same 2 cases of experiments as in assignment 3 were
used. For clarity, I will briefly describe both these experiments:

1) Split the data into TRAIN and TEST partitions once and then run the
   experiments using these partitions. This experiment was carried out to
   compare the performance of the different combinations using the same TRAIN
   and TEST data (constant environment). The results of these experiments are
   summarized in TABLE 1.

2) For each combination of window size and frequency cutoff, a different set
   of TRAIN and TEST partitions was used. This means that these partitions
   differ in every run because they are randomly distributed from the entire
   line data. This experiment was carried out to see how different partitions
   can affect the performance of each experiment.

As can be observed from the data below, the performances of the 2 experiments
are very similar and so the changing environments did not produce a great
impact.
However, in some cases performance was degraded slightly with different
partitions (in experiment 2). Here, I must note that these 2 sets of
experiments do not necessarily imply that having a constant environment is
always better. It seems that it is a matter of finding the set of TRAINING
data that is diverse enough to reflect any set of test data as accurately as
possible.

The above is extracted from the experiments.txt file from assignment 3.

It was also planned to further split the training examples into subgroups, run
the experiment on each group and then take the average accuracy over all the
groups. However, after careful consideration, I decided not to carry out this
experiment because, with the small number of examples provided, it would not
produce any significant change in the accuracy.

As can be seen from the output of the experiments, the results look very poor
compared to the "line" data. One reason for the poor performance is the lack
of examples. Clearly, if the classifier is provided with about one thousand
examples to train on ("line" data), it will perform better than when provided
with about one hundred examples (this data).

Also, it seemed that in some cases the performance actually dropped when a
window was introduced. One reason for this is that most of the features were
not seen in the training examples. This is due, again, to the small number of
examples provided.

Another reason for the poor performance with this data is that in some cases
the features do not provide a clear-cut hint as to what the sense is. The
"line" data was comparatively straightforward in that the features identified
the sense in a clear manner. I believe this data is more complex than the
"line" data.

Conclusion:
----------

This assignment gave me some insight into the pitfalls of a Naive Bayesian
Classifier. One such insight is that a lot of data is required to train the
classifier if it is to perform at an acceptable level.
Also the features must provide some hint as to the classification and cannot
be totally vague. On the other hand, the classifier seemed to handle noise
very well. I noticed this when I ran the classifier with all of the data for
some of the words. This includes a lot of noise, in the form of having the
same instance classified with more than one sense. With such data the accuracy
dropped slightly compared with the filtered data, but it was comparable.

+++++++++++++++++++++++++++++++++++++++++++++++++

Name  : Nitin Agarwal
Date  : 11-Nov-02
Class : Natural Language Processing CS8761
===============================================================

Objective

To create sense tagged text and explore supervised approaches to word sense
disambiguation.

Procedure

Let us consider a huge corpus in which one word occurs in all the lines and
could have various meanings. We have to split this corpus into 2 sets. We
analyze one set using Statistical Natural Language Processing techniques to
get the training data. Using this data we try to figure out the meaning of the
target word in the test data. After this we compare the meaning of the word
obtained using this method with the actual meaning and find the accuracy of
this method. The value of the accuracy could be anywhere between 0 and 1. The
closer the value is to 1, the better the method is.

The above mentioned test has been done using the word "Line". The experiment
has been performed in four phases.

Phase 1(select.pl)

When this file is run, we get two output files, namely TRAIN and TEST, that
have various lines containing the word "Line" with different meanings
associated with them.
Before writing into the two output files, all the input lines are randomized
so that the various senses of the word "Line" are mixed together.

Phase 2(feat.pl)

The outcome of this file is a feature vector. This vector has distinct
features associated with it depending on the sense of "Line" in each of its
occurrences. The vector also depends on the values of the window size and the
frequency value specified on the command line. The larger the window size,
the more features are associated with each occurrence of "Line". We will see
later that a larger window size results in estimating the sense of each
occurrence with more accuracy. Conversely, a high frequency value results in a
slightly lower accuracy, because this value is the cut-off for frequency and
only the words that occur more than this many times across all the windows are
considered in the feature vector. This file is only executed for the TRAIN
file from select.pl to obtain the features.

Phase 3(convert.pl)

The output of feat.pl, together with first TRAIN and then TEST, is the input
to this program, and the output is a binary feature vector. This binary vector
shows whether an instance has each of the features. This file is run for
several different combinations of window and frequency values.

Phase 4(nb.pl)

This program processes the binary vectors for a pair of TEST.FV and TRAIN.FV
files with the same values for window size and frequency cut-off. A Naive
Bayesian classifier is implemented to determine the sense of the target word
in the TEST file using the data obtained from the TRAIN file. For the word
types that did not occur in the vector, smoothing is performed to give them a
small probability value. This is because we assume that these words just did
not occur in this set of data and may occur in another set of data. We get 12
sets of outputs after running this file for all window and frequency
combinations. These values are tabulated below.
The row values are the window size and the column values are the cut-off
values for frequency.

window size/frequency cutoff      1         2         5
0                              0.5571    0.5571    0.5571
2                              0.7250    0.7113    0.6908
10                             0.8086    0.7973    0.7867
25                             0.8267    0.8132    0.8012

Observation

Looking at the above table we see that all the experiments with a window size
of 0 give the same accuracy of 0.5571. This is so because if the window size
is 0 then we are not getting any words, and the frequency value does not
matter. There is no feature vector and we just have some random senses
assigned to each instance. Of these instances some happen to be correct, and
hence we have non-zero values for all the cases with a window size of 0.

When the window size is 2, we get 4 words for each instance (2 on the left and
2 on the right). If the frequency cut-off is taken as 1 then any word that
occurs more than once in all the instances is included in the feature vector.
Using the program nb.pl, which is explained above, we get an accuracy of
0.7250, which shows that we estimated the sense of about 72% of the instances
correctly. With higher values of frequency we notice that the accuracy
declines. The reason is that when the frequency cut-off is higher, fewer words
are in the feature vector, which limits our data from the TRAIN file and
results in a lower accuracy.

For a window size of 10 we get an even higher accuracy. This suggests that
there are some good features of a word even as far as about 10 words from it.
Therefore, it is a good idea to consider a window that is not too small.
Again, accuracy drops with an increase in the frequency cut-off for the same
reasons cited above.

Finally, we run the experiment with a window size of 25. Again, as expected,
accuracy increases. Nevertheless, this time the difference is very small.
In addition, accuracy drops as the frequency cut-off increases, which is
similar to the cases considered earlier.

However, there is an important point worth mentioning before we conclude. We
notice from the above table that accuracy improves only marginally when the
window size is increased from 10 to 25. If we increase the window size
further, we may still get a better accuracy, but that would hardly be worth
the processing time required to process the data with that window size. Hence,
in order to get even better accuracy values, it is not enough just to increase
the window size. Instead, we should work to improve the classifier so that it
can give us better results.

Assignment 4
------------

Objective

To continue the exploration of supervised approaches to word sense
disambiguation using data from Open Mind.

Experiment

First we write programs to put the data obtained from Open Mind into a format
similar to the line data that was used in the previous assignment. The
following programs have been written to achieve this.

separator1.pl and separator2.pl

The two programs are used to separate the data in the files
cs8761-umd.full.detailed and ids-to-sentences respectively. They divide the
contents of each file into several files depending on the target word.
Therefore, after executing the above 2 programs we have a list of output
files. separator1.pl names each of its output files after the target word with
the extension .tags. Hence, if we have a word called "line" then this program
would output "line.tags". Similarly, separator2.pl produces a list of files
with the extension .data. For "line" the file would be "line.data". The two
programs are run as follows:

    perl separator1.pl
    perl separator2.pl

readtaginfo.pl

This file reads the information from the "*.tags" files created using
separator1.pl and analyzes the information to determine good words.
The program returns a file with the tag information for all the words and also
marks a few good words. The good words are as defined in the assignment
statement on Dr. Ted Pedersen's web page for the class CS8761. When executed,
this program computes the following:

1) Senses: This contains the information about the unique senses associated
   with a word. The number of senses excludes "unclear" and "unlisted-sense"
   if they were assigned for any word.
   Good sense: The program checks for all the senses of a word that occur
   more than 40% and less than 160% of the average sense frequency.

2) Distribution: A good word needs to be evenly distributed among many of its
   senses. The more evenly distributed the senses are, the better the word is.
   Good distribution: If more than half of the senses of a word are good
   senses then the word is considered to have a good distribution.

3) Instances: This is the total number of unique instances for any given word.
   Good instance: Any word that has more than 100 instances is considered to
   have a good number of instances.

4) Agreement: Agreement for a given word is the agreement between the users
   who tagged that word. If 3 users tagged an instance of a word and they all
   tagged the same sense then the agreement for that instance is 1. If just 2
   of them agreed on a sense then the agreement would be 0.67. However, if all
   of them disagree then we would have agreement equal to 0.33. The agreement
   of a word is given as the average of the agreements of all the individual
   instances for that word.
   Good agreement: I have considered a word to have good agreement if its
   agreement is more than 60% and less than 100%. The latter condition is to
   check for words that were tagged by only one user. The former condition is
   close to the value (65%) mentioned by Dr. Rada Mihalcea in her colloquium.

All the "Good" values mentioned above were narrowed down by the author using
trial and error to get the 6 best words for this assignment.
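
The four criteria can be made concrete with a short Python sketch. This is an
illustration under stated assumptions rather than the readtaginfo.pl code: the
function names are hypothetical, sense frequencies are assumed to be passed in
already excluding "unclear" and "unlisted-sense", and per-instance agreement
is taken to be the share of taggers who chose the most common sense, which
reproduces the 1, 0.67 and 0.33 values given above for three taggers.

```python
from collections import Counter


def instance_agreement(tags):
    """Share of taggers who chose the most common sense for one instance."""
    return max(Counter(tags).values()) / len(tags)


def word_agreement(instances):
    """Average of the per-instance agreements for one word."""
    return sum(instance_agreement(tags) for tags in instances) / len(instances)


def good_senses(freq):
    """Senses whose frequency lies between 40% and 160% of the average."""
    average = sum(freq.values()) / len(freq)
    return [s for s, count in freq.items() if 0.4 * average < count < 1.6 * average]


def is_good_word(freq, instances):
    """Apply the distribution, instance and agreement thresholds together.

    `freq` maps each listed sense to its frequency; `instances` holds one
    list of tags per unique instance of the word.
    """
    good_distribution = len(good_senses(freq)) > len(freq) / 2
    good_instances = len(instances) > 100
    good_agreement = 0.6 < word_agreement(instances) < 1.0
    return good_distribution and good_instances and good_agreement
```

The upper bound on agreement rejects a word whose instances were each tagged
by a single user, since every such instance trivially scores 1.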
Others may identify 6 good words using an entirely different set of conditions, depending on their choice. However, this program yielded only 5 good words, namely "depth", "difficulty", "edge", "length" and "shape". The sixth word was selected by inspection from the output of this program. This last word is "behavior". It was selected using criteria similar to those discussed above, although manually. The program is run as follows:

perl readtaginfo.pl arc.tags depth.tags shape.tags.............

The resulting file is "taginfo".

sensor.pl

This program is executed for the 6 good words obtained from the above program. It reads a pair of .tags and .data files and produces a set of files named after the senses assigned to that word. The number of output files is equal to the number of senses for a particular word. The program takes care not to include multiple occurrences of the same instance. Furthermore, care has been taken to assign an instance to the sense that most users thought was correct. For instance, if 3 users tagged an instance of a word and all of them agreed upon a sense, then the instance is assigned to this sense. If just 2 of them agreed then the instance is assigned to the most agreed upon sense. When all of them disagree, the instance is assigned randomly to a sense. If an instance was tagged as "unclear" or "unlisted-sense" then again it is assigned to a sense randomly. The program is run as follows:

perl sensor.pl

If the word is line then the command would look like this:

perl sensor.pl line

While running this program, care should be taken to change the regular expression in the file. The new regular expression can be obtained from the first line of the .config file.

At this point we run the output obtained from sensor.pl for the various words through the programs developed in Assignment 3 to obtain the accuracy for each word using the Naive Bayesian classifier.
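The agreement measure described for readtaginfo.pl and the sense assignment rule described for sensor.pl can be sketched as follows. This is only a minimal Python sketch under stated assumptions, not the Perl code used in the assignment. The function names (instance_agreement, word_agreement, assign_sense) and the NON_SENSES set are hypothetical, and it is assumed that a tied vote, or a winning tag of "unclear" or "unlisted-sense", leads to a random pick among the listed senses of the word.

```python
import random
from collections import Counter

# Tags that do not name a real sense (the two mentioned in the report).
NON_SENSES = {"unclear", "unlisted-sense"}


def instance_agreement(tags):
    """Share of taggers who chose the most popular tag for one instance."""
    counts = Counter(tags)
    return max(counts.values()) / len(tags)


def word_agreement(instances):
    """Average of the per-instance agreements for one word.

    instances is a list of tag lists, one tag list per instance.
    """
    return sum(instance_agreement(tags) for tags in instances) / len(instances)


def assign_sense(tags, senses, rng=random):
    """Return the most agreed-upon sense for one instance.

    Assumption: when the vote is tied, or the winning tag is not a real
    sense, a sense is drawn at random from the listed senses.
    """
    ranked = Counter(tags).most_common()
    top_tag, top_count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_count
    if tied or top_tag in NON_SENSES:
        return rng.choice(sorted(senses))
    return top_tag
```

With three taggers, instance_agreement gives 1, about 0.67 and about 0.33 for full, partial and no agreement, which matches the values quoted in the report. Passing a seeded random.Random object as rng makes the random assignments repeatable.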
The following tables show the accuracy values for the 6 good words for all 12 combinations of window size and frequency cut-off, as was done for the line data in Assignment 3.

behavior
window size/frequency cutoff    1         2         5
0                               0.55      0.55      0.55
2                               0.55      0.55      0.55
10                              0.5       0.5       0.55
25                              0.5       0.5       0.525

depth
window size/frequency cutoff    1         2         5
0                               0.3333    0.3333    0.3333
2                               0.2667    0.2444    0.3333
10                              0.2667    0.2444    0.2444
25                              0.2667    0.2444    0.2444

difficulty
window size/frequency cutoff    1         2         5
0                               0.375     0.375     0.375
2                               0.35      0.35      0.375
10                              0.325     0.325     0.325
25                              0.325     0.325     0.325

edge
window size/frequency cutoff    1         2         5
0                               0.225     0.225     0.225
2                               0.25      0.25      0.25
10                              0.275     0.275     0.275
25                              0.3       0.3       0.275

length
window size/frequency cutoff    1         2         5
0                               0.1364    0.1364    0.1364
2                               0.2045    0.1818    0.1591
10                              0.1591    0.2045    0.1818
25                              0.1591    0.2045    0.2045

shape
window size/frequency cutoff    1         2         5
0                               0.1282    0.1282    0.1282
2                               0.1026    0.1282    0.1282
10                              0.1282    0.1282    0.1282
25                              0.1282    0.1282    0.1282

The values in the above tables do not tally with what was expected. The values in many tables decrease as the window size increases, and at times they increase as the frequency cut-off increases. This contradicts the statement made above about the accuracy values. Moreover, the values are too low. It is difficult to say how other words would behave if this is the behavior of good words.
+++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++

Kailash Aurangabadkar
Assignment # 4
A continuing quest for meanings in li(n|f)e
--------------------------------------------------------------------------------
The objective of the assignment is to continue to explore supervised approaches to word sense disambiguation. In this assignment a sense-tagged text is created. Then a Naïve Bayesian classifier is implemented to perform word sense disambiguation. This assignment focuses on fixing the errors in assignment number 3, so that we get optimal values for accuracy. After fixing the classifier we identify six "good" words in the Open Mind data that we created as a class. Good words are to be identified using the following criteria:
*Has a reasonable rate of agreement among the two taggers.
*Has a somewhat balanced distribution of senses. At the very least avoids the case where a single sense dominates the distribution.
*Has a reasonable number of examples.
Once the good words have been identified we have to convert the Open Mind data into the form of the line data so we can use our Assignment 3 classifier. By executing the classifier on that data we address the word sense disambiguation problem in the Open Mind data.
--------------------------------------------------------------------------------
Word Sense Disambiguation:
The task of disambiguation is to determine which of the senses of an ambiguous word is invoked in a particular use of the word.
--------------------------------------------------------------------------------
Process:
In this assignment the Naïve Bayesian classification algorithm is used to assign a score to every instance in the Test data for every possible sense of the word. The central idea of this classifier is to look at a large context window around the ambiguous word. Each content word contributes information to the disambiguation of the target word. We first find the probability vector from the Train data for each sense and feature combination. If we do not see a word in that context then we apply Witten-Bell smoothing to find the probability of that event. Then we find the probability of a content word for every sense in the Test data window by using the Naïve Bayes assumption. The sense that gives the maximum probability for an instance from the Test data is assigned as the sense of that instance. The accuracy of the algorithm is then computed by comparing the actual sense of the word in that line with the sense assigned by us. Then algorithms have to be developed to analyze the Open Mind data and to convert it into "line" data format.
--------------------------------------------------------------------------------
The assignment consists of two parts:

Part 1:
Part 1 consists of checking and fixing four programs of assignment 3, which are:
1. select.pl:- This program divides a sense-tagged corpus of text into Training data and Test data.
2. feat.pl:- This program finds the features in the specified window around the target word. It also checks that the frequency of each feature is more than the specified cutoff.
3. convert.pl:- This program gives us the feature vector table, which shows whether or not the features obtained using feat.pl are present in the input file.
4. nb.pl:- This program assigns sense tags to untagged data from test.
It does this by using the Naïve Bayesian algorithm, smoothing its values with Witten-Bell smoothing.
--------------------------------------------------------------------------------
Experiment Results of Part 1:-
The accuracy values for each combination of window and frequency cutoff are as shown below:
--------------------------------------------------------------------------------
Window Size        Frequency Cutoff        Accuracy value
--------------------------------------------------------------------------------
0                  1                       0.5350
0                  2                       0.5350
0                  5                       0.5350
--------------------------------------------------------------------------------
2                  1                       0.7468
2                  2                       0.7476
2                  5                       0.7378
--------------------------------------------------------------------------------
10                 1                       0.8212
10                 2                       0.8166
10                 5                       0.8148
--------------------------------------------------------------------------------
25                 1                       0.8442
25                 2                       0.8392
25                 5                       0.8378
--------------------------------------------------------------------------------
We see that, in general, the accuracy value increases as the window size increases. This is to be expected: as the window size increases, the number of content words around the ambiguous word under consideration also increases. This gives us more and more information about the particular sense occurring in that instance. We also see that the frequency cutoff has no marked effect on the accuracy value. This is because the Naive Bayesian classifier is robust to noise, and hence irrelevant data occurring around words has no remarkable effect on the sense tagging. This trend is generally followed in the observations made from the experiments performed and summarized in the table above. Thus we see that by making the window size as large as possible we can get more and more accuracy in assigning senses to ambiguous words.
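The process described in Part 1 (collect context words in a window, estimate per-sense probabilities from the Train data, and pick the sense with the highest score) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the Perl programs from the assignment. The function names (window_features, train, classify) are hypothetical. It also uses add-one smoothing in place of the Witten-Bell smoothing named in the report, purely to keep the example short, and it ignores the frequency cutoff.

```python
import math
from collections import Counter, defaultdict


def window_features(tokens, target, w):
    """Words within w positions left or right of the first target occurrence."""
    i = tokens.index(target)
    return tokens[max(0, i - w):i] + tokens[i + 1:i + 1 + w]


def train(examples, target, w):
    """Collect sense counts and per-sense context word counts.

    examples is a list of (sense, token list) pairs.
    """
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, tokens in examples:
        sense_counts[sense] += 1
        for word in window_features(tokens, target, w):
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(tokens, target, w, model):
    """Return the sense with the highest log score for one test instance."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        # Start from the log prior of the sense.
        score = math.log(sense_counts[sense] / total)
        # Add-one smoothing (an assumption; the report uses Witten-Bell).
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in window_features(tokens, target, w):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```

With a window size of 0 there are no features, so the score is just the prior and the most frequent sense in the training data is returned for every instance. That matches the constant accuracy in the window size 0 rows of the table.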
--------------------------------------------------------------------------------
Part 2:
Part 2 consists of analyzing the Open Mind data to find good words and converting the data into the "line" data format that we used during assignment 3. For this purpose 4 programs were created:
getsentence.pl:- This program splits the file "ids-to-sentences" into separate files for each word type.
getsenses.pl:- This program splits the file "cs8761-umd.full.detailed" into separate files for each word type.
getinfo.pl:- This program takes as arguments the sense tag files for each word, as separated by getsenses.pl, and finds the values of the criteria that are checked to decide whether a word is a good word.
getsenseword.pl:- This program splits the file containing instances for a single word into files for each sense tag occurring with the word, so that it is in "line" data format.
--------------------------------------------------------------------------------
Experiment Results of Part 2:-
A sample output from getinfo.pl for the sample word "Difficulty" is shown below:

tagdifficulty
Examples : 135
Tags : 4
TAG: difficulty%1:07:00:: , NUMBER: 52
TAG: difficulty%1:26:00:: , NUMBER: 57
TAG: difficulty%1:09:02:: , NUMBER: 43
TAG: difficulty%1:04:00:: , NUMBER: 59
Agreement : 0.719753086419753

From this output we come to know that the word difficulty has 135 instances and 4 senses, and that there was 72% agreement between the persons who tagged the word on the Open Mind project website. This output also gives us the distribution among senses.
I have chosen the following six words depending on the output of getinfo.pl:

Difficulty
{Output of getinfo.pl
tagdifficulty
Examples : 135
Tags : 4
TAG: difficulty%1:07:00:: , NUMBER: 52
TAG: difficulty%1:26:00:: , NUMBER: 57
TAG: difficulty%1:09:02:: , NUMBER: 43
TAG: difficulty%1:04:00:: , NUMBER: 59
Agreement : 0.719753086419753
}

Art
{Output of getinfo.pl
tagart
Examples : 110
Tags : 4
TAG: art%1:04:00:: , NUMBER: 35
TAG: art%1:06:00:: , NUMBER: 31
TAG: art%1:09:00:: , NUMBER: 25
TAG: art%1:10:00:: , NUMBER: 19
Agreement : 1
}

Captain
{Output of getinfo.pl
tagcaptain
Examples : 180
Tags : 9
TAG: unlisted-sense , NUMBER: 1
TAG: captain%1:18:00:: , NUMBER: 7
TAG: captain%1:18:01:: , NUMBER: 11
TAG: captain%1:18:02:: , NUMBER: 31
TAG: unclear , NUMBER: 44
TAG: captain%1:18:03:: , NUMBER: 46
TAG: captain%1:18:04:: , NUMBER: 27
TAG: captain%1:18:05:: , NUMBER: 20
TAG: captain%1:18:06:: , NUMBER: 62
Agreement : 0.812962962962963
}

Length
{Output of getinfo.pl
taglength
Examples : 149
Tags : 6
TAG: length%1:07:01:: , NUMBER: 47
TAG: length%1:07:02:: , NUMBER: 34
TAG: unclear , NUMBER: 3
TAG: length%1:07:03:: , NUMBER: 36
TAG: length%1:06:00:: , NUMBER: 21
TAG: length%1:07:00:: , NUMBER: 45
Agreement : 0.875838926174497
}

Distribution
{Output of getinfo.pl
tagdistribution
Examples : 135
Tags : 6
TAG: unclear , NUMBER: 7
TAG: distribution%1:04:00:: , NUMBER: 56
TAG: unlisted-sense , NUMBER: 2
TAG: distribution%1:04:01:: , NUMBER: 76
TAG: distribution%1:07:00:: , NUMBER: 38
TAG: distribution%1:09:00:: , NUMBER: 26
Agreement : 0.769135802469136
}

Unit
{Output of getinfo.pl
tagunit
Examples : 151
Tags : 9
TAG: unit%1:03:00:: , NUMBER: 11
TAG: unlisted-sense , NUMBER: 2
TAG: unit%1:14:00:: , NUMBER: 82
TAG: unit%1:23:00:: , NUMBER: 12
TAG: unit%1:24:00:: , NUMBER: 65
TAG: unit%1:06:01:: , NUMBER: 19
TAG: unit%1:17:00:: , NUMBER: 47
TAG: unit%1:09:00:: , NUMBER: 18
TAG: unclear , NUMBER: 9
Agreement : 0.724613686534216
}

The accuracy values for each combination of window and
frequency cutoff for "difficulty" are as shown below:
--------------------------------------------------------------------------------
Window Size        Frequency Cutoff        Accuracy value
--------------------------------------------------------------------------------
0                  1                       0.3095
0                  2                       0.3095
0                  5                       0.3095
--------------------------------------------------------------------------------
2                  1                       0.4286
2                  2                       0.4048
2                  5                       0.3810
--------------------------------------------------------------------------------
10                 1                       0.4286
10                 2                       0.4524
10                 5                       0.4048
--------------------------------------------------------------------------------
25                 1                       0.4048
25                 2                       0.3333
25                 5                       0.3333
--------------------------------------------------------------------------------

The accuracy values for each combination of window and frequency cutoff for "art" are as shown below:
--------------------------------------------------------------------------------
Window Size        Frequency Cutoff        Accuracy value
--------------------------------------------------------------------------------
0                  1                       0.3143
0                  2                       0.3143
0                  5                       0.3143
--------------------------------------------------------------------------------
2                  1                       0.3429
2                  2                       0.3143
2                  5                       0.3143
--------------------------------------------------------------------------------
10                 1                       0.3429
10                 2                       0.4286
10                 5                       0.4286
--------------------------------------------------------------------------------
25                 1                       0.457
25                 2                       0.3714
25                 5                       0.3714
--------------------------------------------------------------------------------

The accuracy values for each combination of window and frequency cutoff for "captain" are as shown below:
--------------------------------------------------------------------------------
Window Size        Frequency Cutoff        Accuracy value
--------------------------------------------------------------------------------
0                  1                       0.3148
0                  2                       0.3148
0                  5                       0.3148
--------------------------------------------------------------------------------
2                  1                       0.3148
2                  2                       0.3333
2                  5                       0.3148
--------------------------------------------------------------------------------
10                 1                       0.4074
10                 2                       0.3148
10                 5                       0.3148
--------------------------------------------------------------------------------
25                 1                       0.3889
25                 2                       0.4259
25                 5                       0.3148
--------------------------------------------------------------------------------

The accuracy values for each combination of window and frequency cutoff for "length" are as shown below:
--------------------------------------------------------------------------------
Window Size        Frequency Cutoff        Accuracy value
--------------------------------------------------------------------------------
0                  1                       0.2766
0                  2                       0.2766
0                  5                       0.2766
--------------------------------------------------------------------------------
2                  1                       0.3404
2                  2                       0.3191
2                  5                       0.2979
--------------------------------------------------------------------------------
10                 1                       0.3404
10                 2                       0.3404
10                 5                       0.2125
--------------------------------------------------------------------------------
25                 1                       0.2125
25                 2                       0.2766
25                 5                       0.2766
--------------------------------------------------------------------------------

The accuracy values for each combination of window and frequency cutoff for "distribution" are as shown below:
--------------------------------------------------------------------------------
Window Size        Frequency Cutoff        Accuracy value
--------------------------------------------------------------------------------
0                  1                       0.3898
0                  2                       0.3898
0                  5                       0.3898
--------------------------------------------------------------------------------
2                  1                       0.5000
2                  2                       0.5000
2                  5                       0.5000
--------------------------------------------------------------------------------
10                 1                       0.3810
10                 2                       0.4524
10                 5                       0.3571
--------------------------------------------------------------------------------
25                 1                       0.4524
25                 2                       0.3333
25                 5                       0.5238
--------------------------------------------------------------------------------

The accuracy values for each combination of window and frequency cutoff for "unit" are as shown below:
--------------------------------------------------------------------------------
Window Size        Frequency Cutoff        Accuracy value
--------------------------------------------------------------------------------
0                  1                       0.3458
0                  2                       0.3458
0                  5                       0.3458
--------------------------------------------------------------------------------
2                  1                       0.4792
2                  2                       0.4375
2                  5                       0.4375
--------------------------------------------------------------------------------
10                 1                       0.3125
10                 2                       0.3333
10                 5                       0.3333
--------------------------------------------------------------------------------
25                 1                       0.4167
25                 2                       0.4167
25                 5                       0.3333
--------------------------------------------------------------------------------
These seem to be very low accuracy values. This is because there were only a few examples (on the order of 100) compared to the line data (on the order of 4000). With fewer instances there is less data to learn from.

+++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++

cs8761 Natural Language Processing
Assignment 3
Archana Bellamkonda
November 1, 2002.

Problem Description : ( for part II - Naive Bayesian Classifier Implementation )
-------------------
The objective is to assign a sense to a given target word in a sentence using a Naive Bayesian Classifier. We start off with some sets of data in which previously collected instances are grouped by sense.
In this assignment, we do the sense tagging in four steps -

1) select.pl :- This program collects all instances from the given data, adds information about the sense from which each instance was extracted, randomizes the instances, and divides them into two sets of data, TEST and TRAIN, depending on the percentage entered by the user.

2) feat.pl :- This program identifies all word types that occur within "w" positions to the left or right of the target word, and that occur "more than" "f" times in the TRAIN data set. It doesn't include the target word as a feature.

3) convert.pl :- This program converts the input file to a feature vector representation, where the features output by feat.pl are read from standard input. Each instance in the file is converted into a series of binary values that indicate whether or not each type in the list of features output by feat.pl has occurred within the specified window around the target word in the given instance.
   NOTE:- It also includes the number of unobserved features in the specified window around the target word for every instance. This is the last number in the feature vector.

4) nb.pl :- This program will learn a Naive Bayesian classifier and use that classifier to assign sense tags to the test data. The senses are printed along with the instance ids, the actual sense and the probability of the assigned sense.

Experiment :
----------

                     Frequency cutoff
                 1          2          5
windowsize |__________________________________________
     0     |   0.5454     0.5289     0.5248
     2     |   0.8512     0.8471     0.8471
    10     |   0.8926     0.8595     0.8223
    25     |   0.8801     0.8800     0.7933

Experiments are done with the window sizes and frequency cutoffs shown above. The results are as shown for all twelve combinations. (Here the experiment is done with the phone2 and division2 files.)

Observations :
------------

-->Expected :
--------
As window size increases, the number of features that we observe increases, and hence we will learn more about the context of a given word and would observe higher accuracy as window size increases.
(taking frequency cut off to be 1). But the meaning of a word depends on the surrounding context only. For example, the meaning of a word in a sentence will generally not depend on the meaning of a word in other sentences. So, if we go on increasing the window size, starting from zero, we will observe a significant increase in accuracy up to a certain level, and then when the window size increases beyond the required context, accuracy will not increase significantly.

Observed :
--------
As shown in the table above, we observed what we expected. Consider the column for frequency cutoff 1. As the window size increased from 0 to 2, accuracy increased significantly, from 0.5454 to 0.8512. From then onwards, accuracy did not increase significantly.

-->Expected :
--------
According to Zipf's law, most of the features are not repeated in a text. So, as the frequency cutoff increases for a particular window size, fewer features will be observed and hence we can't capture the context properly. Thus, accuracy decreases as the frequency cutoff increases for a given window size.

Observed :
--------
As shown in the table above, we again observed what we expected. Consider any row for a particular window size. The accuracy values decrease as the frequency cutoff increases.

--> Expected :
--------
When we consider the case where frequency cutoffs are increasing, we also estimate that more features will be observed as the window size increases and thus we could expect accuracy to be higher.

Observed :
--------
Consider the columns for frequency cutoffs 2 and 5. We observed what we expected.

NOTE :
----
In some cases above, I used a precision of 16 digits after the decimal point, as the probability values are very small and, if only 4 digits after the decimal point are considered, they are all zeros.

OUTPUTS FOR TESTCASES :
---------------------

Test Case 1:
-----------
cord2
w7_039:13446: My line was cut.
w7_039:13447: My line was cut.
w7_039:13448: My line was cut.
w7_039:13449: My line was cut.
w7_039:13450: My line was cut.

text2
w7_039:12446: The line of text is very unclear.
w7_039:12447: The line of text is very unclear.
w7_039:12448: The line of text is very unclear.
w7_039:12449: The line of text is very unclear.
w7_039:12450: The line of text is very unclear.

Output:
w7_039:13448: cord 0.0031 cord
w7_039:13446: cord 0.0031 cord
w7_039:13450: cord 0.0031 cord
1

Test Case 2 :
------------
cord2
w7_039:13446: A line B C
w7_039:13447: D line E F
w7_039:13448: G line H I
w7_039:13449: J line K L
w7_039:13450: M line N O

text2
w7_039:12446: P Q line
w7_039:12447: R S line
w7_039:12448: T U line
w7_039:12449: V W line
w7_039:12450: X Y line

Output:
w7_039:12447: text 0.1429 text
w7_039:13446: text 0.0212 cord
w7_039:13447: text 0.0212 cord
0.333333333333333

Test Case 3 :
------------
cord2
w7_039:13446: A line B C
w7_039:13447: D line E F
w7_039:13448: G line H I
w7_039:13449: J line K L
w7_039:13450: M line N O

text2
w7_039:12446: P Q line
w7_039:12447: R S line
w7_039:12448: T U line
w7_039:12449: V W line
w7_039:12450: X Y line

Output:
w7_039:13448: text 0.0212 cord
w7_039:12446: text 0.1429 text
w7_039:13446: text 0.0212 cord
0.333333333333333

CONCLUSIONS
----------

Noise :
-----
Thus, we should select the window size to be optimal. We should observe where we are getting noise, i.e., the cut off beyond which we are observing unwanted features in our context, and limit our window size to be lower than that cut off. Accuracy increases as the window size increases, and once noise also enters the window there will be a drop in accuracy, though not a significant one. We have to observe where we are getting that drop.

Optimal Combination :
-------------------
The optimal combination will be the greatest window size below the cutoff where we are observing noise, together with a frequency cutoff of "1", as we would observe more features than with higher frequency cutoffs and hence would learn the context of a sense in a better way.
Optimal Combination In Our Experiment:
-------------------------------------
As seen from the table above, the optimal combination is predicted to be a window size of "10" and a frequency cutoff of "1". We got the highest accuracy at that point and it is 0.8926 as shown.

+++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++

#----------------------------------#
# Deodatta Bhoite                  #
# CS8761                           #
# Assignment no 4                  #
# Date: 11-11-02                   #
#----------------------------------#

Corrected Naive Bayesian output
-------------------------------
The output of the experiments of the naive bayesian classifier for the line data (all senses) is as follows:

---------------
W    F   Accuracy
---------------
0    1   0.5349
0    2   0.5349
0    5   0.5349
- - - - - - - -
2    1   0.7317
2    2   0.7197
2    5   0.6908
- - - - - - - -
10   1   0.8286
10   2   0.8201
10   5   0.7939
- - - - - - - -
25   1   0.8309
25   2   0.8137
25   5   0.8060
---------------

As we can see, the maximum accuracy is obtained when the window size is 25 and the frequency cutoff is 1. The accuracy increases as the window size increases, and the accuracy also increases as the frequency cutoff decreases. This trend is probably observed because the number of features increases as the window size increases and the frequency cutoff decreases. The accuracy for window size 0 is constant for any frequency cutoff. Here the classifier assigns the sense that occurs the maximum number of times to all the instances.

Searching for `Good' words
---------------------------

How to run the scripts?
- - - - - - - - - - - -
For finding the `good' words in the Open Mind data use the scripts stats.pl and best.pl. They can be run as follows:

% stats.pl /home/cs/tpederse/CS8761/Open-Mind/OMWE-tagging > summary
% best.pl

Working of the scripts
- - - - - - - - - - - -
The stats.pl script finds various statistics about the data, viz. number of tags per word, number of examples per word, number of tags per example, number of senses per word, the agreement ratio between the users who tagged the word, and the normalized variance of the distribution of senses. All this information is tabulated and written to "table.txt". It also assigns a particular sense to each instance id and stores it in the file "assign.txt".

The best.pl script reads the tabular information in table.txt, sorts it according to various columns and prints it in "sort.txt". For example, it sorts the variance in ascending order, whereas it sorts the agreement ratio in descending order, etc. It then assigns scores to the words based on their ranks in the sorted lists for the various columns and prints out the scores in ascending order. The least score signifies a good word, since it is the word that is top ranked in all sorted columns. Thus, the top n words can be identified as the top n words in the file "score.txt". Of course, if we wanted to give more importance to a particular feature (like agreement ratio or variance) we would take a weighted rank, but I have given equal importance to all the features so no weights are used.

The `Good' words
- - - - - - - - -
The top 6 words in the CS-8761 project and their scores are as follows:

captain      57
unit         58
chapter      66
volume       83
structure    84
depth        88

The top 6 words in the Open Mind data and their scores are as follows:

circuit      195
feeling      221
restraint    224
experience   225
grip         232
interest     242

We note that the scores are not normalized; hence, we cannot do a cross comparison between the two outputs.
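The rank-based scoring described for best.pl can be illustrated with the short Python sketch below. It is only an assumption-laden illustration, not the script itself: the function name rank_scores, the column names and the three toy words with their values are all hypothetical, and ties are simply broken by input order.

```python
def rank_scores(table, directions):
    """Sum each word's rank over all columns; the lowest total is best.

    table:      {word: {column: value}}
    directions: {column: True if a higher value is better, else False}
    Returns (score, word) pairs sorted with the best word first.
    """
    scores = {word: 0 for word in table}
    for column, higher_is_better in directions.items():
        # Best value first: descending when higher is better, else ascending.
        ordered = sorted(table, key=lambda w: table[w][column],
                         reverse=higher_is_better)
        for rank, word in enumerate(ordered, start=1):
            scores[word] += rank  # equal weight for every column
    return sorted((score, word) for word, score in scores.items())


# Toy data (hypothetical values, not taken from table.txt).
toy_table = {
    "alpha": {"agreement": 0.9, "variance": 0.01},
    "beta": {"agreement": 0.7, "variance": 0.05},
    "gamma": {"agreement": 0.8, "variance": 0.02},
}
toy_directions = {"agreement": True, "variance": False}
ranking = rank_scores(toy_table, toy_directions)
```

Because every column contributes its rank with the same weight, a word that is only average everywhere can outrank a word that is excellent on one criterion and poor on another. A weighted variant would multiply each rank by a per-column weight before summing.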
So we will try to analyze by adding details of the features to the words:

Word         Ex     T/E    AR     Var
captain      180    2.03   0.73   0.0103  *
unit         151    2.63   0.70   0.0148
chapter      150    2.50   0.92   0.0531  *
volume       119    2.78   0.77   0.0171  *
structure    150    2.08   0.79   0.0517
depth        150    2.13   0.48   0.0137
circuit      405    2.24   0.67   0.0089
feeling      198    2.39   0.62   0.0090
restraint    401    2.31   0.57   0.0045
experience   120    3.56   0.93   0.0004  *
grip         410    2.45   0.76   0.0309  *
interest     1703   2.07   0.70   0.0099  *

(Ex=Examples/word; T/E=Tags/example; AR=agreement ratio; Var=variance of sense distribution)

Note that I have considered T/E as a feature only because I believe that having 2 users in agreement is stronger proof of the word having that sense than having only one user tag the word, which trivially gives 100% agreement. We reject depth, circuit, feeling and restraint for their low agreement ratio. We don't select 'structure' because of its high variance; however, we keep chapter because of its high agreement ratio. We realize the drawbacks of the scheme of assigning equal weights to all criteria by observing that the top 3 words in the complete Open Mind data are not good. However, we have manually picked 3 good words from each set: captain, chapter and volume from UMD, and experience, grip and interest from the total Open Mind data.

Generating the data in `line' format
------------------------------------
Note that you have to run stats.pl before you run this script, because this script uses the assignment output file generated by the stats script. The script "datagen.pl" generates the data in `line' format from the Open Mind data. It is run as follows:

% datagen.pl volume assign.txt /home/cs/tpederse/CS8761/Open-Mind/ids-to-sentences

where the first argument is the word, the second is the assignment file created by stats.pl and the third is the ids-to-sentences file, which contains the mapping from ids to sentences.
The output is generated in the current working directory and the instances are divided into files according to the senses assigned in the assignment file.

Naive bayesian classifier on `Good' words
-----------------------------------------
The output of the naive bayesian classifier for the `good' words we selected is as follows:

Captain
---------------
W    F   Accuracy
---------------
0    1   0.2653
0    2   0.2653
0    5   0.2653
- - - - - - - -
2    1   0.3265
2    2   0.2449
2    5   0.2449
- - - - - - - -
10   1   0.3265
10   2   0.2653
10   5   0.2449
- - - - - - - -
25   1   0.3469
25   2   0.2857
25   5   0.2857
---------------

The classifier performs as expected for this word. The maximum accuracy is attained when the window size is maximum and the frequency cutoff is low. However, the difference in accuracies at window size 0 and 25 should have been higher.

Chapter
---------------
W    F   Accuracy
---------------
0    1   0.6000
0    2   0.6000
0    5   0.6000
- - - - - - - -
2    1   0.6667
2    2   0.6667
2    5   0.6000
- - - - - - - -
10   1   0.6667
10   2   0.6667
10   5   0.5556
- - - - - - - -
25   1   0.6222
25   2   0.6000
25   5   0.5778
---------------

Volume
---------------
W    F   Accuracy
---------------
0    1   0.3429
0    2   0.3429
0    5   0.3429
- - - - - - - -
2    1   0.2571
2    2   0.2857
2    5   0.3143
- - - - - - - -
10   1   0.2857
10   2   0.2857
10   5   0.2857
- - - - - - - -
25   1   0.2857
25   2   0.2857
25   5   0.3143
---------------

Experience
---------------
W    F   Accuracy
---------------
0    1   0.4762
0    2   0.4762
0    5   0.4762
- - - - - - - -
2    1   0.3810
2    2   0.3333
2    5   0.4762
- - - - - - - -
10   1   0.2381
10   2   0.3333
10   5   0.3333
- - - - - - - -
25   1   0.2857
25   2   0.3333
25   5   0.2857
---------------

Grip
---------------
W    F   Accuracy
---------------
0    1   0.4865
0    2   0.4865
0    5   0.4865
- - - - - - - -
2    1   0.1261
2    2   0.1892
2    5   0.2523
- - - - - - - -
10   1   0.4685
10   2   0.4414
10   5   0.4324
- - - - - - - -
25   1   0.4414
25   2   0.4414
25   5   0.4685
---------------

Interest
---------------
W    F   Accuracy
---------------
0    1   0.2834
0    2   0.2834
0    5   0.2834
- - - - - - - -
2    1   0.5749
2    2   0.5441
2    5   0.5298
- - - - - - - -
10   1   0.5996
10   2   0.5873
10   5   0.5667
- - - - - - - -
25   1   0.5996
25   2   0.5832
25   5   0.5667
---------------

Here the classifier performs as expected for the word `interest'. The features, however, seem to be more local than topical, hence there is no rise from window size 10 to 25.

The classifier does not perform well in most of the cases for the above words. In fact, the accuracy often drops below the accuracy obtained at window size 0, where the most frequent sense is assigned to every instance. However, I am unable to explain why this happens for some words.

+++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++

Naive Bayesian Classifier
Bridget Thomson McInnes
11 November, 2002
CS8761

----------------------------------------------------------------------------
EXPERIMENTS:
----------------------------------------------------------------------------
| Window Size  | Frequency Cutoffs    | Accuracy     |
|--------------|----------------------|--------------|
| 0            | 1                    | 0.5386       |
|--------------|----------------------|--------------|
| 0            | 2                    | 0.5346       |
|--------------|----------------------|--------------|
| 0            | 5                    | 0.5471       |
|--------------|----------------------|--------------|
| 2            | 1                    | 0.7299       |
|--------------|----------------------|--------------|
| 2            | 2                    | 0.7122       |
|--------------|----------------------|--------------|
| 2            | 5                    | 0.6747       |
|--------------|----------------------|--------------|
| 10           | 1                    | 0.7679       |
|--------------|----------------------|--------------|
| 10           | 2                    | 0.8142       |
|--------------|----------------------|--------------|
| 10           | 5                    | 0.7896       |
|--------------|----------------------|--------------|
| 25           | 1                    | 0.7920       |
|--------------|----------------------|--------------|
| 25           | 2                    | 0.8345       |
|--------------|----------------------|--------------|
| 25           | 5                    | 0.8257       |
|----------------------------------------------------|

ANALYSIS:
----------------------------------------------------------------------------

The accuracies for a window size of zero are approximately the same for each of the runs, between 0.53 and 0.55. This is because there are no features for the classifier to train from. The classifier picks the most frequent sense of the instances in the training data and applies this sense to each instance in the test data. Given this, it might be thought that the accuracy for a window size of zero should be the same no matter what the frequency cutoff is. This is not the case because the training and test instances are randomly chosen each time the program is run. Therefore, the number of instances for each tag in the test and training files varies at each run of the program.

The accuracy for a window size of two is definitely higher than the accuracy for a window size of zero. This is as expected because with a window size of two the classifier is not picking the most frequent sense for every instance. It is using the features from the training data to determine the sense of the instances in the test data. For this window size, the run with a frequency cutoff of five has lower accuracy than the runs with cutoffs of one and two. This is because with a frequency cutoff of five, relevant features are not being included. A relevant feature is less likely to occur five times than once or twice.

The accuracy for a window size of ten is greater than the accuracy for window sizes of zero and two. This is expected because there is a greater number of features to identify the unique tag. Similarly, with a window size of 25 the accuracy is greater than with a window size of ten. For a window size of 25, the frequency cutoffs of two and five gave nearly the same accuracy.
But with a cutoff of one the accuracy decreased. This is because the relevant features are not as unique to the tag as with a greater frequency cutoff. The decrease is not significant, though, because a Naive Bayesian Classifier is noise resistant, so the low frequency words that are common to all the senses should factor out.

+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++

Report for Assignment-4
-----------------------
Suchitra Goopy
11/11/2002
Natural Language Processing

Introduction:
-------------
The main task in "word sense disambiguation" is to find the "sense" in which a particular word is used. Many words have one form but different meanings. For example, the word "line" can mean a "line" at the ticket counter or a telephone "line".

"The task of disambiguation is to determine which of the senses of an ambiguous word is invoked in a particular use of the word."
-- "Foundations of Statistical Natural Language Processing", Christopher D. Manning and Hinrich Schutze

How is the task performed?
-------------------------
Consider the word "line" and its different senses:

I stood in line at the bank.
The telephone line is long.

We cannot know the sense in which the word is used in each of these two sentences unless we look at the surrounding words. In this experiment we make use of the features, or words surrounding the target word, to help us in our task of disambiguation.

Smoothing Techniques Used:
--------------------------
Sometimes we see a lot of zero values or unobserved events in the data.
If these values are used just as they are in the experiment, they can lead to flawed results. So we substitute some probability value for each unobserved event and also modify the values of the observed events, so that a good probability distribution is obtained. A distribution without zeros is much smoother than one with zeros. I have used the Witten-Bell smoothing technique in my experiments, where the observed values are replaced using the formula

    frequency / (types + tokens)

and the unobserved values are obtained by

    types / (z * (types + tokens))

where z = number of unobserved types.

Results of the "Line" data:
---------------------------
Window Size   Frequency   Accuracy
-----------   ---------   --------
0             1           0.5133476
0             2           0.5133476
0             5           0.5133476
2             1           0.7289126
2             2           0.7165428
2             5           0.6967812
10            1           0.7635899
10            2           0.7635899
10            5           0.7243576
25            1           0.7921347
25            2           0.7921347
25            5           0.7662899

Analysis:
---------
When the window size is zero there are no features to help us decide the sense of the word. So in this case we fall back on the "majority classifier", which assigns the most frequent sense to the instances in the test data.

Care should be taken to see that the instances with different senses are evenly distributed among the training and test data sets. I did run into problems in this regard. As long as the instances were evenly distributed, the Naive Bayesian Classifier did a very good job of learning from the training data and then applying the results to the test data. Sometimes, when the data was not evenly distributed, I had to execute select.pl again and ensure that I had an even distribution.

As the window size increases, a larger number of features is considered. Also, with larger window sizes we can be more confident that the features that are very important in identifying the sense will be included. If the frequency cutoff increases, then some features can be eliminated because they do not occur more often than the specified frequency. Hence a large window size and a small frequency cutoff will work best for this classifier.
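The roles of the window size and the frequency cutoff discussed above can be illustrated with a short Python sketch. It is not the feat.pl used for these experiments (that script is in Perl and is not shown in this report). The function names, the toy training sentences, and the rule "keep a word if it occurs at least `cutoff` times" are assumptions made only for illustration.

```python
from collections import Counter

def window_words(tokens, target_index, window):
    """Return the words within `window` positions on either side of the target."""
    if window == 0:
        return []
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right

def select_features(instances, window, cutoff):
    """Keep every context word seen at least `cutoff` times (assumed rule)."""
    counts = Counter()
    for tokens, target_index in instances:
        counts.update(window_words(tokens, target_index, window))
    return sorted(word for word, n in counts.items() if n >= cutoff)

# Toy training data: (tokens, position of the target word "line")
train = [
    ("i stood in line at the bank".split(), 3),
    ("the telephone line is long".split(), 2),
]

print(select_features(train, window=0, cutoff=1))  # no context, so no features
print(select_features(train, window=2, cutoff=1))  # every word in the window
print(select_features(train, window=2, cutoff=2))  # only words seen twice
```

In this toy example a window of 0 yields no features at all, and raising the cutoff from 1 to 2 leaves only "the", which shows how a higher cutoff can discard informative but rare words first.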
The Classifier does not perform as well as expected but it has definitely improved.

Data Conversion:
----------------
Data obtained from the Open Mind project had to be converted to a form similar to the "line" data used in this assignment.

Words used for performing more experiments:
-------------------------------------------
attempt, act, captain, author, distribution, college

Reasons for choosing these words
---------------------------------
The words had to be chosen in such a way that
1) There was a high degree of agreement among the users
2) There was an even distribution among the various senses
3) There was a reasonable number of examples

This task was much more difficult than I had expected. It was very difficult to find an ideal word for which all three conditions were satisfied. Typically, if there was a high rate of agreement between two users, then there would probably be a single sense that dominated the distribution. It also seemed that as the number of senses for a word increased, there was more disagreement among users.

When I picked the words, I gave primary importance to an even distribution of senses, because there is no use in applying the Classifier to words where a single sense is dominant. Then I gave importance to the rate of agreement among users, because there had to be decent agreement among users to carry out this experiment successfully.

Despite trying to choose words carefully, I did have problems getting even distributions and also realised that in some cases one sense did dominate the distribution.
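One way to check criteria 2 and 3 for a candidate word is to count its examples per sense and look at the share held by the largest sense. The Python sketch below is only an illustration under assumed inputs: the function name and the sense labels `s1`, `s2`, `s3` are hypothetical, and the counts are made up rather than taken from the Open Mind data.

```python
from collections import Counter

def sense_profile(sense_tags):
    """Summarise how evenly the examples of one word are spread over its senses."""
    counts = Counter(sense_tags)
    total = sum(counts.values())
    majority_sense, majority_count = counts.most_common(1)[0]
    return {
        "examples": total,                       # criterion 3: number of examples
        "senses": len(counts),
        "majority_sense": majority_sense,
        "majority_share": majority_count / total,  # criterion 2: dominance of one sense
    }

# Hypothetical tags, one per example of a single word
tags = ["s1"] * 40 + ["s2"] * 35 + ["s3"] * 25
profile = sense_profile(tags)
print(profile)
```

A majority share close to 1 signals that one sense dominates. The same number is roughly the accuracy that the most frequent sense rule (the window size 0 rows in the tables) would be expected to reach on that data.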
Results for attempt:
--------------------
Window Size   Frequency   Accuracy
-----------   ---------   --------
0             1           0.3187353
0             2           0.3187353
0             5           0.3187353
2             1           0.3464726
2             2           0.3464726
2             5           0.3285621
10            1           0.4562341
10            2           0.4562341
10            5           0.4098266
25            1           0.5125634
25            2           0.5125634
25            5           0.4663571

Analysis:
--------
The results obtained for "attempt" are not as good as those obtained for the "line" data. I think the reason is that with a word like "attempt" there is no single obviously correct sense, and the sense assigned depends on whether two users agree on it. So even when two users did agree on a sense, there is still a chance that it is not the correct one. We would run into more problems when users disagree on word senses. The Classifier may get confused, because if the senses are not correct it will not be able to learn them effectively.

Results for act:
----------------
Window Size   Frequency   Accuracy
-----------   ---------   --------
0             1           0.2964731
0             2           0.2964731
0             5           0.2964731
2             1           0.3287548
2             2           0.3287548
2             5           0.2987462
10            1           0.4194739
10            2           0.4194739
10            5           0.3783542
25            1           0.4987632
25            2           0.4984939
25            5           0.4564383

Analysis:
---------
I think that "act" has somewhat the same problems as "attempt". If users agree on a sense, and then encounter the word used in a similar sense elsewhere and assign the wrong sense to it, the classifier is trying to learn the same features for different senses and is not able to pick the correct sense.
Results for captain:
-------------------
Window Size   Frequency   Accuracy
-----------   ---------   --------
0             1           0.3685735
0             2           0.3685735
0             5           0.3685735
2             1           0.3884673
2             2           0.3884673
2             5           0.3712632
10            1           0.4582536
10            2           0.4582536
10            5           0.4267484
25            1           0.5384638
25            2           0.5384638
25            5           0.5173849

Analysis:
---------
This word performed well compared to the other words. It had a fairly decent distribution as well as rate of agreement between the two users. It reached values well above the ones for "act" and "attempt".

Results for author:
-------------------
Window Size   Frequency   Accuracy
-----------   ---------   --------
0             1           0.2846586
0             2           0.2846586
0             5           0.2846586
2             1           0.3278685
2             2           0.3278685
2             5           0.2967598
10            1           0.3528467
10            2           0.3528467
10            5           0.3175985
25            1           0.3956383
25            2           0.3956383
25            5           0.3759573

Analysis:
---------
I think the problem with this word was that initially I thought the agreement among the users was high. But when I obtained these accuracy values, I went back and took another look. It then seemed that the users did not have a very high rate of agreement, and maybe this was the reason for the poor performance.

Results for distribution:
--------------------------
Window Size   Frequency   Accuracy
-----------   ---------   --------
0             1           0.3746372
0             2           0.3746372
0             5           0.3746372
2             1           0.4028425
2             2           0.4028425
2             5           0.3935481
10            1           0.4485730
10            2           0.4485730
10            5           0.4362537
25            1           0.5153764
25            2           0.5153764
25            5           0.4925475

Analysis:
---------
This word too performed fairly well. It reached relatively high values of accuracy, though not when compared to the "line" data.

Results for college:
--------------------
Window Size   Frequency   Accuracy
-----------   ---------   --------
0             1           0.2754947
0             2           0.2754947
0             5           0.2754947
2             1           0.2956481
2             2           0.2956481
2             5           0.2838104
10            1           0.3327492
10            2           0.3327492
10            5           0.3274722
25            1           0.3956673
25            2           0.3956673
25            5           0.3592640

Analysis:
----------
"College" did not perform well. I think it performed very poorly compared to the rest of the words.
Conclusion:
-----------
The Classifier performed very well for the "line" data. But I felt that smoothing should have been applied to the "Test" file as well. This is because if a word does not appear in a particular sentence, it does not mean that it is not a feature of the word. I do not know, however, whether the Classifier would have performed better if this had been done. I did have problems trying to get even distributions and had to execute select.pl again and check for good distributions, but overall I think that this Classifier performs better than before.

+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++

Paul Gordon (1913768)
Assignment 4 -- A continuing quest for meanings in li(f|n)e
CS8761
11/11/02

Introduction

The Naive Bayesian Classifier is a popular method for supervised word sense disambiguation. Its popularity results from its straightforward implementation and its resistance to noise. Indeed, in recent experiments (assignment 3), results were as high as 85% for a baseline of 54% and 4000 instances. In the following experiments, the Naive Bayesian Classifier will be tested under less ideal conditions. Baselines will be below 50%, and the number of instances in most cases will be between 100 and 150. The following experiments will determine how the classifier performs under these circumstances.

Methods

The four files select.pl, feat.pl, convert.pl, and nb.pl operate the same as in assignment 3. Collectively, these files implement the Naive Bayesian Classifier. In addition, there are three new files. cull.pl was created to find "good" words to use in experiments with the classifier.
The criteria for "good" are:
1) Reasonable agreement among taggers.
2) A reasonably balanced distribution.
3) A reasonable number of instances.
The usage is as follows: cat tag-file | cull.pl > tag.out. The output file contains the number of non-duplicated instances associated with each sense. The sense chosen for duplicate instances was the first encountered. This information was used to make an informed, but still subjective, decision about how well each word adhered to points 2 and 3. The output file also contains the number of duplicate instances, the number of duplicates that disagree in sense, and the fraction of disagreeing duplicates to total duplicates. This information was used to determine "good" with respect to point 1.

prep.pl creates the sense files. It retrieves each non-duplicate sentence associated with the chosen word from the ids-to-sentence file, and assigns it to a file named after its sense tag. The usage is: prep.pl file word, where file is a number. If file is 1, the tag file used is /home/cs/tpederse/CS8761/Open-Mind/cs8761-umd.full.details. If file is 2, the tag file used is /home/cs/tpederse/CS8761/Open-Mind/OMWE-tagging. Word is the word for which sentences are collected. For example, ./prep.pl 1 art was the command to create the data files used with the first experiment in the results section.

The last new file is a bash shell script, tagscript.bash. This file creates the TEST and TRAIN files, then runs the rest of the programs under each of the window size/frequency cutoff conditions, and reports the accuracy of each test.

Hypothesis

As discussed by Professor Pedersen in lecture, the Naive Bayesian Classifier is resistant to noise, and as a result should show an increase in accuracy as the window size increases, and a decrease in accuracy as the frequency cutoff increases.
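The "distribution" and "discrepancy" figures reported for each word in the results are the statistics described for cull.pl in the Methods section. The Python sketch below shows one possible reading of that description. It is not the cull.pl source (which is a Perl script). The function name, the `(instance_id, sense)` input format, and the toy rows are assumptions for illustration only.

```python
from collections import Counter

def cull_statistics(tagged):
    """tagged: iterable of (instance_id, sense) pairs in file order."""
    first_sense = {}   # instance_id -> sense kept (the first one encountered)
    duplicates = 0     # repeat taggings of an instance that was already seen
    disagreeing = 0    # repeats whose sense differs from the kept sense
    for instance_id, sense in tagged:
        if instance_id not in first_sense:
            first_sense[instance_id] = sense
            continue
        duplicates += 1
        if sense != first_sense[instance_id]:
            disagreeing += 1
    distribution = Counter(first_sense.values())
    # Fraction of disagreeing duplicates to total duplicates (0 if none exist)
    discrepancy = disagreeing / duplicates if duplicates else 0.0
    return distribution, duplicates, disagreeing, discrepancy

# Hypothetical tag rows: instance ids and senses are made up
rows = [("id1", "a"), ("id2", "b"), ("id1", "a"), ("id3", "a"),
        ("id2", "a"), ("id3", "b"), ("id4", "b")]
distribution, duplicates, disagreeing, discrepancy = cull_statistics(rows)
print(dict(distribution), duplicates, disagreeing, round(discrepancy, 4))
```

Under this reading, a word tagged by a single tagger has no duplicates and therefore a discrepancy of 0.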
Results

UMD-tagged word results

art
distribution: 35 31 25 19
discrepancy: 0

window size   frequency cutoff   Accuracy
0             1                  .2727
0             2                  .2727
0             5                  .2727
2             1                  .4242
2             2                  .4545
2             5                  .4545
10            1                  .4242
10            2                  .4242
10            5                  .3939
25            1                  .6061
25            2                  .5454
25            5                  .4848

neighborhood
distribution: 56 59 2
discrepancy: .3699

window size   frequency cutoff   Accuracy
0             1                  .5278
0             2                  .5278
0             5                  .5278
2             1                  .5833
2             2                  .6111
2             5                  .5000
10            1                  .6389
10            2                  .5556
10            5                  .5833
25            1                  .5000
25            2                  .5278
25            5                  .5556

circumstance
distribution: 10 62 62
discrepancy: .5105

window size   frequency cutoff   Accuracy
0             1                  .4634
0             2                  .4634
0             5                  .4634
2             1                  .6585
2             2                  .6341
2             5                  .5854
10            1                  .6585
10            2                  .6341
10            5                  .6341
25            1                  .5366
25            2                  .4878
25            5                  .5366

Full tag-file word results

circuit
distribution: 50 12 62 62 92 116
discrepancy: .4543

window size   frequency cutoff   Accuracy
0             1                  .2941
0             2                  .2941
0             5                  .2941
2             1                  .4286
2             2                  .4538
2             5                  .4034
10            1                  .4118
10            2                  .3782
10            5                  .3866
25            1                  .3950
25            2                  .4454
25            5                  .3445

audience
distribution: 68 66 1
discrepancy: .3588

window size   frequency cutoff   Accuracy
0             1                  .4634
0             2                  .4634
0             5                  .4634
2             1                  .5854
2             2                  .5854
2             5                  .6584
10            1                  .5366
10            2                  .5854
10            5                  .6098
25            1                  .5854
25            2                  .5610
25            5                  .6098

discussion
distribution: 50 54
discrepancy: .5455

window size   frequency cutoff   Accuracy
0             1                  .3438
0             2                  .3438
0             5                  .3438
2             1                  .4688
2             2                  .5938
2             5                  .5000
10            1                  .5312
10            2                  .5312
10            5                  .5000
25            1                  .5000
25            2                  .6250
25            5                  .6250

Conclusions

As can be seen in the previous section, the results show deviations from the expected. Specifically, the accuracy sometimes decreases with increasing window size, and sometimes increases with increasing frequency cutoff.

Possible reasons:

1) Sparse data exaggerates the smoothing. When the number of instances is small, smoothing tends to give a disproportionately large probability to unseen events.

2) Because of the small number of instances, TEST and TRAIN are not always representative of the total distribution.
In the previous assignment, with 4000 instances, it was fairly certain that if the division of instances was random, TEST and TRAIN would have a distribution close to the original. However, with the Open Mind data, most words have between 100 and 150 non-duplicate instances. If these instances are then divided into TEST and TRAIN files, 70% going to TRAIN and 30% going to TEST, then TEST could have as few as 30 instances. If there are four different, equally distributed senses for a word, that puts the number of TEST instances for each sense at 7 or 8. This is a small enough number for random variations to change the distribution significantly.

3) The small number of instances also causes small changes in the number of correct answers to have a relatively large effect on accuracy. For 30 TEST sentences, a change of one in the number correct changes the accuracy by 3 1/3 percentage points.

4) The large amount of disagreement in sense tags may have contributed to some of the unexpected results. Discrepancy percentages were between 35 and 55 percent, which means that for nearly every other duplicate sentence, there was a disagreement between taggers. Art, which was done by a single tagger, still shows some unexpected numbers, but is generally in keeping with the hypothesized results.

There does not seem to be a discernible trend in the results. The two words that did the best, audience and discussion, both contained two nearly equally distributed senses (audience has a third sense with only one instance), but neighborhood also had a similar distribution, and it did not do as well. The number of instances and the level of discrepancy didn't seem to matter much, but the range of both these values was limited.

The results of these experiments are inconclusive beyond the observation that in general these results were much lower than in experiment 3.
Rerunning these experiments with a larger number of instances would help to narrow the source of the problem to either the first three points above, or to point 4, which is independent of the number of instances.

+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++

Prashant Jain
CS8761
Assignment #4: A continuation of finding meaning in Li(f|n)e

Introduction
------------
The objective of this assignment was to fix the code written in assignment 3, which was to "explore supervised approaches to word sense disambiguation". We had been given the line data, which contains six files. There were a number of instances per file but only one instance per line. What we had to do was implement a Naive Bayesian Classifier to perform word sense disambiguation. Basically, given an instance of 'line' in a file, we have to find out which sense best fits it. After fixing that part we had to use the Open Mind data given to us, find 3(6) interesting words in it, and run our classifier on those words.

Procedure
---------
We had to create four files that implemented the Naive Bayesian Classifier. These were:

select.pl
---------
This file divides the given data into TEST and TRAIN data after sense tagging it. It is given the percentage as an argument, as well as the target.config file, which contains the regular expressions to extract lines and instance ids.

feat.pl
-------
This file uses the TRAIN data together with the window size and frequency cutoff (provided by the user at the command line) to get feature words (words of interest), which are put in the FEAT file.
convert.pl
----------
This file converts both the TRAIN and TEST data into their feature vector representations using the features provided by feat.pl. This is basically a binary representation of instances/senses and features.

nb.pl
-----
This is the file in which we implement the Naive Bayesian Classifier. We use the files obtained by converting the TRAIN and TEST data to assign sense values to the TEST data and check how accurately we were able to assign the correct senses.

Observations of Experiments
---------------------------
The following table describes the results we got from running our experiments over the various possibilities that had been given to us.

--------------------------------------------
| window size | frequency cutoff | accuracy |
--------------------------------------------
| 0           | 1                | 0.5442   |
| 0           | 2                | 0.5442   |
| 0           | 5                | 0.5442   |
--------------------------------------------
| 2           | 1                | 0.7485   |
| 2           | 2                | 0.7683   |
| 2           | 5                | 0.7347   |
--------------------------------------------
| 10          | 1                | 0.7790   |
| 10          | 2                | 0.8125   |
| 10          | 5                | 0.7930   |
--------------------------------------------
| 25          | 1                | 0.7852   |
| 25          | 2                | 0.7992   |
| 25          | 5                | 0.7950   |
--------------------------------------------

We notice that generally as we increase the window size, the accuracy of our Naive Bayesian Classifier increases. Intuitively speaking, this should be expected, since the more feature words we incorporate in our trials, the greater the chance that they will occur in the test data. And the greater the number of samples, the greater the probability of assigning the correct sense.

We also notice that as we increase the frequency cutoff from 2 to 5, there is a definite decrease in the accuracy (a cutoff of 2 does slightly better than a cutoff of 1). Again, intuitively, this should be expected. Because as we keep increasing the frequency cutoff, if (as in our case) the stop words etc. haven't been eliminated, they would have a higher chance of being the only ones which occur (since stop words like 'a', 'and', 'the' etc.
appear with a lot more frequency than, say, 'instrument', which would be a helpful hint in getting the sense of line as 'phone') rather than other more interesting yet slightly less frequent words.

The problem that was noticed in the previous assignment, and which was subsequently fixed, was that the select.pl file was picking up all the data from the files. This was fixed, and after minor changes to both nb.pl and convert.pl the tests were run again, giving the above mentioned results. These results fall within the acceptable range, and hence the conclusion is drawn that the classifier is working properly.

The next step was to find interesting words in the Open Mind data and use our classifier on them. I checked the interesting words manually, basing my acceptability criterion on the three things mentioned in the assignment:

1. Has a reasonable rate of agreement between two taggers.
2. Has a balanced distribution of senses.
3. Has a reasonable number of examples.

I checked that the number of examples was at least 100. I also checked that the senses that had been tagged had a balanced distribution. Mostly, if one sense was dominating the distribution, that word was omitted. I also looked at the rate of agreement between the taggers. Here was a complicated problem, because the taggers did not always agree. There were also instances which had been tagged only once. In these cases the instance was taken as it is, with the given sense. If an instance had been tagged more than once and there was a disagreement, then the sense with the maximum number of tags was chosen. If it had been tagged with an equal number of each sense, then one of the senses was picked at random and assigned to that particular instance. I wrote a program to do all this.
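The tag-resolution rule just described (a single tag is kept, otherwise the most frequent sense wins, and ties are broken at random) can be sketched in Python as follows. This is an illustration only, not the Perl program itself. The function name `resolve_sense` and the sense labels in the usage lines are hypothetical.

```python
import random
from collections import Counter

def resolve_sense(tags, rng=random):
    """Pick one sense for an instance from all the tags it received."""
    counts = Counter(tags)
    best = max(counts.values())
    # All senses that share the highest tag count
    winners = sorted(sense for sense, n in counts.items() if n == best)
    if len(winners) == 1:
        return winners[0]        # single tag, or a clear majority
    return rng.choice(winners)   # tie: pick one of the tied senses at random

print(resolve_sense(["s1"]))              # tagged once: taken as is
print(resolve_sense(["s1", "s2", "s1"]))  # majority sense wins
print(resolve_sense(["s1", "s2"]))        # tie: either sense may be returned
```

Passing a seeded `random.Random` instance as `rng` would make the tie-breaking repeatable between runs.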
The name of the program is rawtofinal.pl. It takes as a command line argument the specific word that we want to get the data for, processes the data given to us by Open Mind, and converts it into the line data format.

Usage: rawtofinal.pl [word]

After doing this we get the data for that word separated into different files according to senses. The name of each file is the name of the sense contained in that file (like the line data). We can simply run our Naive Bayesian Classifier on this data and get our values. The words that were chosen from the given set of Open Mind data were:

1. Aspect
2. Attempt
3. Author
4. Demand
5. Edge
6. Phase

The test runs made on this data gave the following results:

ASPECT
------
Senses Considered: aspect10701 aspect10702 aspect10900 aspect10901 aspect12400
Results from nb.pl:

Window   Frequency   Accuracy
----------------------------------------
0        1           0.3627
0        2           0.3627
0        5           0.3627
2        1           0.2950
2        2           0.3442
2        5           0.3442
10       1           0.3114
10       2           0.3114
10       5           0.3770
25       1           0.3770
25       2           0.3971
25       5           0.3971
----------------------------------------

Attempt
-------
Senses Considered: attempt10400 attempt10402
Results from nb.pl:

Window   Frequency   Accuracy
----------------------------------------
0        1           0.7529
0        2           0.7529
0        5           0.7529
2        1           0.7741
2        2           0.8064
2        5           0.8387
10       1           0.8709
10       2           0.8709
10       5           0.8709
25       1           0.8805
25       2           0.8387
25       5           0.8387
----------------------------------------

AUTHOR
------
Senses Considered: author11800 author11801
Results from nb.pl:

Window   Frequency   Accuracy
-----------------------------------------
0        1           0.5806
0        2           0.5806
0        5           0.5806
2        1           0.7096
2        2           0.6501
2        5           0.6129
10       1           0.6250
10       2           0.6250
10       5           0.6250
25       1           0.6501
25       2           0.6501
25       5           0.6501
----------------------------------------

DEMAND
------
Senses Considered: demand10400 demand10900 demand11000 demand12200 demand12600
Results from nb.pl:

Window   Frequency   Accuracy
-----------------------------------------
0        1           0.5909
0        2           0.5909
0        5           0.5909
2        1           0.5227
2        2           0.5227
2        5           0.5227
10       1           0.5909
10       2           0.5454
10       5           0.5000
25       1           0.6202
25       2           0.6202
25       5           0.6202
----------------------------------------

EDGE
----
Senses Considered: edge10600 edge10601 edge10700 edge10701 edge11500 edge12500
Results from nb.pl:

Window   Frequency   Accuracy
-----------------------------------------
0        1           0.25
0        2           0.25
0        5           0.25
2        1           0.325
2        2           0.3
2        5           0.25
10       1           0.3
10       2           0.3
10       5           0.25
25       1           0.35
25       2           0.35
25       5           0.35
-----------------------------------------

PHASE
-----
Senses Considered: phase10700 phase12600 phase12800 phase12801
Results from nb.pl:

Window   Frequency   Accuracy
-----------------------------------------
0        1           0.6060
0        2           0.6060
0        5           0.6060
2        1           0.4545
2        2           0.3939
2        5           0.4848
10       1           0.6060
10       2           0.5454
10       5           0.5151
25       1           0.6060
25       2           0.6060
25       5           0.6363
-----------------------------------------

We can observe from this data that the accuracy we get is mostly low. There can be many reasons for this. Some of the reasons that I can think of are as follows:

1. The number of instances in the TEST and TRAIN data is very small. In the line data we have over 4000 instances, but here there are at most around 200 instances for each word. The difference is evident.

2. There are instances which have been tagged only once, or on which the users disagree, and we have to account for all of them. Hence the quality of the tagging is not as good as it could be, which also brings down the accuracy level.

Conclusion
----------
I would like to conclude by saying that the Naive Bayesian Classifier implemented by me gives pretty decent results for the line data but gives variable results for the Open Mind data.

References:
-----------
Manning, C.D. & Schutze, H. 2000. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Massachusetts.
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++

Rashmi Kankaria
Assignment no : 3 and 4
Due Date : 11th Nov 2002.

Objective : To explore supervised approaches to word sense disambiguation, and to continue with the same for the Open-Mind data.

Introduction : Supervised disambiguation uses a set of training data that has already been generated for an ambiguous word by sense tagging the word in the training data. This labeled data is then used to disambiguate the word in the next instance where it occurs. This experiment is an attempt to implement a Naive Bayesian Classifier.

Window size   Freq_cutoff   accuracy   most common sense
----------------------------------------------------------------------
0             1                        product ( with probability 0.5409)
0             2                        product ( with probability 0.5409)
0             5                        product ( with probability 0.5409)
----------------------------------------------------------------------
2             1             0.6498
2             2             0.6366
2             5             0.6307
----------------------------------------------------------------------
10            1             0.7598
10            2             0.7566
10            5             0.7133
----------------------------------------------------------------------
25            1             0.8015
25            2             0.8155
25            5             0.7931
----------------------------------------------------------------------

When the window size is 0, the sense with maximum probability, here product, is assigned as the default sense. This is empirically shown to be correct.

Q> What effect do you observe in overall accuracy as the window size and frequency cutoffs change ?
For a given 70-30 split of the training and test data, we can observe that the accuracy goes on increasing as we increase the window_size. That is reasonable: as we increase the window size, and so the context, more features are present to disambiguate the word sense. A feature is therefore more likely to have been seen with a given sense, and the probability of guessing the correct sense for the word is higher.

As we look at the table, we can draw certain conclusions about the pattern of accuracy for a given window_size and frequency cutoff.

For a constant frequency cutoff, the accuracy increases considerably with an increase in window_size. Eg : The accuracy increases from 0.645 to 0.7731 as we increase the window_size from 2 to 25. Thus it is easy to conclude that accuracy will always increase with an increase in window_size.

For a constant window_size, however, the accuracy does not increase linearly with the frequency cutoff for all window sizes as we might have expected, and I think that can be argued. With a lower frequency cutoff, for a given window size, only one feature word is taken into consideration, and that is most of the time a stop word. Stop words do not give any significant information that is peculiar to the word in its context and that would help to disambiguate it. On the other hand, if the window size is too large, some extraneous information (noise) can get added as features which may not be helpful.

The most significant values here are window size = 10 with freq cutoff = 1, and window size = 25 with freq cutoff = 5. If you observe these, the accuracy of the first one is more than the second, and this can help us to decide the optimal freq cutoff and window size.

The major flaw of the Naive Bayesian Classifier is that it considers all the features to be independent. This also affects the calculation of the probabilities of the features.
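The independence assumption mentioned above means that the score of a sense is simply its prior probability multiplied by one factor per feature, with no interaction between features. The Python sketch below illustrates this. It is not the nb.pl used in these experiments. The function names, the toy training instances, the add-one smoothing, and the choice to score only the features present in the test instance are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(instances):
    """instances: list of (sense, set_of_features) pairs from the training data."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)  # sense -> feature -> count
    for sense, features in instances:
        sense_counts[sense] += 1
        for feature in features:
            feature_counts[sense][feature] += 1
    return sense_counts, feature_counts

def classify(features, sense_counts, feature_counts):
    """Return the sense with the highest log P(sense) + sum of log P(feature | sense)."""
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # prior of the sense
        for feature in features:
            # Independence: each feature contributes its own factor.
            # Add-one smoothing (an assumption) avoids log(0) for unseen pairs.
            score += math.log((feature_counts[sense][feature] + 1) / (n + 2))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Toy training data for two senses of "line" (made-up features)
data = [("product", {"sell", "new"}), ("product", {"sell"}), ("phone", {"call"})]
sense_counts, feature_counts = train(data)

print(classify({"call"}, sense_counts, feature_counts))  # phone
print(classify({"sell"}, sense_counts, feature_counts))  # product
print(classify(set(), sense_counts, feature_counts))     # product
```

With an empty feature set only the prior remains, so the sketch returns the most frequent sense, which matches the behaviour reported for a window size of 0.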
Q> Are there any combinations of window size and frequency that appear to be optimal with respect to the others ? Why?

As argued above, there are a few combinations of window_size and frequency cutoff that appear optimal with respect to the others. As we can observe, for a given window size, the accuracy decreases with increasing cutoff. We also observe that there is no significant change in accuracy as we change the window size from 10 to 25, so the optimal window_size for this case can be 10. Just increasing the window size does not make a significant change in the accuracy, for 2 reasons:

1. The context might still be smaller than the window_size. In this case, increasing the window size does not make any difference.

2. The proximate context matters most for disambiguating the sense, and hence having a larger window might not help much.

As far as the frequency cutoff is concerned, the optimal value will be within 2-5. Higher frequency features are most of the time stop words and are of no help in disambiguation, while a frequency as low as 1 will output all the feature words, most of which are not relevant or occur very infrequently with the word we are going to disambiguate. Any word within the given range will show a strong association with the word we need to disambiguate over a large training set.

This gives us the optimal values of window size and frequency cutoff.

Assignment 4 :

For this assignment, as per the requirements for good words, I have chosen the following words and found the accuracy for different combinations of frequency cutoff and window size. The file used to convert the data from Open Mind format to line data format is : open_to_line.pl.

how to run : perl open_to_line.pl aspect.n

where aspect.n is used to convert the aspect data into line data format.

The words I selected are :

1. Unit
2. Aspect
3. Behavior
4. Circumstance
5. Bar
6. Detail

1. Unit : There are seven senses associated with unit.
Window size   Freq_cutoff   accuracy   most common sense
----------------------------------------------------------------------
0             1                        unit%1:14:00: ( with probability 0.4095)
0             2                        unit%1:14:00: ( with probability 0.4095)
0             5                        unit%1:14:00: ( with probability 0.4095)
----------------------------------------------------------------------
2             1             0.3778
2             2             0.3556
2             5             0.4667
----------------------------------------------------------------------
10            1             0.4222
10            2             0.4222
10            5             0.3111
----------------------------------------------------------------------
25            1             0.4444
25            2             0.3778
25            5             0.3111
----------------------------------------------------------------------

2. Aspect : There are five senses associated with this.

Window size   Freq_cutoff   accuracy   most common sense
----------------------------------------------------------------------
0             1                        aspect%1:09:00: ( with probability 0.4183)
0             2                        aspect%1:09:00: ( with probability 0.4183)
0             5                        aspect%1:09:00: ( with probability 0.4183)
----------------------------------------------------------------------
2             1             0.4048
2             2             0.2857
2             5             0.2619
----------------------------------------------------------------------
10            1             0.3095
10            2             0.3810
10            5             0.3810
----------------------------------------------------------------------
25            1             0.4048
25            2             0.4058
25            5             0.2857
----------------------------------------------------------------------

3. Behavior : There are 4 senses.
Window size   Freq_cutoff   accuracy   most common sense
----------------------------------------------------------------------
0             1                        behavior%1:04:00 ( with probability 0.5212)
0             2                        behavior%1:04:00 ( with probability 0.5212)
0             5                        behavior%1:04:00 ( with probability 0.5212)
----------------------------------------------------------------------
2             1             0.5366
2             2             0.4390
2             5             0.4390
----------------------------------------------------------------------
10            1             0.4146
10            2             0.3902
10            5             0.2683
----------------------------------------------------------------------
25            1             0.5610
25            2             0.3415
25            5             0.4146
----------------------------------------------------------------------

4. Circumstance : It has 4 senses.

Window size   Freq_cutoff   accuracy   most common sense
----------------------------------------------------------------------
0             1                        circumstance%1:26:01 ( with probability 0.6382)
0             2                        circumstance%1:26:01 ( with probability 0.6382)
0             5                        circumstance%1:26:01 ( with probability 0.6382)
----------------------------------------------------------------------
2             1             0.5610
2             2             0.4878
2             5             0.5366
----------------------------------------------------------------------
10            1             0.5366
10            2             0.4390
10            5             0.4634
----------------------------------------------------------------------
25            1             0.6098
25            2             0.5854
25            5             0.5610
----------------------------------------------------------------------

5. Bar : It has 5 senses.
Window size   Freq_cutoff   accuracy   most common sense
----------------------------------------------------------------------
0             1                        bar%1:06:04 ( with probability 0.7678)
0             2                        bar%1:06:04 ( with probability 0.7678)
0             5                        bar%1:06:04 ( with probability 0.7678)
----------------------------------------------------------------------
2             1             0.5656
2             2             0.5341
2             5             0.5656
----------------------------------------------------------------------
10            1             0.6138
10            2             0.5945
10            5             0.6012
----------------------------------------------------------------------
25            1             0.6821
25            2             0.6378
25            5             0.6076
----------------------------------------------------------------------

6. Detail : It has 5 senses.

Window size   Freq_cutoff   accuracy   most common sense
----------------------------------------------------------------------
0             1                        detail%1:10:00 ( with probability 0.5102)
0             2                        detail%1:10:00 ( with probability 0.5102)
0             5                        detail%1:10:00 ( with probability 0.5102)
----------------------------------------------------------------------
2             1             0.4221
2             2             0.4178
2             5             0.4095
----------------------------------------------------------------------
10            1             0.4574
10            2             0.4421
10            5             0.3980
----------------------------------------------------------------------
25            1             0.5276
25            2             0.5132
25            5             0.4345
----------------------------------------------------------------------

All the words I chose had more than three senses associated with them. This meant that the distribution was spread over a wide range of senses, giving more scope for an even distribution. As we can see, most of the words had one sense much stronger than the others. Generally speaking, one sense occurred with a probability of more than 0.5, while the remaining senses together made up the rest so that the total probability sums to one.
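The window size 0 rows in the tables above list only the most common sense because, with no context features, a naive Bayesian classifier has nothing but the sense priors to go on. A minimal Python sketch of that behaviour follows. It is not the classifier script used for these experiments: the add-one smoothing, the binary (present/absent) feature model, and all names and example data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples, vocabulary):
    """Count senses and, per sense, the training examples containing each feature.

    examples: list of (sense, context_words) pairs.
    """
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    for sense, context in examples:
        sense_counts[sense] += 1
        for word in set(context) & vocabulary:
            feature_counts[sense][word] += 1
    return sense_counts, feature_counts


def classify(context, sense_counts, feature_counts, vocabulary):
    """Return the sense maximizing log P(sense) + sum_i log P(feature_i | sense)."""
    total = sum(sense_counts.values())
    present = set(context) & vocabulary
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.most_common():
        score = math.log(n / total)  # prior P(sense)
        for word in sorted(vocabulary):
            # Add-one smoothed estimate of P(word present | sense) (an assumption).
            p = (feature_counts[sense][word] + 1) / (n + 2)
            score += math.log(p if word in present else 1.0 - p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical sense-tagged contexts for the word "bar".
examples = [
    ("pub", ["drank", "beer"]),
    ("pub", ["beer", "night"]),
    ("pub", ["drank", "night"]),
    ("rod", ["steel", "iron"]),
]

# No features (window size 0): the prior decides, i.e. the most common sense.
counts, features = train(examples, set())
print(classify(["steel"], counts, features, set()))

# With features, the context can override the prior.
vocab = {"drank", "beer", "night", "steel", "iron"}
counts, features = train(examples, vocab)
print(classify(["steel", "iron"], counts, features, vocab))
```

In the first call the vocabulary is empty, so the score of each sense is just its log prior and the majority sense "pub" is returned. In the second call the context words outweigh the prior and "rod" is returned.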
The decision to select a particular word was an interesting part of the assignment, while the conversion of the Open Mind data into line data was rather complicated because of a lot of incompatibility between the two. All the tagged words were analysed for their number of instances, the number of senses associated with them, the distribution of sense probabilities and also the tagging quality. These factors gave various alternatives for choosing the words. As the domain for selecting words was limited to those which the umn-group tagged, many interesting words like act, cell etc. could not be considered. Also, the tagging quality was found to be not very good, as there was a lot of disagreement amongst the taggers. There was also an option of an "unclear/unlisted" meaning, which limited the choice further.

However, the above words were chosen specifically because they all have many shades of sense and many meanings in different contexts. The words were tough to guess when tagged by the taggers.

As you can see, the accuracy for all the words chosen is on average 40-55%, which hints at several problems:
a. There was little data for a given sense of a given word.
b. There were many instances where the taggers disagreed, so rather than asking more taggers to tag until we could decide the right sense, the instances were assigned randomly. The probability of this happening was quite high because most of the instances were tagged only twice or thrice.
c. Many of the instances chosen had an unclear/unlisted sense and needed to be ignored. This reduced the data further.

In general, the Naive Bayesian Classifier works consistently well for any window size and frequency cutoff; however, more data would have improved the accuracy.

References :
1. Foundations of Statistical Natural Language Processing by Christopher D. Manning and Hinrich Schütze, pg 235 - 239.
2. Programming Perl (3rd edition) by Larry Wall, Tom Christiansen and Jon Orwant.
+++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++
NAME: SUMALATHA KUTHADI
CLASS: NATURAL LANGUAGE PROCESSING
DATE: 11 / 11 / 02

CS8761 : ASSIGNMENT 4

-> OBJECTIVE: TO EXPLORE SUPERVISED APPROACHES TO WORD SENSE DISAMBIGUATION AND TO APPLY A NAIVE BAYESIAN CLASSIFIER TO GOOD WORDS OBTAINED FROM THE OPEN MIND TAGGING DATA.

-> INTRODUCTION:
   -> WORD SENSE DISAMBIGUATION: ASSIGNING A MEANING TO A WORD IN CONTEXT FROM SOME SET OF PREDEFINED MEANINGS (OFTEN TAKEN FROM A DICTIONARY).
   -> SENSE TAGGING: ASSIGNING MEANINGS TO WORDS.
   -> FROM A SENSE TAGGED TEXT WE CAN GET THE CONTEXT IN WHICH A PARTICULAR MEANING OF A WORD IS FOUND.
   -> CONTEXT FOR HUMAN: TEXT + BRAIN
   -> CONTEXT FOR MACHINE: TEXT + DICTIONARY/DATABASE

-> MAIN PARTS OF ASSIGNMENT:
   -> TO RANDOMLY SELECT A% OF THE INPUT TEXT AND PLACE IT IN THE TRAIN FILE. THE REMAINING TEXT IS PLACED IN THE TEST FILE.
   -> TO SELECT FEATURES FROM THE INPUT FILE (TRAIN FILE) WHICH SATISFY A FREQUENCY CUTOFF.
   -> TO CREATE A FEATURE VECTOR FOR EACH INSTANCE PRESENT IN THE TRAIN AND TEST FILES.
   -> TO LEARN A NAIVE BAYESIAN CLASSIFIER FROM THE OUTPUT OF THE THIRD PART OF THE ASSIGNMENT AND TO USE THAT CLASSIFIER TO ASSIGN SENSE TAGS TO THE TEST FILE.

-> WHEN CREATING SENSE TAGGED TEXT YOU ARE BUILDING UP A COLLECTION OF CONTEXTS IN WHICH THE MEANINGS OF A WORD OCCUR. THESE CAN BE USED AS TRAINING EXAMPLES.
-> THE BASIC PRINCIPLE INVOLVED IN WORD SENSE DISAMBIGUATION IS TO SELECT THE VALUE OF THE SENSE THAT MAXIMISES THE PROBABILITY OF THAT SENSE OCCURRING IN THE GIVEN CONTEXT (MOST LIKELY SENSE).
-> WHILE USING THE "NAIVE BAYESIAN CLASSIFIER" WE ASSUME THAT THE FEATURES ARE CONDITIONALLY INDEPENDENT: THEY DEPEND ONLY ON THE SENSE.
-> NAIVE BAYESIAN CLASSIFIER:

      S = \arg\max_{SENSE} P(SENSE) \prod_{i=1}^{N} P(C_i | SENSE)

   WHERE S IS THE SENSE AND C_1 ... C_N ARE THE N CONTEXT FEATURES.

-> REPORT:
-> WE RAN THE PROGRAMS WITH 12 COMBINATIONS OF WINDOW SIZE AND FREQUENCY CUTOFF USING A 70-30 TRAINING-TEST DATA RATIO.

   WINDOW SIZE   FREQUENCY CUTOFF   ACCURACY
   0             1                  0.5311
   2             1                  0.6991
   10            1                  0.8354
   25            1                  0.8409
   0             2                  0.5311
   2             2                  0.6970
   10            2                  0.8547
   25            2                  0.8962
   0             5                  0.5311
   2             5                  0.6639
   10            5                  0.8260
   25            5                  0.8755

-> OBSERVATIONS:
-> WHEN THE WINDOW SIZE IS ZERO, THE ACCURACY IS THE SAME (0.5311) WHATEVER THE VALUE OF THE FREQUENCY CUTOFF.
-> WHEN THE FREQUENCY CUTOFF IS KEPT CONSTANT AND THE WINDOW SIZE IS INCREASED, THE ACCURACY INCREASES.
-> WHEN THE WINDOW SIZE AND THE FREQUENCY CUTOFF ARE INCREASED TOGETHER, THE ACCURACY INCREASES (0.6991 AT WINDOW 2 AND CUTOFF 1, 0.8547 AT WINDOW 10 AND CUTOFF 2, 0.8755 AT WINDOW 25 AND CUTOFF 5), ALTHOUGH FOR A FIXED NONZERO WINDOW SIZE A CUTOFF OF 5 IS SLIGHTLY LESS ACCURATE THAN A CUTOFF OF 2.
-> THERE IS SOME RELATION BETWEEN FREQUENCY CUTOFF, WINDOW SIZE AND OVERALL ACCURACY, BECAUSE THE MEANING OF A WORD CAN BE GUESSED FROM ITS SURROUNDING WORDS.

-> GOOD WORDS:
-> GOOD WORDS IN THE OPEN MIND DATA ARE COLLECTED BY CONSIDERING THE DISTRIBUTION OF SENSES, THE AGREEMENT AMONG THE TAGGERS AND THE NUMBER OF EXAMPLES.
-> SELECTING GOOD WORDS INVOLVED 3 MODULES.

1. tag.pl : THIS MODULE FINDS THE DISTRIBUTION OF SENSES, THE AGREEMENT BETWEEN TAGGERS AND THE NUMBER OF EXAMPLES FOR ALL THE WORDS.
   COMMAND LINE ARGUMENT : perl tag.pl
   ACCORDING TO THE OUTPUT OF tag.pl, I GOT "author, hope, future" AS GOOD WORDS.
***author*****
Distribution of author%1:18:00:: : 72
Distribution of author%1:18:01:: : 136
avgagreement = 0.904761904761905
examples : 105

***memory*****
Distribution of memory%1:09:00:: : 59
Distribution of memory%1:09:01:: : 81
Distribution of memory%1:09:02:: : 104
Distribution of memory%1:09:03:: : 25
Distribution of memory%1:06:00:: : 16
avgagreement = 0.674496644295302
examples : 149

***art*****
Distribution of art%1:04:00:: : 35
Distribution of art%1:06:00:: : 31
Distribution of art%1:09:00:: : 25
Distribution of art%1:10:00:: : 19
avgagreement = 1
examples : 110

THESE THREE WORDS HAVE BETTER DISTRIBUTION THAN OTHER WORDS, GOOD RATE OF AGREEMENT AMONG THE TAGGERS AND GOOD NUMBER OF EXAMPLES.

2. text.pl: THIS MODULE PLACES INSTANCES OF A WORD IN THE APPROPRIATE SENSE FILE OF THE WORD.
   command line argument : perl text.pl ids-to-sentences cs8761-umd.full.detailed goodword

3. NAIVE BAYESIAN CLASSIFIER IS USED WITH THESE SENSE FILES TO FIND APPROPRIATE SENSE FOR A WORD IN A GIVEN CONTEXT.

           WINDOWSIZE   FREQUENCY   ACCURACY
   author: 0            1           0.8888
           10           1           0.7777
           25           2           0.7777
           25           5           1.0000
   art:    0            1           0.3333
           10           1           0.3030
           25           2           0.3030
           25           5           0.3636
   memory  0            1           0.3478
           10           1           0.3160
           25           2           0.3160
           25           5           0.2666

+++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++++++++++++++++++++++++++++++++++++
# ********************************************************************************
# experiments.txt Report for Assignment #4 Open Mind Tagging
# Name: Yanhua Li
# Class: CS 8761
# Assignment #4: Nov.
11, 2002
# *********************************************************************************

This assignment is to apply a Naive Bayesian Classifier to perform word sense disambiguation on the Open Mind data. I carried out all experiments with 3 words: "arm", "circumstance" and "manner".

First we need to use sentence1.pl, sentence2.pl and sentence3.pl to collect the instances from the files words and sense into files of instances with the same sense. The commands are:

sentence1.pl sense word1
sentence1.pl sense word2
sentence1.pl sense word3

After executing the commands, we have 7 files for "arm": arm1, arm2, arm3, arm4, arm5, unclear and unlisted; 6 files for "circumstance": circumstance1, circumstance2, circumstance3, circumstance4, unclear and unlisted; and 5 files for "manner": manner1, manner2, manner3, unclear and unlisted. We use the names of these files as the actual senses of the instances in them. We also need to change the code in nb.pl slightly to give it an array of senses. A change in nb.pl to match the actual sense is also needed.

Resulting Table for "arm"
******************************************************************
window size | frequency cutoff | accuracy
0             1                  0.717948717948718
0             2                  0.717948717948718
0             5                  0.717948717948718
2             1                  0.717948717948718
2             2                  0.717948717948718
2             5                  0.717948717948718
10            1                  0.717948717948718
10            2                  0.717948717948718
10            5                  0.717948717948718
25            1                  0.717948717948718
25            2                  0.717948717948718
25            5                  0.717948717948718

Resulting Table for "circumstance"
******************************************************************
window size | frequency cutoff | accuracy
0             1                  0.525
0             2                  0.525
0             5                  0.525
2             1                  0.525
2             2                  0.525
2             5                  0.525
10            1                  0.525
10            2                  0.525
10            5                  0.525
25