CS 8761 Final Project
Algorithm description
Alianza Lima

+---------------------------------------------------------+

Main idea

We postulate that the sentiment of a review can be obtained by examining the sentiment of the words it uses.

Main steps

1. Use BigMac to filter out certain review words.
2. Attempt to determine sentiment using the information in LDOCE.
3. Use the web to augment the information in LDOCE.

Manual priming

Our algorithm is primed by assigning a positive, negative, or neutral sentiment to each "Activ code" present in the LDOCE dictionary. To be as objective as possible, we assigned a non-neutral sentiment only when it was quite clear that a code carried that sentiment.

+---------------------------------------------------------+

More detail

Step 1

Each review's words are analyzed and filtered according to their part-of-speech (POS) tag in BigMac. In our experiments we filtered out everything except adjectives, although nothing prevents loosening this filter. A second type of filtering is available in the form of a stop-word list.

Step 2

Once the review words have been filtered, each remaining word is examined to determine its sentimental impact on the overall review. Each word is looked up in LDOCE to obtain the sentiment associated with each of its Activ codes. Because each sense of a word can carry a different code, a word's contribution is the net sum of the sentiments of all codes present for that word. If a word is either absent from LDOCE or has no Activ codes, we use the web to determine its sentimental impact, as described in the next step.

Step 3

At this stage we have found a word for which LDOCE contains no sentimental information, so we must create a sentiment for it. We do this by gathering a context in which the word is used and examining the sentiment of that context.
To do this, we query the web for a set of sentences containing the word and then look up each context word in LDOCE, just as we tried to do for the original word. The resulting sum of sentiment values becomes the sentiment of the original word.

Classifying

The crudest classification method examines the sign of the sentiment sum over the words in the review. If the sum is positive, the review is classified as positive; a negative sum yields a negative classification, and a sum of zero leaves the review undecided.

+-----------------------------------------------------------------+

Tweaks and alternate techniques

Scaling

We propose "tweaking" the sentimental impact of a word according to where it occurs in the original review. We propose three ways of doing this:

1. Scale the impact so that words near the end of the review have the highest impact.
2. Scale it so that words near the beginning have the most impact.
3. Scale it so that words at both the beginning and the end have the most impact, while words in the middle have reduced impact.

The idea behind the first option is that writers in general tend to summarize their intent at the end. Therefore, even a review that opens with positive points could be negative overall, which the author signals by finishing with the negative reasons. The additional scaling options are more of an experimental exercise, but should be interesting all the same.

Using an 'undecided' window

The crude method of assigning the final sentiment described above leaves little chance of an undecided classification, since only a sum of exactly zero is undecided. For this reason we propose letting a window of values around zero determine this assignment: a review whose sum falls inside the window is left undecided. While this will reduce recall, because fewer reviews are attempted, we expect this threshold to increase precision.

+------------------------------------------------------------------+
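The per-word scoring of Steps 1 to 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not our actual implementation: the Activ code names, the tiny LDOCE excerpt, and the Penn-style adjective tags standing in for BigMac's tags are all hypothetical, and `fake_fetch` is a placeholder for whatever web search gathers the context sentences.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical manual priming: each Activ code gets +1, -1, or 0.
# The code names below are placeholders, not actual LDOCE Activ codes.
ACTIV_SENTIMENT: Dict[str, int] = {"GOOD": 1, "BAD": -1, "HAPPY": 1, "SAD": -1, "BIG": 0}

# Hypothetical LDOCE excerpt: word -> Activ codes attached to its senses.
LDOCE_ACTIV: Dict[str, List[str]] = {
    "excellent": ["GOOD"],
    "awful": ["BAD"],
    "cheerful": ["HAPPY"],
    "gloomy": ["SAD"],
    "great": ["GOOD", "BIG"],
}

# Assumed Penn Treebank adjective tags; BigMac's real tag set may differ.
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def filter_words(tagged: Iterable[Tuple[str, str]], stop_words: Set[str]) -> List[str]:
    """Step 1: keep adjectives that are not stop words."""
    return [w.lower() for w, tag in tagged
            if tag in ADJECTIVE_TAGS and w.lower() not in stop_words]


def ldoce_sentiment(word: str) -> Optional[int]:
    """Step 2: net sum over the word's Activ codes, or None if it has none."""
    codes = LDOCE_ACTIV.get(word.lower())
    if not codes:
        return None
    return sum(ACTIV_SENTIMENT.get(code, 0) for code in codes)


def web_sentiment(word: str, fetch_sentences: Callable[[str], List[str]]) -> int:
    """Step 3: sum the LDOCE sentiment of every context word in sentences containing `word`."""
    total = 0
    for sentence in fetch_sentences(word):
        for context_word in re.findall(r"[a-z']+", sentence.lower()):
            # Context words are looked up in LDOCE only; unknown ones count as 0.
            total += ldoce_sentiment(context_word) or 0
    return total


def word_sentiment(word: str, fetch_sentences: Callable[[str], List[str]]) -> int:
    """Use LDOCE when it has codes for the word, otherwise fall back to the web."""
    score = ldoce_sentiment(word)
    return score if score is not None else web_sentiment(word, fetch_sentences)


def fake_fetch(word: str) -> List[str]:
    """Hypothetical stand-in for a web search returning sentences that contain `word`."""
    return [f"a cheerful, excellent and {word} tune", f"an awful {word} ending"]


# Example review, already POS-tagged (tags assumed, see above).
tagged_review = [("The", "DT"), ("food", "NN"), ("was", "VBD"), ("great", "JJ"),
                 ("but", "CC"), ("other", "JJ"), ("dishes", "NNS"), ("were", "VBD"),
                 ("awful", "JJ"), ("and", "CC"), ("the", "DT"), ("decor", "NN"),
                 ("was", "VBD"), ("snazzy", "JJ")]
words = filter_words(tagged_review, stop_words={"other"})
scores = [word_sentiment(w, fake_fetch) for w in words]
```

Here "other" is dropped as a stop word, "great" carries one positive and one neutral code so it nets +1, and "snazzy" is missing from the excerpt, so its +1 comes entirely from the two fetched sentences. Context words are only looked up in LDOCE and never sent back to the web, which keeps Step 3 from recursing.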
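The sign-based classification, the three positional scaling schemes, and the undecided window can be combined into a single function. The write-up does not fix the shape of the scaling, so the linear ramps below (weights between 1 and 2) and the scheme names "flat", "end", "start", and "both" are assumptions made for illustration. With `scheme="flat"` and `window=0.0`, the function reduces to the crude sign test: a positive sum is positive, a negative sum is negative, and zero is undecided.

```python
from typing import List


def position_weight(i: int, n: int, scheme: str) -> float:
    """Weight for the i-th of n filtered words; the linear ramps are an assumption."""
    p = i / (n - 1) if n > 1 else 1.0  # relative position: 0 = first word, 1 = last
    if scheme == "flat":
        return 1.0                       # crude method: every word counts equally
    if scheme == "end":
        return 1.0 + p                   # option 1: ramps from 1 up to 2 at the end
    if scheme == "start":
        return 2.0 - p                   # option 2: ramps from 2 down to 1 at the end
    if scheme == "both":
        return 1.0 + abs(2.0 * p - 1.0)  # option 3: 2 at both ends, 1 in the middle
    raise ValueError(f"unknown scheme: {scheme}")


def classify(scores: List[int], scheme: str = "flat", window: float = 0.0) -> str:
    """Classify a review from its per-word sentiment scores, in review order."""
    total = sum(s * position_weight(i, len(scores), scheme)
                for i, s in enumerate(scores))
    if total > window:
        return "positive"
    if total < -window:
        return "negative"
    # Sums inside [-window, window] are left undecided.
    return "undecided"
```

For per-word scores [1, 0, -1], the flat sum is zero (undecided), end weighting gives negative, start weighting gives positive, and weighting both ends cancels back to undecided. Widening `window` turns small sums into undecided cases, trading recall for precision.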