A WordNet Stop List

What's a Stop List?

A stop list is a list of words that are excluded from some language processing task, usually because they are viewed as non-informative or potentially misleading. They are typically non-content words such as conjunctions, determiners, and prepositions, which are often called function words.
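Here is a minimal sketch of how a stop list is applied. This Python function drops every token found on a stop list before further processing. The names STOP_WORDS and remove_stop_words, and the word set itself, are illustrative assumptions. They are not the list described on this page.

```python
# A tiny illustrative stop list of function words (hypothetical; not the
# stop list discussed on this page).
STOP_WORDS = frozenset({"a", "an", "the", "and", "or", "at", "of", "in", "by"})


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not on the stop list (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


if __name__ == "__main__":
    sentence = "The cat sat at the edge of a mat"
    print(remove_stop_words(sentence.split()))  # ['cat', 'sat', 'edge', 'mat']
```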

What's a WordNet Stop List?

Since WordNet only contains nouns, verbs, adjectives, and adverbs, you might think that a stop list wouldn't really be relevant. However, some words that are normally used as function words also have (usually obscure) senses in WordNet.

For example, consider the humble word "at". According to WordNet, "at" is a noun that has two senses, one for the chemical element astatine and the other for a Laotian monetary unit.

Most systems that use WordNet are very likely NOT using "at" in either of these senses. A WordNet stop list therefore lists those words that are typically used as function words and yet have WordNet senses that are obscure, unrelated to that function-word use, and potentially misleading.

Finding the WordNet Stop List

This project was undertaken by Satanjeev Banerjee. It arose in the context of an implementation of Lesk's word sense disambiguation algorithm, work that will likely yield many interesting results. The first step was to build a list of likely stop list words. He drew on several existing stop lists, and the stop list formed from them is shown here.

The next step was to determine which of these words have misleading WordNet senses, which have related WordNet senses, and which have no WordNet senses at all.
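The three-way sorting described above can be sketched as follows. The sketch assumes the WordNet senses for each candidate are already available as a mapping from word to sense glosses. It also assumes the related-versus-misleading decision is supplied as a set of human judgements. The lists on this page rest on intuitive judgement, so the code does not try to automate that call. The function name, its parameters, and all sample data except the two senses of "at" are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def partition_candidates(
    candidates: List[str],
    senses: Dict[str, List[str]],
    judged_related: Set[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Sort candidate function words into three groups.

    senses maps each word to its WordNet sense glosses; a missing or empty
    entry means the word has no WordNet sense. judged_related is the set of
    words whose senses a person judged to be related to their function-word
    use. Any other word that has senses is treated as misleading.
    """
    no_senses: List[str] = []
    related: List[str] = []
    misleading: List[str] = []
    for word in candidates:
        if not senses.get(word):
            no_senses.append(word)
        elif word in judged_related:
            related.append(word)
        else:
            misleading.append(word)
    return no_senses, related, misleading


if __name__ == "__main__":
    # Hypothetical sample data; only the two senses of "at" come from this page.
    senses = {
        "at": ["astatine (chemical element)", "a Laotian monetary unit"],
        "up": ["(toy gloss) toward a higher position"],
    }
    print(partition_candidates(["at", "the", "up"], senses, {"up"}))
    # (['the'], ['up'], ['at'])
```

The "misleading" group is the one that becomes the WordNet stop list. Words with no senses need no special handling, because a WordNet lookup returns nothing for them anyway.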

The following words are normally used as function words, but also turn out to have rather odd (but correct) senses listed in WordNet:

I, a, an, as, at, by, he, his, me, or, thou, us, who.

This is our current WordNet stop list!
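One way a system might use this list is to suppress WordNet sense lookup for stop-listed words. That way a disambiguator never considers readings such as "at" meaning astatine. This is a sketch under several assumptions. The lookup parameter stands in for whatever WordNet interface a system actually uses. The toy sense inventory is illustrative. Matching is done case-insensitively.

```python
from typing import Callable, Dict, List

# The WordNet stop list from this page, stored lowercased for matching.
WORDNET_STOP_LIST = frozenset(
    w.lower()
    for w in ["I", "a", "an", "as", "at", "by", "he", "his", "me", "or", "thou", "us", "who"]
)


def candidate_senses(word: str, lookup: Callable[[str], List[str]]) -> List[str]:
    """Return the senses a disambiguator should consider for word.

    Stop-listed words get no senses at all; every other word is passed to
    lookup, a hypothetical stand-in for a real WordNet interface.
    """
    if word.lower() in WORDNET_STOP_LIST:
        return []
    return lookup(word)


if __name__ == "__main__":
    # Toy sense inventory (illustrative only).
    toy: Dict[str, List[str]] = {
        "at": ["astatine", "a Laotian monetary unit"],
        "bank": ["sloping land", "financial institution"],
    }

    def toy_lookup(word: str) -> List[str]:
        return toy.get(word.lower(), [])

    for w in ["At", "bank", "I"]:
        print(w, candidate_senses(w, toy_lookup))
```

Filtering at lookup time, rather than deleting the words from the text, keeps them available for any processing step that does not consult WordNet.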

You can view the senses that led us to this conclusion here. The words in our initial stop list that have no WordNet senses are shown here, and the function words whose WordNet senses seem to be related to their function-word use are shown here.

Please let us know if you have any other candidates for membership in the WordNet stop list! These lists have been constructed using our intuitive judgements and are not meant to be taken as anything more than that!

By: Ted Pedersen - tpederse@d.umn.edu