Senseval-3 System Code and Documentation

This page contains links to the code, documentation, and shell scripts used to create the University of Minnesota, Duluth systems that were used in the Senseval-3 word sense disambiguation exercise.

The Duluth systems that participated in the supervised lexical sample tasks for Senseval-3 were based on the Duluth systems that participated in Senseval-2. The Duluth system that participated in the unsupervised lexical sample task was a new system, known at that time as SenseRelate (version 0.5), which has since been superseded by WordNet::SenseRelate::TargetWord.

In addition, Syntalex, another system that participated in Senseval-3, extends the Duluth Senseval-2 system by incorporating part-of-speech and syntactic features. It was developed by Saif Mohammed as part of his M.S. thesis.

Duluth Unsupervised Lexical Sample System (Duluth-LSU)

This system is based entirely on WordNet::SenseRelate::TargetWord, which uses WordNet::Similarity to measure the relatedness between a target word and its neighbors. A few simple driver scripts (Duluth-SR), which you can download here, run the algorithm on all the Senseval-3 words.
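The idea of choosing a sense by its relatedness to neighboring words can be illustrated with a minimal Python sketch. This is not the WordNet::SenseRelate::TargetWord implementation: the sense inventory `GLOSSES`, the `gloss_overlap` measure, and the `disambiguate` function are all hypothetical stand-ins chosen for illustration, and a real run would use WordNet senses and a WordNet::Similarity measure instead.

```python
# Toy sense inventory (hypothetical); a real system would draw senses from WordNet.
GLOSSES = {
    "bank#1": "financial institution that accepts deposits and lends money",
    "bank#2": "sloping land beside a river or lake",
    "money#1": "currency accepted as payment and held in deposits",
    "loan#1": "money that a financial institution lends",
}


def gloss_overlap(sense_a, sense_b):
    """Stand-in relatedness measure: number of word types shared by two glosses."""
    return len(set(GLOSSES[sense_a].split()) & set(GLOSSES[sense_b].split()))


def disambiguate(target_senses, neighbor_senses, relatedness):
    """Pick the target sense with the highest total relatedness to its neighbors.

    neighbor_senses holds one list of candidate senses per neighboring word.
    Each neighbor contributes its best-matching sense to the total.
    """
    best_sense, best_score = None, float("-inf")
    for sense in target_senses:
        score = sum(
            max((relatedness(sense, n) for n in senses), default=0)
            for senses in neighbor_senses
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense, best_score


# Disambiguate "bank" given the neighbors "money" and "loan".
chosen, total = disambiguate(
    ["bank#1", "bank#2"], [["money#1"], ["loan#1"]], gloss_overlap
)
print(chosen, total)
```

Passing the relatedness measure as a parameter mirrors the way the choice of measure is kept separate from the selection loop, so the toy overlap count could be swapped for any other pairwise score.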

Quick Summary: You need to install WordNet::Similarity, WordNet::SenseRelate::TargetWord, and the Duluth-SR drivers mentioned above. Note that Duluth-SR refers to SenseRelate version 0.5, which has since been renamed WordNet::SenseRelate::TargetWord.

Supervised Lexical Sample Systems (Duluth-xLSS)

There were three main components to these systems: The Ngram Statistics Package, SenseTools, and Weka. All of these are freely available and can be linked together via the DuluthShell (v0.3) C-shell scripts available from this page. Our objective is to make it possible for you to easily replicate the Duluth systems, and then go on to develop your own!

The DuluthShell was developed for Senseval-2 and re-used in a modified form for Senseval-3. In particular, duluth3 and duluth8 were re-used. The DuluthShell can be downloaded here. In addition to the C-shell scripts, the download includes Senseval data ready for processing and instructions on where to find and how to install each of the components. Consult the README for a description of what is available and how to set things up.

The Ngram Statistics Package (NSP, v0.69) was used to identify interesting bigrams and co-occurrences for use as features for the learning algorithms supported in Weka. NSP is written in Perl and distributed under the GNU CopyLeft. Download it here.
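As a rough illustration of what identifying "interesting" bigrams involves, the Python sketch below counts adjacent word pairs and scores each with pointwise mutual information. This is an assumption made for the example only: it does not reproduce NSP's programs or options, the page does not say which association measure the Duluth systems used, and `score_bigrams` is a hypothetical name.

```python
import math
from collections import Counter


def score_bigrams(tokens):
    """Score each adjacent word pair by pointwise mutual information (PMI)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())

    # Marginal counts: how often each word fills the first or second position.
    first, second = Counter(), Counter()
    for (w1, w2), n in bigrams.items():
        first[w1] += n
        second[w2] += n

    # PMI compares the observed pair count with the count expected
    # if the two positions were independent.
    return {
        (w1, w2): math.log2(n * total / (first[w1] * second[w2]))
        for (w1, w2), n in bigrams.items()
    }


scores = score_bigrams("a b a b c d".split())
for pair, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(value, 3))
```

Pairs that occur together more often than their individual frequencies predict receive higher scores. PMI is known to favor rare pairs, so a frequency cutoff is commonly applied before ranking.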

SenseTools (0.3) was used to format the Senseval text for NSP processing and also to convert the output of the Ngram Statistics Package into a form that the machine learning component Weka can process. Download it here.
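Weka reads training data in its ARFF format, so the conversion step amounts to turning each context into a row of feature values plus a sense label. The Python sketch below shows one way such a file could be laid out with binary features. It is an illustration under stated assumptions, not SenseTools' actual output: `to_arff`, the feature names, and the sense labels are hypothetical, and feature names are assumed to contain no quote characters.

```python
def to_arff(relation, features, senses, instances):
    """Render instances as ARFF text with one binary attribute per feature.

    instances is a list of (features_present, sense_label) pairs.
    """
    lines = [f"@relation {relation}", ""]
    for feat in features:
        # Each feature is a nominal attribute: 1 if present in the context, else 0.
        lines.append(f"@attribute '{feat}' {{0,1}}")
    # The class attribute lists every possible sense label.
    lines.append(f"@attribute sense {{{','.join(senses)}}}")
    lines += ["", "@data"]
    for present, sense in instances:
        present = set(present)
        row = ["1" if feat in present else "0" for feat in features]
        lines.append(",".join(row + [sense]))
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    relation="bank",
    features=["interest_rate", "river"],
    senses=["finance", "shore"],
    instances=[(["interest_rate"], "finance"), (["river"], "shore")],
)
print(arff_text)
```

Placing the sense label last follows the common Weka convention of treating the final attribute as the class.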

All of the machine learning was carried out with Weka, a suite of Java programs that implement a wide range of machine learning algorithms. It is freely available from the University of Waikato in New Zealand. Download it here.

Quick Summary: You should download DuluthShell v0.3, NSP v0.69 (or better), SenseTools-0.3, and Weka (at least 3.2.1). If you use Weka 3.4 or better, you will also need the new version of WekaClassify, a SenseTools component. Start with the DuluthShell README for an overview of the installation process. NSP and SenseTools have README files too.

Related Publications

By: Ted Pedersen - tpederse AT d umn edu