Senseval-2 System Code and Documentation

[Feb 5, 2002 - The complete Duluth systems that participated in Senseval-2 are now available. If you download and install BSP (at least v0.4), SenseTools (at least v0.1), and Weka (at least v3.2.1), you can use the supporting C-shell scripts (Duluth-Shell) to replicate the Duluth systems from Senseval-2. Each of these packages comes with a complete README file. Please contact me with any questions or comments. tpederse AT d umn edu]

This page contains links to the code, documentation, and shell scripts used to create the University of Minnesota, Duluth systems that took part in the Senseval-2 word sense disambiguation exercise. These systems had three main components: the Bigram Statistics Package (BSP), SenseTools, and Weka. All of these are freely available and can be linked together via the Duluth-Shell C-shell scripts available from this page. Our objective is to make it easy for you to replicate the Duluth systems, and then go on to develop your own!

The Duluth systems combine the Bigram Statistics Package, SenseTools, and the machine learning system Weka. The Duluth-Shell is a set of C-shell scripts that links these components together and can be downloaded here. In addition to the scripts, the Duluth-Shell package includes Senseval data ready for processing, along with instructions on where to find each component and how to install it. Consult the README for a description of what is available and how to set things up.

The Bigram Statistics Package (v0.4) was used to identify interesting bigrams and co-occurrences for use as features for the learning algorithms supported in Weka. BSP is written in Perl and distributed under the GNU CopyLeft. Download it here.

SenseTools (v0.1) was used to format the Senseval text for BSP processing and also to convert the output of the Bigram Statistics Package into a form that the machine learning component Weka can process. SenseTools is written in Perl and distributed under the GNU CopyLeft. Download it here or consult the README first.
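
The flow described above (find bigram features, then convert them into a form Weka can read) can be illustrated with a minimal sketch. This is not the BSP or SenseTools code, which is written in Perl; it is a hypothetical Python stand-in. It assumes a plain frequency cutoff for selecting bigrams (standing in for whatever criteria BSP applies), binary presence/absence features, and Weka's ARFF text format as the output. The function names, the `min_count` parameter, the relation name, and the toy sentences are all invented for illustration.

```python
from collections import Counter


def top_bigrams(contexts, min_count=2):
    """Count adjacent word pairs over all contexts and keep the frequent ones.

    Assumption: a simple frequency cutoff is used here as a simplification;
    it is not the selection method of the real Bigram Statistics Package.
    """
    counts = Counter()
    for text in contexts:
        tokens = text.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return sorted(bg for bg, n in counts.items() if n >= min_count)


def to_arff(instances, bigrams, relation="wsd_sketch"):
    """Render (context, sense) pairs as ARFF text.

    Each selected bigram becomes one binary attribute; the sense tag is the
    final (class) attribute.
    """
    senses = sorted({sense for _, sense in instances})
    lines = [f"@relation {relation}", ""]
    for w1, w2 in bigrams:
        lines.append(f"@attribute {w1}_{w2} {{0,1}}")
    lines.append("@attribute sense {" + ",".join(senses) + "}")
    lines += ["", "@data"]
    for text, sense in instances:
        tokens = text.lower().split()
        present = set(zip(tokens, tokens[1:]))
        row = ["1" if bg in present else "0" for bg in bigrams]
        lines.append(",".join(row + [sense]))
    return "\n".join(lines) + "\n"


# Toy data (hypothetical): contexts of the word "bank" with made-up sense tags.
instances = [
    ("the river bank was muddy", "shore"),
    ("she sat on the river bank", "shore"),
    ("the bank raised interest rates", "finance"),
    ("interest rates at the bank fell", "finance"),
]
bigrams = top_bigrams([text for text, _ in instances])
print(to_arff(instances, bigrams))
```

The printed text could be saved to a file and loaded into Weka as a training set. To replicate the actual Senseval-2 systems, use the Duluth-Shell scripts and follow their README rather than this sketch.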

All of the machine learning was carried out with Weka, a suite of Java programs that implement a wide range of machine learning algorithms. It is freely available from the University of Waikato in New Zealand. Download it here.

You can find brief descriptions of all the participating systems (including those from Duluth) here. The Duluth systems were used in the English and Spanish lexical sample tasks.

Quick Summary

You should download Duluth-Shell, BSP v0.4, SenseTools v0.1, and Weka (at least v3.2.1). Start with the Duluth-Shell README for an overview of the installation process. BSP and SenseTools have README files too.

Related Publications

By: Ted Pedersen - tpederse AT d umn edu