This is a directory of software developed by the
Natural Language Processing Group
at the University of Minnesota, Duluth. It is mostly written in Perl and is
distributed under the terms of the GNU General Public License.
Many of these projects are available via
Unsupervised Corpus Based Clustering of Similar Contexts
SenseClusters is a package of Perl programs that allows a user to cluster
similar contexts together using unsupervised knowledge-lean methods. These
techniques have been applied to word sense discrimination, email
categorization, and name discrimination.
NSP allows you to identify word n-grams in large corpora using
standard tests of association such as Fisher's exact test, the log
likelihood ratio, Pearson's chi-squared test, and the Dice Coefficient.
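
To make the idea of scoring n-grams with a test of association concrete, here is a minimal Python sketch that computes the Dice Coefficient and the log likelihood ratio for adjacent word pairs. It is not NSP's code or interface (NSP itself is written in Perl); the function name `bigram_scores` and the whitespace tokenization are assumptions made only for this illustration.

```python
import math
from collections import Counter

def bigram_scores(tokens):
    """Return {(w1, w2): (dice, log_likelihood)} for adjacent word pairs.

    Hypothetical helper for illustration; not part of NSP.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)                                   # total bigrams
    pair_counts = Counter(bigrams)                     # n11 per bigram
    left_counts = Counter(w1 for w1, _ in bigrams)     # w1 in first position
    right_counts = Counter(w2 for _, w2 in bigrams)    # w2 in second position

    scores = {}
    for (w1, w2), n11 in pair_counts.items():
        n1p = left_counts[w1]
        np1 = right_counts[w2]

        # Dice: twice the joint count over the sum of the marginal counts.
        dice = 2.0 * n11 / (n1p + np1)

        # 2x2 contingency table: observed and expected cell counts.
        observed = [n11, n1p - n11, np1 - n11, n - n1p - np1 + n11]
        expected = [
            n1p * np1 / n,
            n1p * (n - np1) / n,
            (n - n1p) * np1 / n,
            (n - n1p) * (n - np1) / n,
        ]
        # Log likelihood ratio; cells with zero observations contribute 0.
        g2 = 2.0 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        )
        scores[(w1, w2)] = (dice, g2)
    return scores

scores = bigram_scores("new york is in new york state".split())
```

In this toy sentence "new" is always followed by "york" and "york" is always preceded by "new", so that pair receives the maximum Dice score, while a pair such as "york is" scores lower.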
WordNet::Similarity allows you to measure the similarity and relatedness
of two concepts in the WordNet lexical database using a variety of
measures of semantic similarity and relatedness.
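
As a rough illustration of what a similarity measure over a concept hierarchy can look like, the Python sketch below scores two concepts by the inverse of the number of nodes on the shortest is-a path between them, which is one common path-based formulation. It does not use WordNet or the WordNet::Similarity interface; the tiny `hypernym` table and the function names are invented for this example, and the hierarchy is assumed to be a tree.

```python
def ancestors(concept, hypernym):
    """Map a concept and each of its ancestors to the number of is-a edges
    separating it from the starting concept."""
    distance = {}
    steps = 0
    while concept is not None:
        distance[concept] = steps
        concept = hypernym.get(concept)  # None once the root is passed
        steps += 1
    return distance

def path_similarity(a, b, hypernym):
    """Inverse of the node count on the shortest path through a shared ancestor."""
    up_a = ancestors(a, hypernym)
    up_b = ancestors(b, hypernym)
    shared = set(up_a) & set(up_b)
    if not shared:
        return 0.0
    edges = min(up_a[c] + up_b[c] for c in shared)
    return 1.0 / (edges + 1)  # nodes on the path = edges + 1

# Toy is-a hierarchy (child -> parent); an assumption, not WordNet data.
hypernym = {
    "dog": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}

dog_cat = path_similarity("dog", "cat", hypernym)
dog_canine = path_similarity("dog", "canine", hypernym)
```

Here "dog" and "canine" are adjacent and so score higher than "dog" and "cat", which only meet at "mammal".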
WordNet::SenseRelate allows you to assign meanings to each content word in
a text. It does this by determining which sense of a word is most
related to its neighbors.
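
The idea of picking the sense most related to a word's neighbors can be sketched in a few lines of Python. This is not the WordNet::SenseRelate implementation: the sense inventory below is a hand-made toy, `assign_senses` is a hypothetical function, and simple gloss word overlap stands in for whatever relatedness measure is actually used.

```python
def gloss_overlap(gloss_a, gloss_b):
    """Stand-in relatedness measure: number of words two glosses share."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def assign_senses(words, inventory, relatedness=gloss_overlap, window=2):
    """For each word, pick the sense whose gloss is most related to the
    senses of the words within `window` positions on either side."""
    result = []
    for i, word in enumerate(words):
        senses = inventory.get(word)
        if not senses:
            result.append(None)  # word has no entry in the toy inventory
            continue
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        neighbors = [words[j] for j in range(lo, hi) if j != i]

        best_sense, best_score = None, -1
        for sense, gloss in senses.items():
            # For each neighbor, credit its best-matching sense.
            score = sum(
                max(
                    (relatedness(gloss, g)
                     for g in inventory.get(nb, {}).values()),
                    default=0,
                )
                for nb in neighbors
            )
            if score > best_score:
                best_sense, best_score = sense, score
        result.append(best_sense)
    return result

# Toy sense inventory (sense label -> gloss); an assumption, not WordNet data.
inventory = {
    "bank": {
        "bank#finance": "institution that accepts money deposits",
        "bank#river": "sloping land beside a body of water",
    },
    "deposit": {"deposit#money": "money placed in an institution account"},
    "river": {"river#stream": "large natural stream of water"},
}

money_context = assign_senses(["deposit", "bank"], inventory)
river_context = assign_senses(["river", "bank"], inventory)
```

With these toy glosses, "bank" is assigned its financial sense next to "deposit" and its riverside sense next to "river".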
A few misc. programs that help us deal with WordNet.
UMLS::Similarity allows you to measure the similarity and relatedness of
two concepts in the Unified Medical Language System (UMLS) using a
variety of measures of semantic similarity and relatedness.
UMLS::Interface provides a Perl interface to the Unified Medical
Language System (UMLS) and provides much of the functionality that
Supervised Methods of Word Sense Disambiguation
This is a suite of tools that allows for easy creation of supervised word
sense disambiguation experiments.
This is a greatly improved version of the Duluth-Shell as used in the
DuluthX Senseval-2 systems. It makes it easier to run large numbers of
experiments, and provides many detailed reporting options.
This extends the Duluth Senseval-2 systems with part of speech and
syntactic features. This system participated in Senseval-3 (2004).
Complete source code and documentation for the Duluth systems that
participated in the Senseval-3 (2004) comparative exercise among word
sense disambiguation systems. This includes supervised lexical sample
systems based on the Duluth Senseval-2 systems, and a new unsupervised
lexical sample system.
Complete source code and documentation for the Duluth systems
that participated in the lexical sample tasks of Senseval-2 (2001)
comparative exercise among word sense disambiguation systems. These
systems rely on lexical features like unigrams, bigrams, and
This is a complete word sense disambiguation system that
integrates NSP and Weka into the Gate environment.
This is a complete word sense disambiguation system that assigns senses
to biomedical text based on the UMLS.
Data and Data Creation Tools
We support conversions of data in a number of formats into the
Senseval-2 format for lexical sample word sense disambiguation. You
can find those tools here!
We have converted a variety of sense-tagged text into the Senseval-2
format. We provide both copies of the converted data
as well as the source code used to create it.
Process Senseval-2 formatted data using the Brill POS Tagger and
the Collins Parser.
Tools for automatic and manual alignment of parallel text.
GoogleHack finds sets of related words using the Google search engine.
- tpederse AT d umn edu