SenseTools is a package of Perl programs (and one Java program) that converts Senseval-2 formatted sense-tagged text into the ARFF format required as input by Weka, a suite of Java programs that implements a wide range of machine learning algorithms.

As a result, SenseTools allows you to carry out supervised word sense disambiguation experiments using any learning algorithm found in Weka, including decision trees, neural networks, Naive Bayesian classifiers, support vector machines, and rule-based learners.

SenseTools converts sense-tagged text into a plain text form that can be used by the Ngram Statistics Package (NSP) to identify lexical features such as unigrams, bigrams, and collocations. It also provides programs that extract features identified by NSP (or manually designated by the user) from the Senseval-2 formatted sense-tagged text. Ultimately, SenseTools represents the sense-tagged text in the ARFF format, which can then be used as input to Weka.
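
To make the target format concrete, the following Python sketch renders a few feature vectors as ARFF text. It is an illustration only and not part of SenseTools, whose converters are written in Perl. The function name to_arff, the feature names, and the sense labels are hypothetical, and each feature is assumed to be a binary flag recording whether it occurs in the context of the target word.

```python
def to_arff(relation, feature_names, senses, instances):
    """Render binary feature vectors as ARFF text.

    instances is a list of (feature_values, sense) pairs, where
    feature_values is a list of 0/1 flags aligned with feature_names.
    """
    sense_list = ",".join(senses)
    lines = ["@relation " + relation, ""]
    # One nominal {0,1} attribute per lexical feature (assumed encoding).
    for name in feature_names:
        lines.append("@attribute " + name + " {0,1}")
    # The class attribute lists every possible sense of the target word.
    lines.append("@attribute sense {" + sense_list + "}")
    lines += ["", "@data"]
    # One comma-separated row per sense-tagged instance.
    for values, sense in instances:
        lines.append(",".join(str(v) for v in values) + "," + sense)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical features and senses for the target word "art".
    print(to_arff(
        "art",
        ["bigram_fine_art", "unigram_gallery"],
        ["art_1", "art_2"],
        [([1, 0], "art_1"), ([0, 1], "art_2")],
    ))
```

Each row in the data section corresponds to one sense-tagged instance, with the sense tag as the final (class) value.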

Once Weka has learned a model, SenseTools provides a Java program (WekaClassify) that classifies a set of test/evaluation data in ARFF format using that previously learned model. Its output is a distribution of "scores" showing the probability or confidence associated with each possible answer/classification for each instance in the test data.
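
As a rough illustration of how such a score distribution can be used, the Python sketch below picks the highest-scoring sense for a single instance. The on-disk output format of WekaClassify is not reproduced here; the distribution is simply assumed to have been read into a dictionary mapping each sense to its score, and the name best_sense and the sense labels are hypothetical.

```python
def best_sense(distribution):
    """Return the (sense, score) pair with the highest score.

    distribution maps each candidate sense to the probability or
    confidence assigned to it for one test instance. Ties go to the
    sense that appears first in the dictionary.
    """
    return max(distribution.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical scores for one test instance.
    scores = {"art_1": 0.2, "art_2": 0.7, "art_3": 0.1}
    sense, score = best_sense(scores)
    print(sense, score)
```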

SenseTools also provides a number of simple methods for creating ensembles of classifiers based on WekaClassify output, and for scoring the results of sense-tagging against a manually provided gold standard using precision and recall.
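
The Python sketch below shows one simple way these two steps could work; it is not the SenseTools implementation. It assumes that an ensemble sums the per-sense scores of its member classifiers, and that scoring follows the usual Senseval convention: precision is the number of correct answers divided by the number of instances attempted, and recall is the number of correct answers divided by the total number of instances in the gold standard. The function names, instance ids, and sense labels are hypothetical.

```python
from collections import defaultdict


def combine(distributions):
    """Sum the per-sense scores of several classifiers for one instance."""
    total = defaultdict(float)
    for dist in distributions:
        for sense, score in dist.items():
            total[sense] += score
    return dict(total)


def precision_recall(answers, gold):
    """Score predicted senses against a gold standard.

    answers maps instance id -> predicted sense; instances the system
    did not attempt are simply absent. gold maps instance id -> correct
    sense for every instance.
    """
    attempted = [i for i in answers if i in gold]
    correct = sum(1 for i in attempted if answers[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Two hypothetical classifiers disagree on one instance.
    combined = combine([
        {"art_1": 0.6, "art_2": 0.4},
        {"art_1": 0.3, "art_2": 0.7},
    ])
    print(max(combined, key=combined.get))

    # Three gold instances, two of them attempted, one answered correctly.
    gold = {"i1": "art_1", "i2": "art_1", "i3": "art_2"}
    answers = {"i1": "art_1", "i2": "art_2"}
    print(precision_recall(answers, gold))
```

Precision and recall differ only when some instances are left unanswered; if every instance is attempted, the two values are equal.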

Current version (Use with Weka 3-4)

Previous versions (Use with Weka 3-2)

Related Tools


WekaClassify may be of general interest to Weka users, so we provide it in a separate distribution.

By: Ted Pedersen - tpederse AT d umn edu