
THIS IS BASED ON THE SEPT 2001 VERSION OF THE SENSEVAL1 DATA IN
SENSEVAL2 FORMAT. YOU SHOULD USE THE DEC 2001 VERSION (0.2) THAT
IS AVAILABLE AT http://www.d.umn.edu/~tpederse/code.html FOR ANY
CRITICAL EXPERIMENTS. 

The following directories and files are included in this directory.

LexSample/   -  includes a directory for each word, where the test and
		training samples as used in senseval-1 are given. 

		each directory is named word.pos, and contains the
		senseval2 data in original xml format:

			word.pos-test.xml       
			word.pos-training.xml   

		and with the xml tags removed:

			word.pos-test.count     
			word.pos-training.count 


fine.key 	a key for the senseval1 test data. please note
		that I reconstructed this key from the scoring
		information provided by senseval1; it is not the
		original key, although I think it is pretty close.

sensemap	supplied by senseval1 for scoring purposes

stop.list	the stop list I used in all senseval2 experiments.
		It was created by using stop.pl

token1.txt	the token definition file used to create the data in
		LexSample

token4.txt	the token definition file used during preprocessing
		of the senseval data. It consists of a single regular
		expression: /\S+/ This means that every maximal run of
		non-whitespace characters is considered a token. This
		is appropriate for preprocessing since we are simply
		splitting up a single large xml file with all of the
		training data into separate directories for each
		word.
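
The effect of the /\S+/ token definition can be illustrated with a
short sketch. This is not the code used to build this data; it is a
Python approximation of the same regular expression, and the names
TOKEN_PATTERN and tokenize, as well as the sample line, are
hypothetical and chosen only for illustration.

```python
import re

# Approximation of the token4.txt definition /\S+/ :
# every maximal run of non-whitespace characters is one token.
TOKEN_PATTERN = re.compile(r"\S+")


def tokenize(line):
    """Return the whitespace-delimited tokens of a line, in order."""
    return TOKEN_PATTERN.findall(line)


# Hypothetical sample line; xml tags are not treated specially,
# they are tokens like any other non-blank string.
sample = "<s> the  cat\tsat </s>"
print(tokenize(sample))
# prints ['<s>', 'the', 'cat', 'sat', '</s>']
```

Note that punctuation and xml markup stay attached to whatever
non-blank string they appear in, which is acceptable when the only
goal is to split the data by word rather than to count features.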
