The original hard and serve data (as provided by Leacock et al.) may be found in the files original.hard.tar.gz and original.serve.tar.gz. However, the original distributions of this data include spurious ^M characters, which we have removed. In addition, 4 duplicate instances in the original hard data have been removed manually. (These are instances that have both the same instance id and the same context.) Thus, the data in hard.tar.gz and serve.tar.gz may be considered "clean" versions of the original.

The hard data also contains repeated instance ids (i.e., the instance ids are not unique, but the contexts are). We have provided a program that creates unique identifiers for these cases in the original hard data, and we used it in creating the Senseval-1 and Senseval-2 versions of hard.

The hard and serve data converted into the Senseval-1 and Senseval-2 formats can be found here:

hard-S1.tar.gz
hard-S2.tar.gz
serve-S1.tar.gz
serve-S2.tar.gz

You can find more information about the hard and serve data in:

Leacock, Chodorow and Miller (1998). Using corpus statistics and WordNet relations for sense identification. Computational Linguistics 24:1.

(Please credit Leacock et al. as the source of this data. I am simply distributing it and had no role in its creation.)

Last updated May 01, 2003 by TDP
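The cleaning steps described for the hard data (stripping ^M characters, dropping instances whose id and context are both duplicated, and giving repeated ids with distinct contexts unique identifiers) can be sketched roughly as follows. This is not the program distributed with the data: it assumes each instance has already been parsed into an (instance id, context) pair, and the function name `clean_instances` and the `.N` suffix scheme for repeated ids are hypothetical choices made only for illustration.

```python
from collections.abc import Iterable


def clean_instances(instances: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Hypothetical cleaner for (instance_id, context) pairs."""
    # Remove spurious ^M (carriage return) characters from every context.
    items = [(inst_id, ctx.replace("\r", "")) for inst_id, ctx in instances]
    # Reserve every original id up front so generated ids never collide with them.
    used_ids = {inst_id for inst_id, _ in items}
    seen_pairs: set[tuple[str, str]] = set()
    seen_ids: set[str] = set()
    cleaned: list[tuple[str, str]] = []
    for inst_id, ctx in items:
        # Same id AND same context: a true duplicate, so drop it.
        if (inst_id, ctx) in seen_pairs:
            continue
        seen_pairs.add((inst_id, ctx))
        new_id = inst_id
        if inst_id in seen_ids:
            # Same id, different context: append the first unused ".N" suffix
            # (a hypothetical naming scheme, not the original program's).
            n = 1
            while f"{inst_id}.{n}" in used_ids:
                n += 1
            new_id = f"{inst_id}.{n}"
            used_ids.add(new_id)
        seen_ids.add(inst_id)
        cleaned.append((new_id, ctx))
    return cleaned


if __name__ == "__main__":
    data = [
        ("h1", "the hard drive\r"),
        ("h1", "the hard drive"),  # duplicate once ^M is stripped
        ("h1", "a hard test"),     # repeated id, different context
        ("h2", "hard work"),
    ]
    for inst_id, ctx in clean_instances(data):
        print(inst_id, ctx)
```

Reserving all original ids before generating new ones keeps the output ids unique even if the data already contains an id such as `h1.1`.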