Name Discrimination Data

This page contains data in which ambiguous entity names in text have been disambiguated. The data was either disambiguated manually or created by conflating multiple names into a single ambiguous pseudo-name.

Kulkarni Name Corpus

This data was manually disambiguated by Anagha Kulkarni as part of her M.S. thesis and has subsequently been used in our IJCAI-2007 workshop and CICLING-2007 papers. It consists of Google search results for person names that are known to be ambiguous. Please cite her thesis if you use the data.
The data is described in the following README and is available for download here.

Name Conflate Data

This is data in which we have created artificial ambiguities by conflating the occurrences of person or place names. We typically create this data from the English GigaWord corpus using the program
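
The idea of name conflation can be illustrated with a minimal sketch. This is not the program referred to above; the function name `conflate_names`, its parameters, and the example pseudo-name are hypothetical and chosen only for illustration. The sketch assumes that each context is a single sentence and that contexts mentioning none or more than one of the target names are discarded, so that every kept instance has exactly one gold label.

```python
import re


def conflate_names(sentences, names, pseudo_name):
    """Replace every occurrence of any name in `names` with `pseudo_name`.

    Returns a list of (conflated_sentence, original_name) pairs. The
    original name is kept as the gold label for later evaluation.
    Illustrative sketch only; not the authors' actual tool.
    """
    # Try longer names first so a longer name wins over one that is its prefix.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b")

    instances = []
    for sentence in sentences:
        found = set(pattern.findall(sentence))
        if len(found) != 1:
            # Assumption: skip contexts with no target name or with several.
            continue
        conflated = pattern.sub(pseudo_name, sentence)
        instances.append((conflated, found.pop()))
    return instances


if __name__ == "__main__":
    examples = [
        "Bill Clinton spoke in Ohio.",
        "Tony Blair visited Paris.",
        "Nobody was mentioned here.",
    ]
    for text, label in conflate_names(
        examples, ["Bill Clinton", "Tony Blair"], "Bill_Clinton_Tony_Blair"
    ):
        print(label, "|", text)
```

Because the original name is retained alongside each conflated context, a discrimination method can be run on the pseudo-name and scored automatically against the known answers, without manual annotation.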


