We collected data from the World Wide Web for the following five ambiguous 
person names:
1. Richard Alston
2. Sarah Connor
3. George Miller
4. Ted Pedersen
5. Michael Collins

For each of the above names, we performed the following steps:

Data Collection:

- Collected data using the Google search engine and WebService-GoogleHack.
- Read the top 50 html/htm pages returned by the search (first-level 
pages). For each first-level page, also followed its links to other 
html/htm pages (second-level pages), provided the link was in the same 
web-space as the first-level page.
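
The link-following rule above can be sketched in Python as follows. This 
is only an illustration, not the code used to build the data: it assumes 
that "same web-space" means "same host name", and the function name 
same_web_space is hypothetical.

```python
from urllib.parse import urljoin, urlparse


def same_web_space(first_level_url, link_href):
    """Return the absolute URL of a qualifying second-level page, else None.

    Assumption: "same web-space" is read here as "same host name".
    """
    # Resolve relative links against the first-level page.
    absolute = urljoin(first_level_url, link_href)
    first = urlparse(first_level_url)
    target = urlparse(absolute)

    # Only follow ordinary web links (skips mailto:, ftp:, and so on).
    if target.scheme not in ("http", "https"):
        return None
    # Stay on the same host as the first-level page.
    if target.netloc.lower() != first.netloc.lower():
        return None
    # Only html/htm pages were read.
    if not target.path.lower().endswith((".html", ".htm")):
        return None
    return absolute
```

For example, a link "bio.htm" found on http://example.org/people/index.html 
would be followed, while a link to another host or to an image would not.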

Data Formatting and Cleaning:

- Stripped all HTML tags from the contents of the first- and second-level 
pages using HTML-Format-2.04.
- Divided the cleaned data into smaller chunks of text called contexts, 
using a package called NameConflate. Each context contains one instance 
of the ambiguous name (the head word).
- A context might contain one of the specified variants of the ambiguous 
name rather than the name exactly as listed above. For example, for the 
name "Michael Collins", a context might contain "Collins, M.T." or 
"M. Collins".
- Manually removed all occurrences of the following strings from the
contexts:
	1. [IMAGE]
	2. --+
	3. ==+
	4. **+
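
The cleanup and variant matching described above were done manually and 
with NameConflate; the Python sketch below only illustrates the idea. It 
assumes that "--+", "==+" and "**+" each mean a run of two or more of 
that character, and that removed strings are replaced by a single space. 
The names DEBRIS, VARIANTS, clean_context and count_head_words are 
hypothetical, and the variant list is just the example given above.

```python
import re

# Assumption: each pattern is read as "two or more" of the character.
DEBRIS = re.compile(r"\[IMAGE\]|-{2,}|={2,}|\*{2,}")

# Example variants for one name, taken from the description above.
VARIANTS = ["Michael Collins", "Collins, M.T.", "M. Collins"]

# Try longer variants first so the longest form wins at any position.
NAME = re.compile(
    "|".join(re.escape(v) for v in sorted(VARIANTS, key=len, reverse=True))
)


def clean_context(text):
    """Remove the listed strings and collapse the leftover whitespace."""
    without_debris = DEBRIS.sub(" ", text)
    return re.sub(r"\s+", " ", without_debris).strip()


def count_head_words(context):
    """Count occurrences of the name or any of its listed variants."""
    return len(NAME.findall(context))
```

A single hyphen, as in "well-known", is left untouched because only runs 
of two or more characters are treated as debris under this reading.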

Contact tpederse@d.umn.edu for additional info or questions.

------------

Anagha Kulkarni, September 2006
(This data was collected and annotated in the summer of 2006)

Ted Pedersen, January 2008
(The Ted Pedersen data was modified so that sense 2 is recognized as 
being the same person as sense 1; Ted Pedersen therefore became a 3-way 
ambiguity rather than a 4-way one)