The deadline for assignment 2 is 4pm Monday Feb 12. This deadline
will be strictly enforced. Submit only via turnin; do not send email.

---

What follows is a description of the information expected in your
write-up for assignment 2. Remember that this is to be placed as a
comment at the beginning of your Perl code:

0) USAGE NOTES

Show me how I should run your program. 

e.g.
userid.pl '/regexa/' '/regexb/' inputfile

I will follow your example exactly. If your example is incorrect or
unclear and does not work, I will not attempt to figure it out, and
your submission will not be graded.

I will do my testing on csdev?? so make absolutely sure your program runs
on these machines as an EXECUTABLE SCRIPT following your example. 
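As a rough illustration of what an executable script following the
example invocation might look like, here is a minimal sketch. The
helper name parse_regex_arg and the choice to strip the surrounding
slashes from each argument are assumptions made for illustration, not
requirements of the assignment. The script must also be made
executable (chmod +x userid.pl) and start with a #! line that is valid
on the test machines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not part of the assignment):
# turn an argument such as '/in/' into a compiled regex by stripping
# the surrounding slashes.
sub parse_regex_arg {
    my ($arg) = @_;
    $arg =~ m{\A/(.*)/\z}s
        or die "Expected a regex of the form /.../, got '$arg'\n";
    return qr/$1/;
}

# Runs only when arguments are supplied, e.g.
#   userid.pl '/regexa/' '/regexb/' inputfile
if (@ARGV) {
    die "Usage: $0 '/regexa/' '/regexb/' inputfile\n" unless @ARGV == 3;
    my ($regex_a, $regex_b) = map { parse_regex_arg($_) } @ARGV[0, 1];

    # Read the whole input file into one string.
    open(my $in, '<', $ARGV[2]) or die "Cannot open $ARGV[2]: $!\n";
    my $text = do { local $/; <$in> };
    close($in);

    print "regexA: $regex_a\n";
    print "regexB: $regex_b\n";
    print length($text), " characters read\n";
}
```

Quoting the regular expressions with single quotes on the command line
keeps the shell from interpreting characters such as ?, | and spaces.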

1) GENERAL APPROACH

Begin your write-up with a clear definition of how you are performing
tokenization in your language. In other words, how are words defined?
The two regular expressions provide criteria, but the question remains
as to how to tokenize what doesn't match the regular expressions.

There are two likely tokenization schemes (although you are free to
use others as long as you describe them clearly).

a) A word token is a string that matches regexA or regexB, or is a
string of alphabetic characters delimited by spaces.

b) A word token is a string that matches regexA or regexB, or is a
single alphabetic character or space.

Your definition must be clearly stated, and your program must follow
that definition!! 
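To make the kind of decisions involved concrete, here is one possible
sketch of the first of the two schemes above. It rests on several
assumptions that the assignment leaves open and that you would have to
state yourself: regexA is tried before regexB, both are tried only at
the current position in the text, an empty match is ignored, and any
character that fits no rule (such as a space) is skipped. The function
name tokenize is hypothetical.

```perl
use strict;
use warnings;

# Sketch of one tokenization scheme (assumptions are listed above).
sub tokenize {
    my ($text, $regex_a, $regex_b) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        # \G anchors each attempt at the current position; /c keeps
        # that position when an attempt fails.
        if ($text =~ /\G$regex_a/gc && length($&)) {
            push @tokens, $&;              # token matched by regexA
        }
        elsif ($text =~ /\G$regex_b/gc && length($&)) {
            push @tokens, $&;              # token matched by regexB
        }
        elsif ($text =~ /\G[A-Za-z]+/gc) {
            push @tokens, $&;              # run of alphabetic characters
        }
        else {
            $text =~ /\G./gcs;             # skip a space or other character
        }
    }
    return @tokens;
}

my @tokens = tokenize('what island is he thinking i am on',
                      qr/what /, qr/(is|are|am)/);
print join('|', @tokens), "\n";
# prints: what |is|land|is|he|thinking|i|am|on
```

Note how "island" is split into "is" and "land" under these
assumptions. A different but equally defensible definition would keep
"island" whole, which is why the definition has to be stated
explicitly.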

Discuss any other assumptions you make regarding unusual
situations that may arise as a consequence of your tokenization scheme.

Also describe your general implementation strategy: did you make a
single pass through the data? Did you use any features in Perl that
you found interesting or helpful in completing this assignment?
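As an example of a Perl feature that fits this kind of problem, a
nested hash can count every pair of adjacent tokens in a single pass
over the token list. The sketch below assumes the tokens have already
been produced by whatever tokenization scheme you chose. The function
name count_pairs and the sample tokens are made up for illustration.

```perl
use strict;
use warnings;

# Count each pair of adjacent tokens in one pass.
# Returns a reference to a hash of hashes: $count->{first}{second}.
sub count_pairs {
    my @tokens = @_;
    my %count;
    for my $i (0 .. $#tokens - 1) {
        $count{ $tokens[$i] }{ $tokens[$i + 1] }++;
    }
    return \%count;
}

my $count = count_pairs(qw(the cat the cat sat));
for my $first (sort keys %$count) {
    for my $second (sort keys %{ $count->{$first} }) {
        # Angle brackets make any spaces inside a token visible.
        print "$count->{$first}{$second}\t<$first> <$second>\n";
    }
}
```

Keying on the two tokens separately, rather than joining them into one
string, avoids having to pick a separator that can never occur inside
a token.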

2) TEST CASES: 

Provide frequency counts for ALL of the two-word sequences that
your program will identify, based on the regular expressions and
sample text shown below, using the tokenization scheme you described in 1).

Test Case 1:

/ interest(ing|ed|s)? / /in/
my interest is in interesting articles on indiana and the indus river

Test Case 2:

/.../ /[aeiou]/
where in the world is carmen san diego

Test Case 3:

/what / /(is|are|am)/
what island is he thinking i am on

List ALL the two-word sequences from the given text based on the
tokenization scheme you describe in 1). This means not only the
two-word sequences that match the regular expressions but also those
that do not.

3) PROBLEMATIC CASES

Describe three particularly challenging cases that you found
troublesome. Include example regular expressions and text to be
matched (similar to those in 2) that illustrate the problem. Describe
how you chose to resolve each problem.

Please address all the areas I describe above. These should be
included as a single written text that precedes your program. I will
read your explanation first and then verify that your program does
as you say it should. If I am unable to follow your write-up, I will
not attempt to run your program (as I would just be guessing as to
what you intended), so please make your statement as clear as possible.

Good luck!