CS 5761 - Introduction to Natural Language Processing

Programming Assignment 2 - Demo in Lab on Monday, Feb 11 at 4pm
(submit code via email to patw0006@d.umn.edu before lab, and bring written work to the lab)

Objectives

To gain experience with Finite State Automata and Perl regular expressions.

Specification

Design two Finite State Automata: one that accepts any expression referring to a specific date (month, day, and/or year), and another that accepts any expression referring to a specific time of day. Each should work whether the expression is stated formally (e.g., the fifth day of March, eleven in the morning) or concisely (e.g., Mar 5, 11am). Refer to problems 2.4 and 2.6 in the Jurafsky and Martin text (page 54) for additional description.
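To make the idea of a time-of-day FSA concrete, here is a minimal sketch covering only a small, hypothetical subset of concise forms (11am, 4:00 am, 1 o'clock, midnight). It is written in Python purely for illustration; the assignment itself requires Perl. The state names (start, hour, clock, done), the token classes, and the functions `classify`, `tokenize`, and `accepts` are all assumptions made for this example, and spelled-out forms such as "eleven in the morning" are deliberately left out.

```python
import re

def classify(token):
    # Map a word token to a hypothetical edge label of the FSA.
    if token in ("midnight", "noon"):
        return "WORD_TIME"
    if re.fullmatch(r"(1[0-2]|0?[1-9]):[0-5]\d", token):
        return "CLOCK"          # e.g. 4:00
    if re.fullmatch(r"1[0-2]|0?[1-9]", token):
        return "HOUR"           # 1-12
    if token in ("am", "pm", "a.m.", "p.m."):
        return "MERIDIEM"
    if token == "o'clock":
        return "OCLOCK"
    return "OTHER"

# Transition table: (state, edge label) -> next state.
# Any pair not listed leads to an implicit dead (rejecting) state.
TRANSITIONS = {
    ("start", "WORD_TIME"): "done",
    ("start", "HOUR"): "hour",
    ("start", "CLOCK"): "clock",
    ("hour", "MERIDIEM"): "done",
    ("hour", "OCLOCK"): "done",
    ("clock", "MERIDIEM"): "done",
}
ACCEPTING = {"done", "clock"}   # "4:00" alone is accepted too

def tokenize(text):
    # Splits "11am" into ["11", "am"] and "4:00 am" into ["4:00", "am"].
    return re.findall(r"\d{1,2}:\d{2}|\d+|[a-z.']+", text.lower())

def accepts(text):
    # Run the FSA over the tokens; accept only if it ends in an accepting state.
    state = "start"
    for token in tokenize(text):
        state = TRANSITIONS.get((state, classify(token)))
        if state is None:
            return False
    return state in ACCEPTING
```

Each entry in `TRANSITIONS` corresponds to one labeled edge in a diagram, so writing the table out is also a quick way to check that no edge in a drawing is left unlabeled.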

Use these FSAs as the basis of a Perl program that puts tags around dates and times in text. The program should use regular expressions extensively. Since any FSA can be expressed as an equivalent regular expression, develop your FSAs first and then convert them into regular expressions. Perl's regular expressions will do much of the work in this assignment if you let them.
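As a sketch of the FSA-to-regex conversion, consider a small hypothetical time FSA with states start, hour, clock (accepting), and done (accepting), whose accepting paths are: midnight or noon; H:MM optionally followed by am/pm; and an hour followed by am/pm or o'clock. Each accepting path becomes one alternative of the regular expression. The example is in Python for illustration only; the names `HOUR`, `MERIDIEM`, `TIME_RE`, and `is_time` are assumptions, and the pattern constructs used here (`(?:...)`, `?`, `\s*`, character classes) are also valid in Perl.

```python
import re

# Edge labels of the hypothetical time FSA, written as regex fragments.
HOUR = r"(?:1[0-2]|0?[1-9])"          # 1-12, optionally zero-padded
MERIDIEM = r"(?:[ap]m|[ap]\.m\.)"     # am, pm, a.m., p.m.

# One alternative per accepting path through the FSA:
#   start --midnight/noon--> done
#   start --H:MM--> clock (accepting) --meridiem--> done
#   start --hour--> hour --meridiem | o'clock--> done
TIME_RE = re.compile(
    r"(?:midnight|noon"
    + r"|" + HOUR + r":[0-5]\d(?:\s*" + MERIDIEM + r")?"
    + r"|" + HOUR + r"\s*(?:" + MERIDIEM + r"|o'clock)"
    + r")"
)

def is_time(text):
    # fullmatch requires the whole string to be accepted, like one complete FSA run.
    return TIME_RE.fullmatch(text.strip().lower()) is not None
```

To find times inside running text rather than test whole strings, the pattern would be searched for (for example with word boundaries added) instead of fully matched.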

Time expressions should be marked by [time] and [/time], and date expressions by [date] and [/date]. For example,

INPUT:
I was born at midnight on March 15. My sister will arrive at 4:00 am on Monday, June 30. I can't remember if my father was born in April 1943 or 1944. Where were you at 1 o'clock on 11/21/00?

OUTPUT:
I was born at [time] midnight [/time] on [date] March 15 [/date] . My sister will arrive at [time] 4:00 am [/time] on [date] Monday, June 30 [/date] . I can't remember if my father was born in [date] April 1943 [/date] or [date] 1944 [/date] . Where were you at [time] 1 o'clock [/time] on [date] 11/21/00 [/date] ?
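A minimal tagging sketch, assuming hand-built date and time regexes that cover only the forms in the example above (month-name dates with an optional weekday and year, month plus year, M/D/YY, bare years, and concise clock times). It is written in Python for illustration only; `DATE`, `TIME`, `TAGGER`, and `tag` are hypothetical names. Both patterns are joined into one combined regex so that a single left-to-right scan never produces overlapping tags.

```python
import re

# Hypothetical building blocks for a small subset of date expressions.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
         r"|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")
WEEKDAY = r"(?:(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day,\s+)?"  # optional "Monday, "
DATE = (
    r"\b(?:"
    + WEEKDAY + MONTH + r"\s+\d{1,2}(?:,\s*\d{4})?"  # March 15 / Monday, June 30
    + r"|" + MONTH + r"\s+\d{4}"                     # April 1943
    + r"|\d{1,2}/\d{1,2}/\d{2}(?:\d{2})?"            # 11/21/00
    + r"|(?:1[89]|20)\d{2}"                          # bare year 1800-2099, e.g. 1944
    + r")\b"
)

# Hypothetical building blocks for a small subset of time expressions.
HOUR = r"(?:1[0-2]|0?[1-9])"
MERIDIEM = r"(?:[ap]m|[ap]\.m\.)"
TIME = (
    r"\b(?:midnight|noon"
    + r"|" + HOUR + r":[0-5]\d(?:\s*" + MERIDIEM + r")?"  # 4:00 am
    + r"|" + HOUR + r"\s*(?:" + MERIDIEM + r"|o'clock)"    # 11am, 1 o'clock
    + r")(?!\w)"
)

# Named groups tell the replacement function which kind of expression matched.
TAGGER = re.compile("(?P<date>" + DATE + ")|(?P<time>" + TIME + ")")

def tag(text):
    def wrap(match):
        kind = match.lastgroup  # "date" or "time"
        return f"[{kind}] {match.group(0)} [/{kind}]"
    return TAGGER.sub(wrap, text)

print(tag("I was born at midnight on March 15."))
# prints: I was born at [time] midnight [/time] on [date] March 15 [/date].
```

The sample output above also separates punctuation from the preceding word; this sketch leaves punctuation untouched. The bare-year alternative will also tag unrelated four-digit numbers between 1800 and 2099. In Perl, a substitution with the `/e` modifier can compute the replacement text in the same way the `wrap` function does here.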

Your program will be demoed on Wall Street Journal text similar to the example above. Turn in your FSA diagrams in the lab. You may draw them in pencil on paper, but they must be legible, and every edge must be labeled.

Policies (from syllabus)

All programming assignments and your project will be demonstrated during designated lab sessions. You should also submit an electronic copy of your source code to the TA before the designated demo session. (His email address is patw0006@d.umn.edu.) There is no other way to submit your programming assignments or project. If you do not both submit AND demo on time, you will receive a zero.

Any code you submit should be commented. I must be able to understand what your code does simply by reading the comments, and this understanding should extend down to the details of your code. So do not simply describe the input and output; also include comments that describe your particular algorithm and coding techniques. Failure to comment to this degree will result in a zero.

All assignments and the project are to be done individually. You are required to write your own code. Unless otherwise specified, you must only turn in code that you personally wrote. The only possible exception to this is if I tell you to use a module that is available in a book or online archive. However, I will clearly indicate when this is permissible. Violations of this policy will result in severe grading penalties and/or failure in the class.