CS 8995 - Corpus Based Natural Language Processing - Spring 2001

Class Information:

Instructor Office Hours
Required Readings (week by week)
Program Samples (example code from lecture)
Final Project Teams (stage 1) (curious about the team names?)
Final Project Teams (stage 2)

Final Project:

Stage 1 Sentence Alignment, Gold Standard Data due Friday March 23, 4 pm; the rest is due Monday March 26, 4 pm. As of March 20 the evaluation requirements have been expanded, so please make sure to include the new information!
GOLD STANDARD DATA posted (3/23)

Stage 2 More Sentence Alignment, due Monday April 16
Consider using some of the bitext data now available from Assignment 4 for testing purposes. See below.
GOLD STANDARD DATA updated (4/16)

Stage 3 Building a Translation Dictionary with the EM algorithm (optional extra credit), due Thursday May 10

Programming Assignments:

All programming assignments should be turned in using turnin on machine hh33812.
Unless specified otherwise, assignments are to be completed individually. Here is a reminder about that policy.

Assignment 1 Mutual Information, due Wed Jan 31, 4 pm
Solution Key for this text with N=10.

Assignment 2 Pointwise Mutual Information, due Mon Feb 12, 4 pm
An analogy of sorts that may provide a little guidance. A few more thoughts. Preliminary info about the write-up. Even more info about the write-up.

Assignment 3 N-gram models, due Mon Feb 26, 4 pm
Solution Key using Witten-Bell smoothing and text1 and text2.

Assignment 4 parallel corpus collection, due Wed Mar 07, 4 pm. A reminder about our objectives.
Here's the bitext that you created! (posted 4/3/01). A note on how to use it for stage 2.
Further details on Assignment 4 grading, as well as an EXTRA CREDIT OPPORTUNITY.

Perl Resources:

Sources of Text:

Other Resources:


Lecture meets MW 4-5:40 pm in HH 302.

By: Ted Pedersen - tpederse@d.umn.edu
Last update: 1/21/2000