CS 8995 - Corpus Based Natural Language Processing - Spring 2001

Instructor: Dr. Ted Pedersen
Office: 309 Heller Hall
Office Hours: MWF 3-4 pm
Email: tpederse@d.umn.edu


Course Objectives:

Natural Language Processing is concerned with developing techniques that allow us to analyze, understand, and generate human language with computers. Corpus Based Natural Language Processing is based on the premise that we can use existing sources of online text to achieve these goals. This course will provide students with a theoretical and practical understanding of the techniques used to develop empirical approaches to syntactic analysis, semantic understanding, and discourse processing. Specific topics to be discussed include word sense disambiguation and machine translation. Practical work in the course will involve the design and implementation of natural language processing tools.

Required Text:

Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze. MIT Press. There is a supporting web site with quite a bit of information.

There is a copy of the text on 2-hour reserve in the library. You will still need your own copy of the text; however, the reserve copy might prove useful if you forget your book.

As of 12/14/00 the lowest online price I have seen for the text is $48 at bn.com. This compares to a list price of $60, which is approximately the price at the bookstore.

Reading assignments will be given in the lecture and posted here.

Suggested Texts:

We will do our programming assignments in Perl. While we will discuss Perl from time to time in the lecture, a fair bit of self-study will be required. As such, you are strongly advised to have at least one of the following at your disposal:

Learning Perl by Randal Schwartz and Tom Christiansen. O'Reilly Publishers. You can get this book from amazon.com or most any bookstore. This takes a tutorial approach and is especially good if you have limited C or Unix experience.

Programming Perl by Larry Wall, Tom Christiansen and Randal Schwartz. O'Reilly Publishers. You can get this book from amazon.com or most any bookstore. This book is more like a reference manual, although it is still very readable. This is a good choice if you have an extensive C and Unix background.


This class is only open to currently enrolled CS graduate students.

Grading Basis:

Programming Assignments:

Programming assignments are to be completed in Perl.

Programming assignments must be submitted on time. Late work is not accepted and will receive a score of zero for that assignment.

We will use an automatic turnin procedure, which requires you to log into hh33812. Further details can be found here.

You are expected to write your own code. If you turn in code that is not your own (e.g., code taken from a book or online archive, or code written by a colleague or classmate), I reserve the right to immediately dismiss you from the class.

Final Project:


Grading Scale:

Exams:

All exams are closed-note, closed-book.

You must take exams at the scheduled time and place. Exams will not be given early. Make-up exams will only be offered in the event of documented personal emergencies.

By: Ted Pedersen - tpederse@d.umn.edu
Last update: 01/11/2001