CS 8761 Natural Language Processing - Fall 2002 - MRD + Web => Corpus

Final Project - Stage I - Beta version due, Mon Nov 18, noon

This may be revised in response to your questions. Last update Mon Nov 4 7:00 pm

Objectives

Develop Perl modules that provide an interface to LDOCE (Longman Dictionary of Contemporary English) and the Macquarie Dictionary (aka Big Mac). Your team should develop a separate and independent interface to each dictionary.

Specification

Design and implement Perl modules that provide interfaces to LDOCE and Big Mac. The raw data and documentation for these dictionaries are available at /home/cs/tpederse/CS8761/LDOCE3 and /home/cs/tpederse/CS8761/BigMac.

These modules should be separate and independent. You must design these so that they are suitable for distribution via CPAN. This includes providing documentation using perldoc and following the standard method of installing modules. I will expect to install and use your code as a Perl module just like any other Perl module I might get from CPAN.

Your modules should be to LDOCE and Big Mac as WordNet::QueryData is to WordNet. In other words, provide a Perl interface to the dictionary that returns the useful information present in the dictionary so that it can be used in a Perl program. Please consider the interface as your only way to get information from the dictionary.

QueryData is a Perl module and we have it (along with WordNet) installed on our system. Please experiment with it a bit to see what sort of capabilities it has, and use it to get some ideas of the kinds of capabilities you might want to support. Please remember of course that WordNet, LDOCE and Big Mac are all quite distinct and contain different kinds of information that is stored in different ways.

When your team thinks about the type of capabilities that you will need to support in your interface, assume that your interface is the only way to access the data in the dictionary. Make sure you don't overlook any significant sources of information provided in the dictionary. This might require that you spend a little time looking at the structure of the dictionary and determine what is provided and what is not.
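To make the idea of "the interface is the only way to access the data" concrete, here is a minimal sketch of what such an interface might look like. Everything in it is an assumption made for illustration: the package name Sketch::Dictionary, the methods senses and definition, and the toy in-memory entries (which are invented, not taken from LDOCE or Big Mac). The sense identifiers are loosely modeled on the word#pos#sense strings that WordNet::QueryData uses. A real module would read the dictionary files and would expose whatever information those files actually contain.

```perl
use strict;
use warnings;

# Hypothetical package name and methods, for illustration only.
package Sketch::Dictionary;

# Toy in-memory data standing in for parsed dictionary entries.
# The layout and the definitions are invented for this sketch.
my %TOY_ENTRIES = (
    bank => [
        { pos => 'n', definition => 'toy definition: a financial institution' },
        { pos => 'n', definition => 'toy definition: the side of a river' },
        { pos => 'v', definition => 'toy definition: to tilt while turning' },
    ],
);

sub new {
    my ($class) = @_;
    return bless { entries => \%TOY_ENTRIES }, $class;
}

# Return sense identifiers of the form word#pos#n for a word,
# numbering the senses separately within each part of speech.
sub senses {
    my ($self, $word) = @_;
    my $senses = $self->{entries}{ lc $word } or return;
    my (%count, @ids);
    for my $sense (@$senses) {
        my $n = ++$count{ $sense->{pos} };
        push @ids, join '#', lc $word, $sense->{pos}, $n;
    }
    return @ids;
}

# Return the definition for one sense identifier, or nothing if unknown.
sub definition {
    my ($self, $sense_id) = @_;
    my ($word, $pos, $n) = split /#/, $sense_id;
    return unless defined $n && $n =~ /^\d+$/ && $n >= 1;
    my @same_pos = grep { $_->{pos} eq $pos } @{ $self->{entries}{$word} || [] };
    my $sense = $same_pos[ $n - 1 ] or return;
    return $sense->{definition};
}

package main;

# The caller sees only method calls, never the underlying storage.
my $dict = Sketch::Dictionary->new;
print "$_: ", $dict->definition($_), "\n" for $dict->senses('bank');
# prints:
#   bank#n#1: toy definition: a financial institution
#   bank#n#2: toy definition: the side of a river
#   bank#v#1: toy definition: to tilt while turning
```

The design choice worth noting is that the program using the module never touches the storage format, so the same calling code would keep working if the toy hash were replaced by a parser for the real dictionary files.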

Documentation

Provide sufficient documentation (to be read via perldoc) such that I can install your module and start to use it without much difficulty. Imagine that I have limited experience with LDOCE and Big Mac and that I am not able to view the source SGML files. Your interface is my only way to see and access that data. Your documentation is my only source of information about your code and its capabilities.
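As a rough illustration of documentation that perldoc can display, the sketch below embeds POD in a module file. The module name Sketch::Dictionary, its two methods, and the section contents are all placeholders assumed for this example. The section headings (NAME, SYNOPSIS, DESCRIPTION, and so on) follow common CPAN practice rather than any requirement stated here.

```perl
package Sketch::Dictionary;    # hypothetical module name

use strict;
use warnings;

our $VERSION = '0.01';

=head1 NAME

Sketch::Dictionary - hypothetical interface to a machine readable dictionary

=head1 SYNOPSIS

  use Sketch::Dictionary;

  my $dict   = Sketch::Dictionary->new;
  my @senses = $dict->senses('bank');

=head1 DESCRIPTION

Explain what information the dictionary holds and which methods expose it,
so that a reader who cannot see the source files can still use the module.

=head1 METHODS

=head2 new

Creates and returns a dictionary object.

=head2 senses($word)

Returns a list of sense identifiers for C<$word> (always empty in this stub).

=head1 AUTHORS

Team name and individual names go here.

=head1 COPYRIGHT AND LICENSE

Copyright and distribution terms go here.

=cut

# Stub implementations so that the file loads; a real module would
# consult the dictionary data here.
sub new    { my ($class) = @_; return bless {}, $class }
sub senses { return () }

1;
```

If this were saved as lib/Sketch/Dictionary.pm, running perldoc lib/Sketch/Dictionary.pm would show the formatted documentation without the code.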

Submission Guidelines

Submit your modules to the web drop in two distinct tar files named for your team. Once unpacked things should be structured such that I can install your module using the standard "3 step CPAN" install. The three steps are as follows.
perl Makefile.PL
make
make test
I should not have to do anything else to get each of your modules installed. I should be able to include it in a program via the use command. Please provide some example usages in your documentation, and of course your test files will show how to use the code as well (see QueryData again as an example).
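For reference, the Makefile.PL that drives these steps is usually a short script built on ExtUtils::MakeMaker. The following is only a sketch: the module name and the metadata strings are placeholders, and a literal VERSION is used so that the script runs even before the module file exists.

```perl
# Makefile.PL (sketch; the module name and metadata are placeholders)
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME      => 'Sketch::Dictionary',    # hypothetical module name
    VERSION   => '0.01',                  # VERSION_FROM => 'lib/Sketch/Dictionary.pm'
                                          # is a common alternative once that file exists
    ABSTRACT  => 'Hypothetical interface to a machine readable dictionary',
    AUTHOR    => 'Team name and members go here',
    PREREQ_PM => {},                      # list any non-core modules the code needs
);
```

Running perl Makefile.PL with a file like this writes a Makefile into the current directory. In the conventional layout the module lives under lib/ and the test scripts under t/, and make test runs the scripts it finds in t/.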

Make sure your team name, individual names, date, etc. are included in your source code. Your code may well end up being distributed via CPAN, so provide appropriate info about copyrights, distribution, etc.

Submit your LDOCE and Big Mac interfaces separately. Only submit 2 per team. Coordinate with your teammates so you don't have multiple submissions.

This is a team assignment. You are strongly advised to divide up the work of the project into tasks that can be carried out in parallel by various team members. All team members should be acknowledged in the comments, etc. and all teammates will receive the same grade. Do not work with other teams. Each team should operate independently of all the other teams. Make your own decisions as a team and do not be influenced by the decisions of other teams if you happen to hear of them accidentally. You are free to work with your teammates as closely as is necessary.

by: Ted Pedersen - tpederse@umn.edu