CS 8761 Natural Language Processing - Fall 2004
This may be revised in response to your questions.
Last Update, Wednesday, Nov 3, 12 noon
The overall goal of your team project is to develop an automatic method of
grading essays. These are essays as found on standardized tests such as
the TOEFL or GMAT exam, where a question or prompt is given, and the
student is expected to write a five-paragraph essay in response. The five
paragraphs are typically a thesis statement, three supporting paragraphs,
and one concluding paragraph.
Your essay grader should be written entirely in Perl. It may include
standard Perl modules that can be found in the CPAN archives. You should seriously
consider releasing your system via CPAN as well. Your team may also
want to use the SourceForge development environment, which will provide
web hosting and CVS.
Please organize and
document your system in such a way that I can easily download and
install your system, and have it running with minimal effort on my
part. In particular, I would like you to follow the standard Perl
convention of having systems installed via the "standard 3 step install"
(typically perl Makefile.PL, make, make install, with an optional make test
before installing).
We will make these systems available on the web both in a
downloadable form (after the class is finished) and also for web based
use. You may be surprised at who and how many people will look at
these systems, so please consider it an opportunity to make a good
impression (and a good advertisement) for yourself and your teammates.
Your essay grader should have a CGI-based web interface that will display
a question to the student, and then provide a box in which the essay can
be written and then submitted. After the student submits an essay, your
system should provide a report that gives feedback on all of the issues
listed below.
- Gibberish Detection - (aka Word Salad Detector) - your system should
be able to identify when a student has submitted lists of words or other
non-sentence-like input. For example, the following is the kind of
gibberish you should be able to detect: "dog cat mouse hot
warm cold isn't it you i am a lunatic maybe house house dog dog"
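One possible starting point for a word salad detector is sketched here,
in Python purely for illustration (your submitted system must still be in
Perl). It assumes that real sentences contain a fair share of function
words and rarely repeat the same word twice in a row. The word list, the
function name looks_like_gibberish, and both thresholds are hypothetical
choices, not requirements of the assignment.

```python
import re

# Hypothetical, deliberately tiny list of English function words.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "on", "for", "with", "by",
    "is", "are", "was", "were", "and", "or", "that", "it", "as",
}


def looks_like_gibberish(text, min_function_ratio=0.2, max_repeat_ratio=0.1):
    """Return True if the text resembles a word list rather than sentences."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if len(tokens) < 2:
        return True  # too little text to judge; treat as suspicious
    # Share of tokens that are function words (low in plain word lists).
    function_ratio = sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)
    # Share of adjacent token pairs that repeat the same word.
    repeat_ratio = sum(a == b for a, b in zip(tokens, tokens[1:])) / (len(tokens) - 1)
    return function_ratio < min_function_ratio or repeat_ratio > max_repeat_ratio
```

On the example above this returns True (few function words, plus
"house house" and "dog dog"). A heuristic like this is easy to fool, so a
stronger system would probably also look at part-of-speech sequences or
n-gram statistics.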
- Identify Irrelevant Text - the student will be given a particular
question to answer. Your system should be able to identify if a sentence
or paragraph is irrelevant to the question being answered.
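A simple baseline for relevance is word overlap between the question and
each sentence of the essay. The sketch below (Python for illustration
only; the project itself must be in Perl) computes the cosine similarity
of bag-of-words vectors and flags sentences that fall under a threshold.
The stopword list, the function names, and the threshold of 0.1 are all
assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "are", "and",
             "that", "it", "on", "my", "all", "should"}


def content_words(text):
    """Count the non-stopword tokens in a piece of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def irrelevant_sentences(prompt, essay, threshold=0.1):
    """Return the essay sentences whose overlap with the prompt is too low."""
    prompt_vec = content_words(prompt)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", essay) if s.strip()]
    return [s for s in sentences if cosine(prompt_vec, content_words(s)) < threshold]
```

Exact word matching will miss synonyms and inflected forms ("school" vs.
"schools"), so stemming or a resource such as WordNet would be a natural
next step.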
- Fact Finder - your system should be able to identify statements of
fact versus opinions. For example, it should recognize that "George
Washington was the first President of the USA" is a fact, while
"George Washington was a great man" is an opinion.
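As a very rough first cut, opinions can be spotted by evaluative cue
words. The sketch below (Python for illustration only; your system must
be in Perl) labels a sentence as an opinion if it contains any word from
a small cue list, and as a fact otherwise. The cue list and the function
name classify_statement are hypothetical, and "fact" here only means
"stated as a fact", not "true".

```python
import re

# Hypothetical cue words that tend to signal a judgment rather than a fact.
OPINION_CUES = {
    "great", "good", "bad", "best", "worst", "wonderful", "terrible",
    "beautiful", "should", "believe", "think", "feel",
}


def classify_statement(sentence):
    """Label a sentence 'opinion' if it contains a cue word, else 'fact'."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return "opinion" if tokens & OPINION_CUES else "fact"
```

With this list, the two George Washington sentences above come out as
"fact" and "opinion" respectively. A cue list will never be complete, so
treat this as a baseline to improve on.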
- Fact Check - your system should be able to verify (or refute) simple
statements of fact, such as Abraham Lincoln was a great ruler of