CS 8761 Natural Language Processing - Fall 2004
Assignment 4 - Due Mon, Nov 8, noon
This may be revised in response to your questions.
Last Update, Thu, Oct 28, 9am
The objective is to help create a sample of 5 paragraph essays that can be
used as training data for the class project. We will pool this data
together and each team will be able to use it for their project.
Collect 20 sample essay questions and answers from online sources. These
should be the traditional 5 paragraph form, similar to what would be
provided on GMAT, TOEFL, GRE, etc. test preparation sites. Make
sure that each essay has a specific question/prompt associated with
it, and that question must be expressed as a complete sentence.
Also, make sure that the essay adheres to the 5 paragraph form: a
thesis statement, 3 supporting paragraphs, and a conclusion. Please do
not use college admissions essays or general essays about issues. It is
very important that the essay be written to a specific prompt, so please
*do not* find a general essay and then write a prompt to fit the essay.
Do not use the test sites that I have provided links for on the class web
page! Please post messages to the class mailing list so that you do not
use a site already used by someone else.
Also, you should write three 5 paragraph essays of your own, based on
the following question:
Automated essay scoring is unfair to students, since there are many
different ways for a student to express ideas intelligently and
coherently. A computer program cannot be expected to anticipate all of
these possibilities, and will therefore grade students more harshly than
Discuss whether you agree or disagree (partially or totally) with the view
expressed, providing reasons and examples.
These three essays should correspond to scores of 1, 3, and 6, where 1 is
terrible, 3 is average, and 6 is excellent. You can check the following
guidelines to get an idea of how scores are assigned. Please try to write
a complete 5 paragraph essay at each level. In other words, make a good
faith effort to write an essay while pretending to be first a very poor
student and then an average one. Then you can write in your true manner,
that is, as an excellent student who will always get a 6!
Each essay should be in a separate file, named with your user id and then
assigned a number from 1-23. These files should have the .txt suffix. So
for example, I would create files of the form tpederse1.txt, tpederse2.txt,
etc. The essays you collect from the web should be numbers 1-20, while
your self written essays should be 21, 22, and 23.
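As an illustration only, the naming scheme above can be sketched in a few
lines of Python. The function names here are hypothetical helpers and not
part of the assignment; the user id tpederse is the one from the example
above.

```python
def expected_filenames(user_id, total=23):
    """Return the names <user_id>1.txt .. <user_id><total>.txt, in order."""
    return [f"{user_id}{n}.txt" for n in range(1, total + 1)]


def is_self_written(n):
    """Essays 1-20 come from the web; 21, 22, and 23 are self written."""
    return 21 <= n <= 23


names = expected_filenames("tpederse")
print(names[0], names[-1])  # tpederse1.txt tpederse23.txt
```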
The format of each essay file should be as follows:
SOURCE: list the URL where you found the prompt and essay (as exactly as
possible) or write your name and the score (if this is one you composed).
This should only occupy one line! This should be specific enough for me to
immediately go to the essay you have found (and not just be the general
site). If the prompt and essay are at two different URLs, please provide
both (but still on just one line).
QUESTION: The question/prompt. This should only occupy one line!
Please follow this format exactly.
The SOURCE should occupy one line, and I would like to see a specific URL
that takes me directly to the essay. In other words, I would like to see
a URL like this:
NOT like this http://www.syvum.com/gmat/.
Then there should be an empty line, and then the QUESTION should also be
on one line. The essay should consist of 5 paragraphs, and those
paragraphs should have one blank line between them.
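If you want to check your files before submitting, a minimal Python sketch
of a format checker might look like the following. It is not an official
grading script. It assumes that a blank line also separates the QUESTION
line from the first paragraph (the description above does not say so
explicitly), and it does not verify that the SOURCE is a working URL or
that paragraphs are separated by exactly one blank line rather than
several. The function name check_essay is hypothetical.

```python
import re


def check_essay(text):
    """Return a list of format problems in one essay file (empty if none)."""
    problems = []
    # Blocks are runs of non-blank lines separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if len(blocks) < 2:
        return ["expected a SOURCE line, an empty line, and a QUESTION line"]
    source, question, paragraphs = blocks[0], blocks[1], blocks[2:]
    if not source.startswith("SOURCE:") or "\n" in source:
        problems.append("SOURCE must be a single line starting with 'SOURCE:'")
    if not question.startswith("QUESTION:") or "\n" in question:
        problems.append("QUESTION must be a single line starting with 'QUESTION:'")
    if len(paragraphs) != 5:
        problems.append(f"expected 5 paragraphs, found {len(paragraphs)}")
    return problems
```

Calling check_essay on the contents of a file returns an empty list when
the file matches the layout described above, and one message per problem
otherwise.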
Bundle your 23 files together in a single compressed tar file, named
with your user id. This should be submitted to the web drop on the class
web page prior to the deadline.
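One possible way to build the archive is sketched below using Python's
standard tarfile module. The .tar.gz extension and the function name
bundle_essays are assumptions made for illustration; the only stated
requirements are that the file be a compressed tar file named with your
user id. Any tool that produces such a file is equally acceptable.

```python
import os
import tarfile


def bundle_essays(user_id, directory="."):
    """Create <user_id>.tar.gz holding <user_id>1.txt .. <user_id>23.txt."""
    names = [f"{user_id}{n}.txt" for n in range(1, 24)]
    # Refuse to build an incomplete archive.
    missing = [n for n in names if not os.path.isfile(os.path.join(directory, n))]
    if missing:
        raise FileNotFoundError("missing essay files: " + ", ".join(missing))
    archive = os.path.join(directory, f"{user_id}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for name in names:
            # arcname keeps only the file name, not the directory path.
            tar.add(os.path.join(directory, name), arcname=name)
    return archive
```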
This is an individual assignment. Please make sure that you post to
and check the mailing list archives to avoid duplicating essays from
someone else. We would like to get as broad a sample as possible.