CS 8761 Natural Language Processing - Fall 2004

Assignment 4 - Due Mon, Nov 8, noon

This may be revised in response to your questions. Last Update, Thu, Oct 28, 9am

Objectives

To help create a sample of 5 paragraph essays that can be used as training data for the class project. We will pool this data, and each team will be able to use it for its project.

Specification

Collect 20 sample essay questions and answers from online sources. These should be in the traditional 5 paragraph form, similar to what would be provided on GMAT, TOEFL, GRE, etc. test preparation sites. Make sure that each essay has a specific question/prompt associated with it, and that the question is expressed as a complete sentence. Also, make sure that the essay adheres to the 5 paragraph form: a thesis statement, 3 supporting paragraphs, and a conclusion. Please do not use college admissions essays or general essays about issues. It is very important that the essay be written to a specific prompt, so please *do not* find a general essay and then write a prompt to fit the essay.

Please do not use the test sites that I have provided links for on the class web page! Please post messages to the class mailing list so that you do not use a site already used by someone else.

Also, you should write three 5 paragraph essays of your own, based on the following question:

Automated essay scoring is unfair to students, since there are many
different ways for a student to express ideas intelligently and  
coherently. A computer program can not be expected to anticipate all of 
these possibilities, and will therefore grade students more harshly than 
they deserve.

Discuss whether you agree or disagree (partially or totally) with the view 
expressed providing reasons and examples.

These three essays should correspond to scores of 1, 3, and 6, where 1 is terrible, 3 is average, and 6 is excellent. You can check the following guidelines to get an idea of how scores are assigned. Please try to write a complete 5 paragraph essay at each level. In other words, make a good faith effort to write an essay while pretending to be first a very poor student and then an average one. Then you can write in your true manner, that is, as an excellent student who will always get a 6!

Output

Each essay should be in a separate file, named with your user id followed by a number from 1 to 23. These files should have the .txt suffix. So, for example, I would create files of the form tpederse1.txt, tpederse2.txt, etc. The essays you collect from the web should be numbers 1-20, while your self-written essays should be 21, 22, and 23.
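The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function names `expected_filenames` and `essay_kind` are made up for this example, and the user id `tpederse` is the one used in the example above.

```python
def expected_filenames(user_id, count=23):
    """Return the names <user_id>1.txt .. <user_id><count>.txt."""
    return [f"{user_id}{n}.txt" for n in range(1, count + 1)]


def essay_kind(number):
    """Numbers 1-20 are essays collected from the web; 21-23 are self-written."""
    if 1 <= number <= 20:
        return "collected"
    if 21 <= number <= 23:
        return "self-written"
    raise ValueError(f"essay number must be between 1 and 23, got {number}")


if __name__ == "__main__":
    for name in expected_filenames("tpederse"):
        print(name)
```

Running it with your own user id in place of `tpederse` lists the 23 file names you are expected to produce.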

The format of each essay file should be as follows:

SOURCE: list the url where you found the prompt and essay (as exactly as 
possible) or write your name and the score (if this is one you composed). 
This should only occupy one line! This should be specific enough for me to 
immediately go to the essay you have found (and not just be the general 
site). If the prompt and essay are at two different URLs, please provide 
both (but still on just one line).

QUESTION: The question/prompt. This should only occupy one line!

ANSWER: paragraph-1

paragraph-2

paragraph-3

paragraph-4

paragraph-5

Please follow this format exactly.

The SOURCE should occupy one line, and I would like to see a specific URL that takes me directly to the essay. In other words, I would like to see a URL like this: http://www.syvum.com/cgi/online/serve.cgi/gmat/awa/issue_004.html and NOT like this: http://www.syvum.com/gmat/

Then there should be an empty line, and then the QUESTION should also be on one line. The essay should consist of 5 paragraphs, and those paragraphs should have one blank line between them.
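If you would like to check your files before submitting, the format above can be tested with a short Python script. The following is a minimal sketch and not an official checker: the function name `check_essay` is invented for this example, and it assumes that a paragraph may wrap across several lines as long as paragraphs are separated by blank lines.

```python
import re


def check_essay(text):
    """Return a list of format problems in one essay file (empty if none found)."""
    problems = []
    # Split the file into blocks separated by one or more blank lines.
    blocks = [b.strip() for b in re.split(r"\n(?:[ \t]*\n)+", text.strip())]
    # Expected blocks: SOURCE, QUESTION, then 5 paragraphs (the first starts with ANSWER:).
    if len(blocks) != 7:
        problems.append(
            f"expected 7 blocks (SOURCE, QUESTION, 5 paragraphs), found {len(blocks)}"
        )
    labels = ["SOURCE:", "QUESTION:", "ANSWER:"]
    for label, block in zip(labels, blocks):
        if not block.startswith(label):
            problems.append(f"block should start with {label}")
    # SOURCE and QUESTION must each occupy exactly one line.
    for label, block in zip(labels[:2], blocks):
        if "\n" in block:
            problems.append(f"{label} must occupy one line")
    return problems
```

An empty list means the script found nothing wrong; it does not check that the URL is specific enough or that the essay really answers the prompt.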

Submission Guidelines

Bundle your 23 files together in a single compressed tar file, named with your user id. This should be submitted to the web drop on the class web page prior to the deadline.
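One way to build the bundle is with Python's standard `tarfile` module. This is a sketch under assumptions: the function name `bundle_essays` is made up, and the archive name `<user id>.tar.gz` with gzip compression is a guess, since only "a single compressed tar file, named with your user id" is specified.

```python
import os
import tarfile


def bundle_essays(user_id, directory=".", count=23):
    """Create <user_id>.tar.gz in `directory` holding <user_id>1.txt .. <user_id><count>.txt."""
    names = [f"{user_id}{n}.txt" for n in range(1, count + 1)]
    missing = [n for n in names if not os.path.isfile(os.path.join(directory, n))]
    if missing:
        raise FileNotFoundError("missing essay files: " + ", ".join(missing))
    # Assumption: gzip compression and a .tar.gz suffix are acceptable.
    archive = os.path.join(directory, f"{user_id}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for name in names:
            # arcname keeps only the file name, not the directory path, inside the archive.
            tar.add(os.path.join(directory, name), arcname=name)
    return archive
```

The function refuses to create an archive if any of the 23 files is missing, which catches numbering mistakes before you submit.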

This is an individual assignment. Please make sure that you post to the mailing list and check its archives to avoid duplicating essays collected by someone else. We would like to get as broad a sample as possible.

by: Ted Pedersen - tpederse@umn.edu