EGAL - Essay Grading and Analysis Logic

The Project can be accessed online at http://egal.sourceforge.net

Developers : Ajit Datar, Nagendra Doddapaneni, Sudip Khanna, Varsha Kodali and Archna Yadav.


INTRODUCTION TO AUTOMATIC ESSAY GRADING

As the saying goes, the three R's of education are Reading, wRiting and aRithmetic. Writing has always formed an important part of education, both in teaching and in evaluation. As automation has become the order of the day, the automatic evaluation of essays has been an open issue for a long time. Systems to evaluate essays automatically have been developed since the late 1960s. One of the earliest was Project Essay Grader (PEG)[1]. This system was not widely accepted, because most of the factors it considered were indirect, such as the number of commas, the number of prepositions and the number of uncommon words. The idea was that such measures would stand in for the intrinsic qualities of the writing, but this indirection was not widely accepted.

After PEG, in the 1980s, there was an essay evaluation system called Writer's Work Bench tool (WWB), which gave writers feedback on aspects such as spelling, diction and readability. The system supported spell checking and pointed out often-misused words. But because WWB did not perform semantic analysis, it was limited in scope as an evaluation tool. During the same period, Tom Landauer and his associates came up with an approach called Latent Semantic Analysis (LSA)[13][20]. LSA-based techniques use a ``bag of words'' approach, in which word order is not considered important but similarity between words is; this similarity is measured using a co-occurrence matrix and singular value decomposition. Tom Landauer and his associates furthered research into the approach and developed the Intelligent Essay Assessor (IEA)[14] system.

Another system that followed PEG was E-RATER[15][16], developed at ETS. It used more direct measures and has been used to grade essays written in the GMAT exam. The measures used by E-RATER are more comprehensive and include the document structure. The student's response is measured for similarity with a training set of essays and, based on this similarity value, given a score on a scale from 1 to 6.

Later, in 1999, Christie designed a new essay grading system, SEAR (Schema Extract Analyse and Report)[17], which was based on a new parameter: the style of the essay rather than its content. It required initial training and calibration on a set of essays. It also had reference content, called the content schema, which was represented as a data structure and could be revised as and when required. SEAR inspired others to consider the style of the essay, though it had its pros and cons. In the following year, three students, Ming, Mikhailov and Kuan, at NGEE ANN polytechnic came up with an essay grading system called Intelligent Essay Marking Systems (IEMS)[18], which was based on the Pattern Indexing Neural Network (Indextron), a clustering algorithm that can be implemented over a neural network. IEMS was praised for its instant feedback, which let students learn where they had performed well as well as the specifics of their mistakes. After three years of research and development, Mitchell, Russel, Broomhead and Aldridge in 2002 came up with software called Automark[19], intended for marking answers to open-ended questions. It is based on NLP methods and incorporates spell checking, semantic checking and support for punctuation checking. It looks for specific content in the free-text answers. This content is specified in the form of templates which indicate correct or incorrect answers. Automark was another step in the direction of automatic text assessment.


OBJECTIVE

In our system, we consider four areas for the evaluation of an essay: Gibberish Detection, Irrelevance Measure, Identifying Statements of Fact and Checking Statements of Fact for their accuracy.

Let us discuss each of these modules in turn. To start off, there are two types of gibberish sentences: syntactic gibberish and semantic gibberish. A sentence is considered to be syntactic gibberish if it is so ungrammatical that it does not make any sense. So a syntactically gibberish sentence is not just ungrammatical; it is so highly ungrammatical that its meaning cannot be recovered. Consider the sentence ``I go the market.'' It is ungrammatical, since the preposition ``to'' is missing. Even so, we can still understand what is being said. Now consider the sentence ``At flying no market ran.'' This too is ungrammatical, but it does not make any sense. Hence, it is considered to be syntactic gibberish. Now let us consider semantic gibberish. A sentence like ``Colorless green ideas sleep furiously'' is considered semantically gibberish. Such sentences are grammatical but do not have any meaning. The more gibberish sentences an essay contains, the less command over written English the student demonstrates, especially when writing in response to a prompt/topic.

Moving on to the next measure, we use an irrelevance measure to decide whether a sentence is relevant to the given topic. Given a topic/prompt for the essay, a sentence from the student's response is considered to be irrelevant if it does not relate to the topic. To decide this automatically, we compute the semantic similarity of the sentence with the topic text, using WordNet. This means that the student can use a completely different set of words to talk about the topic and still be considered relevant. For example, say the prompt asks the student to write about the ``Importance of class room teaching versus learning from home'' and the student writes ``Education is very important for one to succeed in life.'' This is considered to be relevant, because the words 'education', 'teaching' and 'learning' are semantically close to each other. If the student instead writes ``The sun is the biggest star in the universe'', then the sentence is irrelevant, because it does not talk about the topic. The more irrelevant sentences an essay contains, the less understanding of the topic the student shows.

The next measure is to identify statements of fact in a student's response to an essay topic. A sentence is considered to be a fact, as opposed to an opinion, if the following four properties hold: (i) the information presented is unique and one of a kind, (ii) the information presented is concrete, (iii) the information presented is a statement or proposition, and (iv) the information presented is an association between more than one concept[2]. If the sentence contains such information, we can identify the statement as a fact. Personal statements, on the other hand, are not concrete and hence cannot be facts; such statements are opinions. Thus statements of fact are identified in contrast to statements of opinion. Questions and imperative statements form a category apart from statements of fact. If a sentence contains comparative adjectives like 'smarter' or superlative adjectives like 'smartest', or comparative adverbs like 'faster' or superlative adverbs like 'fastest', then the sentence is not a statement of fact, because such a sentence is not concrete. Sentences about the future are not propositions either, as there is an element of uncertainty associated with the future, and hence such sentences cannot be statements of fact. Let us consider an example. If the topic is the one about teaching given in the previous section, and the student writes ``Learning from home started in USA from 1965'', the sentence is considered to be a statement of fact, irrespective of whether it is relevant or not. The statement contains information which is unique (1965) and concrete ('something starting'), and it has no words that can be interpreted in more than one way in different contexts. A sentence like ``Learning in class rooms is good'', by contrast, is not a statement of fact but a statement of opinion, because what seems 'good' to one person might not seem so to another. Thus personal statements cannot be considered as statements of fact. The more such facts are stated, the better we can evaluate what the student envisions about the topic at hand, since the student relates the topic to concrete statements, i.e. facts.

The final measure used for evaluating essays is checking statements of fact for accuracy. Once we have identified the statements of fact, we check whether those facts are accurate; if they are, they contribute to a boost in the score. Consider the topic about learning from the last measure. Suppose the student writes ``Class room teaching started in 1965 in the USA'', when in fact it started in 2000. Then although the sentence is a statement of fact, it is not accurate. Inaccurate facts contribute negatively to the score, while accurate facts contribute positively.


IMPLEMENTATION OF BASELINE APPROACH

The four measures implemented are Gibberish Detection, Irrelevance Measurement, Identifying Statements of Fact and Checking for Accuracy of Statements of Fact. The best results are obtained when we run the modules in the above order, so that each phase receives only sentences worth checking: if a sentence is gibberish, there is no point in checking whether it is relevant, and if a sentence is irrelevant, there is no point in checking whether it is a statement of fact.
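
The ordering above amounts to a short-circuiting chain of checks. The following is a minimal sketch in Python (EGAL itself is written in Perl); the function name classify and the three predicate arguments are hypothetical and only illustrate the order of the stages, not the project's actual interface.

```python
from typing import Callable

# Hypothetical stage signature: takes a sentence, returns True when the
# stage flags it. These names are illustrative, not part of EGAL.
Check = Callable[[str], bool]


def classify(sentence: str, is_gibberish: Check,
             is_irrelevant: Check, is_fact: Check) -> str:
    """Run the stages in the stated order, stopping at the first that fires."""
    if is_gibberish(sentence):
        return "gibberish"      # no relevance check for gibberish
    if is_irrelevant(sentence):
        return "irrelevant"     # no fact check for irrelevant sentences
    if is_fact(sentence):
        return "fact"           # only these go on to the accuracy check
    return "ordinary"
```

Because each stage returns early, a later (and more expensive) stage never sees a sentence that an earlier stage has already rejected.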

Let us look at the implementation of each of the four modules -

The EGAL::Gibberish module implements Gibberish Detection. The method implemented is

isGibberish() - This method determines whether a given sentence is gibberish. It first checks whether the sentence is semantic gibberish, as follows. The module strips the stop words from the sentence. Stop words are a list of function words that contribute to the grammar alone and not to the meaning of the sentence. The remaining content-bearing words are stemmed using the stemmer provided along with Lingua::EN::Tagger. If there are at least three content-bearing words in the sentence, then, using the similarity metric described later, we compute how similar each word is to every other word. Thus if we have N words in the sentence, we end up with an NxN matrix. We then compute the average of the entries in the matrix to determine the semantic coherence of the sentence, a value between 0 and 1. The complement of this value, scaled to between 0 and 100, is the percentage gibberishness of the sentence. If the scaled score is above an empirically decided threshold of 90%, the sentence is flagged as semantic gibberish.

Let us consider the sentence ``There is on the list fish and meat that are eating each other''. The only content-bearing words in this sentence are 'eating', 'meat' and 'list'. The semantic similarities between these words are: eating-meat 0.1581, eating-list 0, meat-list 0.

This gives us a value of 0.0527 for the semantic coherence of this sentence (the mean of the three pairwise similarities, 0.1581/3). Percentage gibberishness is calculated as (1-0.0527)*100, i.e. 94.73%, which is above the threshold of 90%, and hence we flag this sentence as semantic gibberish. The threshold we have used has been experimentally derived. After working on a list of sentences which were assessed by humans as gibberish/non-gibberish, the system was tuned to be consistent with those judgments by setting the threshold accordingly. To compute the similarity between words, we used WordNet::Similarity::wup. The measure is used for words found in WordNet. For words not found in the knowledge base, a similarity value of 0 is used, which reduces semantic coherence and therefore increases gibberishness, the complement of semantic coherence. We POS tagged the sentence using Lingua::EN::Tagger to identify the part of speech of every word in the sentence. While computing the similarity between words, we considered all the senses of a word for the part of speech with which it was used in the sentence. Knowing the part of speech lets us distinguish, for example, the word ``I'' as used in the sentence ``I am going there.'', where ``I'' is a pronoun, from its use in a sentence like ``The element I is found in the periodic table'', where ``I'' is a noun, as in Iodine. This enables us to handle some of the surprising words found in WordNet, based on the part of speech of the word in the sentence.
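
The semantic coherence calculation can be sketched as follows. This is an illustrative Python version, not the module's Perl code: the similarity function is passed in as a parameter (in EGAL it would be WordNet::Similarity::wup), the lookup table holds only the three similarities quoted for the example sentence, the average is taken over distinct word pairs as in that example, and the handling of sentences with fewer than three content words is an assumption.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def semantic_gibberish(words: List[str],
                       similarity: Callable[[str, str], float],
                       threshold: float = 90.0) -> Tuple[bool, float]:
    """Return (flagged, percentage gibberishness) for content-bearing words."""
    if len(words) < 3:
        # Assumption: with fewer than three content words no semantic
        # check is made, so the sentence is not flagged here.
        return False, 0.0
    pairs = list(combinations(words, 2))
    # Semantic coherence: mean similarity over all distinct word pairs.
    coherence = sum(similarity(a, b) for a, b in pairs) / len(pairs)
    score = (1.0 - coherence) * 100.0
    return score > threshold, score


# Pairwise similarities quoted for the example sentence; every other
# pair (including words unknown to the knowledge base) scores 0.
TABLE = {frozenset(("eating", "meat")): 0.1581}


def lookup(a: str, b: str) -> float:
    return TABLE.get(frozenset((a, b)), 0.0)


flagged, score = semantic_gibberish(["eating", "meat", "list"], lookup)
print(flagged, round(score, 2))  # True 94.73
```

For the example, the coherence is 0.1581/3 = 0.0527, so the gibberishness is (1 - 0.0527) * 100, about 94.73%, which exceeds the 90% threshold.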

If the sentence is below the threshold (not semantic gibberish), we check whether it is syntactic gibberish. If the percentage of unknown words and unused links in the sentence (the score) is more than an empirical threshold, we say that the sentence is syntactic gibberish. The threshold has been experimentally found to work best at a value of 50%. A syntactic gibberish sentence has a lot of unused links according to Lingua::LinkParser, a grammar parser used to parse sentences. If the score is 0, the sentence is grammatical. If it is a relatively small value, the sentence is ungrammatical but not gibberish, and if it is high, the sentence is considered to be syntactic gibberish. The method takes a sentence from the student's response to a topic as input and returns an array. Index 0 contains a 1 if the sentence is gibberish and a 0 otherwise. Index 1 contains the gibberish score associated with the sentence, and index 2 contains a detailed trace of execution.

The alpha version of the module identified only syntactically gibberish sentences. To flag a sentence as syntactic gibberish, we considered the absolute number of unknown words and unused links. If that value was greater than 3, we termed the sentence gibberish. But this threshold of 3 stayed the same whether the sentence had length 5 or length 25. In the beta version, the module was able to identify semantically gibberish sentences as described above. The final version improves on the beta version by replacing the absolute threshold for syntactic gibberish with a percentage of unknown words/unused links. Thus if such words/links are 3 out of 5, this percentage is 60% and we flag the sentence as syntactic gibberish. If such words/links are 3 out of 25, this percentage is 12% and we do not flag the sentence as syntactic gibberish, even though there were 3 such words/links.
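
The percentage-based rule of the final version is small enough to show directly. This Python sketch assumes the counts of unknown words and unused links have already been obtained from the parser; the function and parameter names are hypothetical.

```python
from typing import Tuple


def syntactic_gibberish(problem_count: int, sentence_length: int,
                        threshold: float = 50.0) -> Tuple[bool, float]:
    """Return (flagged, score).

    problem_count   -- unknown words plus unused links in the sentence
    sentence_length -- number of words in the sentence
    """
    if sentence_length <= 0:
        raise ValueError("sentence_length must be positive")
    score = 100.0 * problem_count / sentence_length
    return score > threshold, score


print(syntactic_gibberish(3, 5))   # (True, 60.0)
print(syntactic_gibberish(3, 25))  # (False, 12.0)
```

The two calls reproduce the 3-out-of-5 and 3-out-of-25 cases: the same absolute count is flagged in the short sentence but not in the long one.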


The EGAL::Relevance module implements the Irrelevance Measure. The method implemented is

isIrrelevant() - This method takes a sentence from the student's response as input and computes its relevance to the topic/prompt. Only the content-bearing words are considered to contribute to the relevance of a sentence. We compare each content-bearing word from the input sentence with every word in the topic. For every word in the topic, we find the maximum similarity that it has with any word in the input sentence, using WordNet::Similarity::wup. We then compute the average of these maximum similarities over all words in the topic. The complement of this value, scaled to between 0 and 100, is the percentage irrelevance of the sentence. If this score is above an empirically decided threshold, the sentence is considered to be irrelevant. After experimenting with sample responses for topics/prompts, we decided on a threshold of 70%. With this threshold, most of the sentences that a human assessor would evaluate as irrelevant are detected as irrelevant. The method returns an array. Index 0 contains a 2 if the sentence is irrelevant and a 0 otherwise. Index 1 contains the irrelevance score, between 0 and 100. Index 2 contains the detailed execution trace.
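
The max-then-average computation can be sketched as below. This is an illustrative Python version under stated assumptions: the similarity function is injected (EGAL uses WordNet::Similarity::wup), the values in toy_sim are invented purely for the demonstration and are not real WordNet scores, and rejecting empty word lists is an assumption.

```python
from typing import Callable, List, Tuple


def irrelevance(sentence_words: List[str], topic_words: List[str],
                similarity: Callable[[str, str], float],
                threshold: float = 70.0) -> Tuple[bool, float]:
    """Return (flagged, percentage irrelevance) of a sentence to a topic."""
    if not sentence_words or not topic_words:
        # Assumption: empty inputs are rejected rather than scored.
        raise ValueError("both word lists must be non-empty")
    # For every topic word, keep its best match among the sentence words.
    best = [max(similarity(t, s) for s in sentence_words)
            for t in topic_words]
    relevance = sum(best) / len(best)
    score = (1.0 - relevance) * 100.0
    return score > threshold, score


# Invented similarity values, for illustration only.
TOY = {frozenset(("teaching", "education")): 0.8,
       frozenset(("learning", "education")): 0.9}


def toy_sim(a: str, b: str) -> float:
    return TOY.get(frozenset((a, b)), 0.0)


print(irrelevance(["education", "life"], ["teaching", "learning"], toy_sim))
print(irrelevance(["sun", "star"], ["teaching", "learning"], toy_sim))
```

With these toy values the first sentence scores about 15% irrelevance and is kept, while the second scores 100% and is flagged.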

The alpha version of the module compared words from the student's response with words in the topic and in a ``gold standard essay''. These ``gold standard essays'' were written by an expert in the field, one for every essay topic that the system would handle. The word match also considered synonyms by using Thesaurus.com, but the module was still looking for a word match, which is not necessarily a semantic match. For the beta version, the module instead computed the semantic similarity between a sentence from the student's essay and the topic, using WordNet::Similarity::wup. This also eliminated the problem of maintaining ``gold standard essays''. The performance of the module was found to be satisfactory, and hence it has been retained for the final version.

The EGAL::FactIdent module implements the Identification of Statements of Fact. The methods implemented are

fill_stop_hash - this method fills a hash with value words; a sentence containing such words is an opinion, not a fact.

fill_fact_hash - this method fills a hash with fact words; a sentence containing such words is a fact.

Both methods are called as a part of the constructor.

isFact - this method takes a sentence from the student's response to a topic/prompt. It first checks whether the sentence contains a comparative adjective like 'smarter', a superlative adjective like 'smartest', a comparative adverb like 'higher' or a superlative adverb like 'highest'. If the sentence contains such a word, it is identified as not being a statement of fact. Next, the method checks whether the sentence is in any form of future tense. If it is, the sentence is identified as not being a statement of fact. Then it checks whether the sentence is a question; if so, it is not a statement at all, let alone a statement of fact. Then it checks whether the sentence contains value words or fact words. If it does, the sentence is identified as a statement of opinion or a statement of fact respectively. The idea that value words make a sentence a statement of opinion was given by John Langan[6] in a discussion about distinguishing facts from opinions. We obtained a list of such value words from a project that worked on identifying them[7]. The lists of value words and fact words are available at links [21] and [22] respectively. If the statement contains statistics, it is also identified as a statement of fact. Finally, as we identify statements of fact by elimination, all other sentences, which do not meet any of the above conditions, are treated as statements of fact. This method also returns an array. Index 0 contains a 4 if the sentence is a statement of fact and a 0 otherwise. Index 1 always contains a 0, as there is no score associated with identifying statements of fact. Index 2 contains a message indicating whether the sentence is a statement of fact; if it is not, the message gives the reason.
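
The elimination order described above can be sketched as a chain of rules. This Python version is only an illustration and rests on several assumptions: the input is already POS tagged with Penn Treebank style tags (JJR, JJS, RBR, RBS), future tense is approximated by the presence of 'will' or 'shall', a question is detected by a final '?', a statistic is any token containing a digit, value words are tested before fact words, and the two word sets are tiny hypothetical stand-ins for the lists at [21] and [22].

```python
from typing import List, Tuple

COMPARATIVE_TAGS = {"JJR", "JJS", "RBR", "RBS"}
FUTURE_MARKERS = {"will", "shall", "'ll"}   # crude stand-in for tense detection
VALUE_WORDS = {"good", "bad", "beautiful"}  # hypothetical sample
FACT_WORDS = {"first"}                      # hypothetical sample

FACT, NOT_FACT = 4, 0  # status values placed at index 0 of the result


def is_fact(tagged: List[Tuple[str, str]]) -> Tuple[int, int, str]:
    """Return (status, 0, message) for a list of (word, tag) pairs."""
    words = [w.lower() for w, _ in tagged]
    if any(tag in COMPARATIVE_TAGS for _, tag in tagged):
        return NOT_FACT, 0, "not a fact: comparative or superlative word"
    if any(w in FUTURE_MARKERS for w in words):
        return NOT_FACT, 0, "not a fact: future tense"
    if words and words[-1] == "?":
        return NOT_FACT, 0, "not a fact: question"
    if any(w in VALUE_WORDS for w in words):
        return NOT_FACT, 0, "not a fact: value word (opinion)"
    if any(w in FACT_WORDS for w in words):
        return FACT, 0, "fact: contains a fact word"
    if any(ch.isdigit() for w in words for ch in w):
        return FACT, 0, "fact: contains a statistic"
    return FACT, 0, "fact: by elimination"


opinion = [("Learning", "NN"), ("in", "IN"), ("class", "NN"),
           ("rooms", "NNS"), ("is", "VBZ"), ("good", "JJ")]
print(is_fact(opinion)[0])  # 0
```

The middle element of the result is always 0, mirroring the description that no score is associated with this stage.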

The alpha version of the module identified value words in sentences; sentences with value words were identified as statements of opinion. If a sentence contained fact words, it was identified as a statement of fact. Likewise, if a sentence contained statistics, it was identified as a statement of fact. In the beta version, sentences with comparative/superlative adjectives/adverbs were identified as not being statements of fact, as were sentences in the future tense. The module was found to work satisfactorily, and hence it has been retained for the final version.

The EGAL::FactCheck module checks statements of fact for accuracy. The method implemented is

check_fact() - This method takes a sentence from the student's response to a prompt/topic. A LinkParser[4] tree for the sentence is built. Using regular expressions, the subjects of all the verbs are identified. A Google query is then constructed to search en.wikipedia.org [3] using the subjects and any proper nouns in the sentence. If there are no proper nouns, all noun phrases are used. The first Google result is taken and the full text associated with this Wikipedia entry is retrieved. We then calculate the frequencies of all the n-grams of the sentence as observed in the full text. We consider n-grams of length at most 3 and apply Witten-Bell smoothing to the resulting frequency list. A confidence measure is then calculated for each sentence from the following quantities. 'n' refers to the number of words in the n-gram. 'count' refers to the number of times the n-gram has occurred in the wikitext/sentence. 'N' refers to the maximum n-gram length considered, which in our case is 3. Depending on whether the count is positive or negative, we add or subtract n*(log(count) + N) to or from the 'measure' respectively. This is done for each of the n-grams in the sentence. Also, if count is positive, n*(log(count) + N) is accumulated into 'max', and into 'min' otherwise. We compute the final 'score' as score = (measure - min) * 100 / (max - min). This score represents the confidence, expressed as a percentage, with which we say that a particular statement of fact is accurate.
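
The n-gram counting step on its own can be sketched as follows. This Python illustration covers only the frequency lookup for n-grams of length 1 to 3; the Witten-Bell smoothing and the confidence formula are deliberately left out. Lower-casing and whitespace tokenization are simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], max_n: int = 3) -> Iterator[NGram]:
    """Yield every n-gram of length 1..max_n from a token list."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def sentence_ngram_counts(sentence: str, reference_text: str,
                          max_n: int = 3) -> Dict[NGram, int]:
    """Count how often each n-gram of the sentence occurs in the reference."""
    # Assumption: lower-cased whitespace tokens stand in for real tokenization.
    ref_counts = Counter(ngrams(reference_text.lower().split(), max_n))
    return {gram: ref_counts[gram]
            for gram in ngrams(sentence.lower().split(), max_n)}


counts = sentence_ngram_counts("a b c", "a b a b c")
print(counts[("a", "b")], counts[("a", "b", "c")])  # 2 1
```

An n-gram of the sentence that never appears in the reference text gets a raw count of 0; smoothing would then assign it a small non-zero weight.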

The module was introduced in the beta version of the project. It was found to work satisfactorily, except for the return status of facts. The verbosity of the return message was enhanced to display more information in the log files. With this modification, the module was ready for the final version.


OVERALL PROPOSED SOLUTION

Following is a list of tasks and the group members who worked on them -

Gibberish Detection - Sudip Khanna and Ajit Datar

Irrelevance Measure - Sudip Khanna

Identifying Statements of Fact - Nagendra Doddapaneni

Checking Accuracy of Statements of Fact - Ajit Datar and Sudip Khanna

Web-Interface - Archana Yadav and Ajit Datar

Installation and Administration - Ajit Datar

Documentation - Nagendra Doddapaneni and Varsha Kodali

Literature Review - Nagendra Doddapaneni and Varsha Kodali


Gibberish Detection - As we have seen in the previous section, EGAL::Gibberish identifies sentences as semantic gibberish based on the similarity score given by WordNet::Similarity::wup. If the percentage gibberishness is more than a particular threshold, the sentence is said to be semantic gibberish; otherwise EGAL::Gibberish checks whether the sentence is syntactic gibberish, based on the number of unused links and unknown words in the sentence relative to its length. If this proportion is relatively high, the sentence is considered to be syntactic gibberish.

Irrelevance Measure - EGAL::Relevance decides whether sentences are relevant based on a similarity score calculated using WordNet::Similarity::wup. If the percentage irrelevance is more than a threshold, the sentence is considered to be irrelevant, and relevant otherwise.

Identifying Statements of Fact - EGAL::FactIdent identifies sentences as statements of fact or not based on the occurrence of fact words and value words respectively. If value words occur in a sentence from the student's response, the sentence is considered to be an opinion, because value words are mostly used to express a personal view. If fact words such as ``first'' or a number occur in a sentence which is not gibberish and is relevant, the sentence is considered to be a statement of fact. Questions are not statements at all and hence are not statements of fact. Sentences with comparative/superlative adjectives/adverbs are not considered to be statements of fact either, as they are not concrete. Finally, if the sentence is in any form of future tense, it is not a statement of fact, as it is not a proposition.

Checking Statements of Fact for Accuracy - EGAL::FactCheck determines the accuracy of statements of fact. As mentioned before, this module first constructs a LinkParser tree for the sentence from the student's response to a prompt and identifies the subject using regular expressions. Then a Google query restricted to en.wikipedia.org is constructed from the subject and proper nouns/noun phrases. Using the first Google result, the full text of the Wikipedia entry is retrieved. To check the accuracy of the statement, we compute a score from all the n-grams of the sentence, applying Witten-Bell smoothing to the frequency of these n-grams as observed in the wikitext. Presently n-grams of length up to 3 are considered. Using this score, we can say that the given statement of fact is accurate with score% confidence.


SCORING METHOD

The four modules that we have implemented give us the percentages of gibberish sentences, irrelevant sentences and statements of fact, along with the accuracy of the latter. Even with this information in hand, developing a scoring mechanism consistent with the scores of a human grader is difficult. We therefore came up with a measure which uses as much information as can be derived from the statistics the modules offer.

The scoring mechanism that we use is a two-phase scoring mechanism. During the initial phase, we consider two bins of sentences: a bin of good sentences, which contains statements of fact and ordinary sentences, and a bin of bad sentences, which contains gibberish sentences and irrelevant sentences. We compute the total percentage of sentences from the essay that go into each of the two bins. For example, suppose 10% of the sentences in the essay are gibberish, 20% are irrelevant, 30% are statements of fact and the remaining 40% are ordinary sentences. Then the total percentage of sentences in the good bin is 70%, while the total percentage in the bad bin is 30%. If the good bin has a higher percentage of sentences than the bad bin, as is the case here, we consider the score of the essay to be either 3, 4, 5 or 6. The higher the percentage of sentences in the good bin, the higher the score. For the given example, a score of, say, 4 is assigned. If instead the bad bin had the higher percentage, a score of either 0, 1, 2 or 3 would be assigned. Thus the initial phase assigns an integral score to the essay.

The final phase considers all the statements of fact. It computes the mean of the confidence levels associated with the statements of fact. If the mean is greater than 50, the score assigned by the initial phase is incremented by a fraction. If the mean is less than 50, the score is penalised instead of boosted. By how much depends on the confidence levels of the facts. For example, suppose there were two statements of fact in the essay, one verified with a 62% confidence level and the other with a 78% confidence level. Then the mean of the confidence levels is (62+78)/2, which is 70%. Since this value is greater than the threshold of 50%, the score of 4 assigned by the initial phase is incremented by the final phase. Say a score boost of 0.6 is given. The final score is the sum of the score assigned by the initial phase and the boost given by the final phase, so for this example the final score would be 4.0 + 0.6 = 4.6. This final value is then rounded off to the nearest half number, so 4.6 is rounded off to 4.5. In the final phase, the score boost grows non-linearly with the mean: for a mean of 60% the boost is small, while for a mean of 95% it is much larger. The cubic function [((mean-50)*1.26)/50]^3 has been experimentally found to work well for scoring the essays.
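
The bin percentages, the boost function and the rounding step can be sketched as below. This Python illustration makes several assumptions: the initial integer score is taken as an input, since the exact mapping from bin percentages to a score is not spelled out; the same cubic is used for penalties when the mean is below 50 (it is negative there); halves are rounded upward; an essay with no statements of fact keeps its initial score; and the result is not clamped to the 0 to 6 range.

```python
import math
from typing import List, Tuple

GOOD = ("fact", "ordinary")


def bin_percentages(labels: List[str]) -> Tuple[float, float]:
    """Return (good %, bad %) for one label per sentence."""
    total = len(labels)
    good = sum(1 for label in labels if label in GOOD)
    return 100.0 * good / total, 100.0 * (total - good) / total


def boost(mean_confidence: float) -> float:
    """Cubic boost [((mean - 50) * 1.26) / 50]^3; negative below 50."""
    return ((mean_confidence - 50.0) * 1.26 / 50.0) ** 3


def round_to_half(value: float) -> float:
    """Round to the nearest multiple of 0.5 (ties go upward)."""
    return math.floor(value * 2.0 + 0.5) / 2.0


def final_score(initial: int, confidences: List[float]) -> float:
    if not confidences:
        return float(initial)  # assumption: nothing to boost or penalise
    mean = sum(confidences) / len(confidences)
    return round_to_half(initial + boost(mean))


labels = ["gibberish"] + ["irrelevant"] * 2 + ["fact"] * 3 + ["ordinary"] * 4
print(bin_percentages(labels))   # (70.0, 30.0)
print(round(boost(70.0), 3))     # 0.128
print(round_to_half(4.6))        # 4.5
```

Evaluated literally, the formula gives a boost of about 0.13 for a mean of 70% and about 2.0 for a mean of 100%, so the 0.6 used in the prose example is best read as an illustrative figure rather than the output of the formula.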


EVALUATION

As a part of evaluation, we have in place modules which check whether a given sentence is gibberish; if not, whether it is irrelevant; if not, whether it is a statement of fact; and if it is, whether it is accurate. The results of running the system over some sample input follow.

The following are the sentences that have been successfully identified as gibberish by the EGAL::Gibberish module -

        "Karma art have play dance under carpet." as 57.14% syntactic 
        gibberish.
        "Better after so far observe have peers obviate." as 62.50% syntactic 
        gibberish.
        "Lush yellow bright grey amongst intelligence over scale pendant." 
        as 88.89% syntactic gibberish.
        "There is on the list fish and meat that are eating each other." 
        as 94.87% syntactic gibberish.
        "This is the war of 1969, we intend to discuss in the class tommorow" 
        as 93.73% semantic gibberish.
        "Most lovely ladies are running under the ocean." has not been 
        identified as semantic gibberish.
        "There over red hour item clear jumble read." has not been identified 
        as syntactic gibberish.
        
        If the topic is "A government is a tremendous burden to business, 
        though a necessary one" and the student response had a statement 
        "India played a match with Pakistan yesterday." then the above 
        statement received a score of 84.13%. Finally if there was a
        statement like "A government is a governing body in a country" then it
        is identified as a statement of fact with a score of 85.99%.
        A statement "Government is a good source of trouble for businesses." is
        not gibberish, relevant and not a statement of fact according to the
        system. The value word 'good' in the sentence made the sentence a 
        statement of opinion and hence was identified not to be a statement 
        of fact.

Thus, in view of the scores assigned by each of the modules and their correct assignment of sentences to the respective categories, the system has been able to achieve the expected results.

Following is a detailed run of the system, with details about some sample essays that the user has given for some prompts used in the system. Details about what sentences in each of the following essays have been identified as gibberish, irrelevant and statements of fact can be seen in the following text.

The following is a response which was graded as a 3 point essay. Our EGAL system gave the essay a score of 3.49.

This was for the prompt:

        Leisure time is becoming an increasingly rare commodity, largely 
        because technology has failed to achieve its goal of improving our 
        efficiency in our daily pursuits. In your view, how accurate is the 
        statement above? Use relevant reasons and/or examples from your 
        experience, observations, or reading to support your viewpoint.

The user response is :

        Picture this, a family sitting down for breakfast. The father at the 
        head of the table asking everyone what their agenda is for the day. 
        Suddenly he looks at his watch, then with a frantic look on his face, 
        he lets out a bellowing roar of I'm late. Every one looks at each 
        other and scrambles to get thier belongings for the day. Five minutes 
        later everyone meets at the family vehicle and files in. The car 
        speeds away and everyone is off to their busy filled day.
        
        you would think that with today's technology, the family would be able
        to sit down together and enjoy breakfast without being rushed, but in 
        todays society this is not the case. It seems like the more we are 
        advanced in technology the more we pack into our schedultes 
        eliminatingfree time. We are trained as children to work as hard as we
        can, to advance ourseveles in careers or growth and any relaxation 
        could be viewed as laziness by out parents or peers.
        
        Though we do have the technology which could enable us to live stress 
        free lives, we choose to use it to our benefit, but instead of taking 
        advantage of our newly created "spare time", we bog ourselves with 
        more work. Let's take the father of this family who is a well known 
        executive at a prominant accounting firm. He is the man that solves 
        all the problems and has all the answers for his company. During his 
        lunch hour he sits and calculates numbers instead of enjoying himslef 
        and relaxing. "No time for rest" is his motto. When his boss says 
        we're going to give you a half day today, he decides to spend it on 
        the golf course discussin work. He has no time for his family and 
        always seems to be found in his office when at home. This is a very 
        unhealthy way of live and could be damaging to the raising of his 
        children.
        
        The children pick up patterns at a very young age. Grwoing up we are 
        trained by our parents subcounciously. These children from a very 
        young age are taught that leisure time is wrong. At a young age that 
        children are subjected to little league and ballet, as a detourant of 
        cutting into their parents time. In these activities childrn are 
        pushed to their fullest potential, allowing them to accompish the 
        honor roll, class president, or even valedictorian for there 
        graduating class. It is great that the children have such drive, but 
        without relaxation or leisure time it oculd lead to psychological 
        problems or mental breakdowns.
        
        Even though technology has created free or leisure time, we as 
        individuals need to learn to take advantage of it. We have been 
        trained at a very young age always to be busy. When were not working 
        on deadline or have meeting to be at we are often wondering what do 
        we do with ourselves. The fact of the matter is that we do have the 
        technology to make our lives a lot easier, we just need to take 
        advantage of it, if we don't we could end up seriously injured 
        physically, or even more detrminetal psychologically.

-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------


        Type of Sentences Identified    Percentage
        Gibberish                       7.41
        Irrelevance                     25.93
        Facts                           18.52
        Ordinary                        48.15

        gibberish sentences were as follows

        Sentence  It seems like the more we are advanced in technology the 
        more we pack into our schedultes eliminating free time.
        Points  91.16%
        Semantic gibberish

        Sentence  He is the man that solves all the problems and has all the 
        answers for his company.
        Points  90.30%
        Semantic gibberish

        irrelevant sentences were as follows:

        Sentence  Picture this, a family sitting down for breakfast.
        Points  76.86%

        Sentence  Let's take the father of this family who is a well known 
        executive at a prominant accounting firm.
        Points  76.08%

        Sentence  During his lunch hour he sits and calculates numbers instead
        of enjoying himslef and relaxing.
        Points  70.35%

        Sentence  This is a very unhealthy way of live and could be damaging 
        to the raising of his children.
        Points  70.30%

        Sentence  Grwoing up we are trained by our parents subcounciously.
        Points  83.51%

        Sentence  We have been trained at a very young age always to be busy.
        Points  76.97%

        Sentence  When were not working on deadline or have meeting to be at 
        we are often wondering what do we do with ourselves.
        Points  74.89%

        fact sentences were as follows:

        Sentence  The father at the head of the table asking everyone what 
        their agenda is for the day.
        Points  61.44%
        I can say this is a fact with 61.44 percent confidence

        Sentence  Five minutes later everyone meets at the family vehicle and 
        files in.
        Points  0.00%
        I can say this is a fact with 0.00 percent confidence

        Sentence  The car speeds away and everyone is off to their busy filled
        day.
        Points  41.31%
        I can say this is a fact with 41.31 percent confidence

        Sentence  "No time for rest" is his motto.
        Points  62.20%
        I can say this is a fact with 62.20 percent confidence

        Sentence  The children pick up patterns at a very young age.
        Points  27.00%
        I can say this is a fact with 27.00 percent confidence

        ordinary sentences were as follows:

        Sentence  Suddenly he looks at his watch, then with a frantic look on 
        his face, he lets out a bellowing roar of I'm late.
        Message  This sentence is a statement of fact

        Nouns: "face" "roar" "watch" "look"
                Subjects: "he """
                Google query: "he """ "face" "roar" "watch" "look" 
                        site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.

        Sentence  Every one looks at each other and scrambles to get thier 
        belongings for the day.
        Message  This sentence is a statement of fact

        Nouns: "scrambles" "one" "thier" "belongings" "day"
                Subjects: "scrambles """
                Google query: "scrambles """ "scrambles" "one" "thier" 
                        "belongings" "day" site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.

        Sentence  you would think that with today's technology, the family 
        would be able to sit down together and enjoy breakfast without being 
        rushed, but in todays society this is not the case.
        Message  This sentence is in future tense

        Sentence  We are trained as children to work as hard as we can, to 
        advance ourseveles in careers or growth and any relaxation could be 
        viewed as laziness by out parents or peers.
        Message  This sentence is in future tense

        Sentence  Though we do have the technology which could enable us to 
        live stress free lives, we choose to use it to our benefit, but 
        instead of taking advantage of our newly created "spare time", we bog 
        ourselves with more work.
        Message  This sentence has comparitive/superlative adverbs/adjectives

        Sentence  When his boss says we're going to give you a half day today,
        he decides to spend it on the golf course discussin work.
        Message  This sentence is in future tense

        Sentence  He has no time for his family and always seems to be found 
        in his office when at home.
        Message  This sentence contains the value word home

        Sentence  These children from a very young age are taught that leisure
        time is wrong.
        Message  This sentence contains the value word leisure

        Sentence  At a young age that children are subjected to little league 
        and ballet, as a detourant of cutting into their parents time.
        Message  This sentence contains the value word little

        Sentence  In these activities childrn are pushed to their fullest 
        potential, allowing them to accompish the honor roll, class president,
        or even valedictorian for there graduating class.
        Message  This sentence has comparitive/superlative adverbs/adjectives

        Sentence  It is great that the children have such drive, but without 
        relaxation or leisure time it oculd lead to psychological problems or 
        mental breakdowns.
        Message  This sentence contains the value word great

        Sentence  Even though technology has created free or leisure time, we 
        as individuals need to learn to take advantage of it.
        Message  This sentence contains the value word leisure

        Sentence  The fact of the matter is that we do have the technology to 
        make our lives a lot easier, we just need to take advantage of it, if 
        we don't we could end up seriously injured physically, or even more 
        detrminetal psychologically.
        Message  This sentence has comparitive/superlative adverbs/adjectives
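
The percentage table at the top of these diagnostics can be checked against the sentence lists that follow it. Counting the listed sentences gives 2 gibberish, 7 irrelevant, 5 fact and 13 ordinary sentences, 27 in all. The sketch below is not EGAL's own code; the function name and the rounding to two decimals are assumptions made for illustration, and it only shows that each percentage is the category count divided by the total number of sentences.

```python
def category_percentages(counts):
    """Return each category's share of all sentences, in percent (2 decimals)."""
    total = sum(counts.values())
    if total == 0:
        # No sentences at all: report zero for every category.
        return {name: 0.0 for name in counts}
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# Counts obtained by counting the sentences listed in the diagnostics above.
essay1_counts = {"Gibberish": 2, "Irrelevance": 7, "Facts": 5, "Ordinary": 13}
essay1_percentages = category_percentages(essay1_counts)

for name, pct in essay1_percentages.items():
    print(f"{name:<15} {pct:.2f}")
```

Because each value is rounded separately, the four percentages need not add up to exactly 100.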

The next essay had been graded as a 4-point essay, and the EGAL system gave it a score of 2.61.

The essay topic was

        People often complain that products are not made to last. They feel 
        that making products that wear out fairly quickly wastes both natural 
        and human resources. What they fail to see, however, is that such 
        manufacturing practices keep costs down for the consumer and stimulate
        demand. Which do you find more compelling: the complaint about 
        products that do not last or the response to it? Explain your position
        using relevant reasons and/or examples drawn from your own experience,
        observations, or reading.
        
The user response was

        The topic raises the issue of whether, on balance, consumers are 
        damaged or benefited by quality-cutting production methods. 
        Indisputably, many consumer products today are not made to last. 
        Nevertheless, consumers themselves sanction this practice, and they 
        are its ultimate beneficiaries in terms of lower prices, more choices,
        and a stronger economy.
        
        Common sense tells us that sacrificing quality results in a net 
        benefit to consumers and to overall economy. Cutting production 
        corners not only allows a business to reduce a product's retail price,
        it compels the business to do so, since its competitors will find 
        innovative ways of capturing its market share otherwise. Lower prices 
        stimulates sales, which in turn generate healthy economic activity. 
        Observation also strongly supports this claim. One need only look at 
        successful budget retail stores such as Walmart as evidence that many 
        and perhaps most consumers indeed tend to value price over quality.
        
        Do low-quality products waste natural resources? On balance, probably 
        not. Admittedly, to the extent that a product wears out sooner, more 
        material are needed for replacement units. Yet cheaper materials are 
        often synthetics, which conserve natural resources, as in the case of 
        synthetic clothing, dyes and inks, and wood substitutes and composites.
        Moreover, many synthetics and composites are now actually safer and 
        more durable than their natural counterparts especially in the area 
        of construction materials.
        
        Do lower-quality products waste human resources? If by waste we mean 
        use up unnecessarily, the answer is no. Many lower-quality products 
        are machine-made ones that conserve, not waste, human labor for 
        example, machine-stitched or dyed clothing and machine-tooled 
        furniture. Moreover, other machine-made products are actually higher 
        in quality than their man-made counterparts, such as those requiring 
        a precision and consistency that only machines can provide. Finally, 
        many cheaply-made products are manufactured and assembled by the 
        lower-cost Asian and Central American labor force a legion for whom 
        the alternative is unemployment and poverty. In these cases, producing
        lower-quality products does not waste human resources; to the contrary,
        it creates productive jobs.
        
        In the final analysis, cost-cutting production methods benefit 
        consumers, both in the short-term through lower prices and in the 
        long run by way of economic vitality and increased competition. The 
        claim that producing lower-quality product wastes natural and human 
        resources is specious at best.

-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------


        Type of Sentences Identified    Percentage
        Gibberish                       19.05
        Irrelevance                     9.52
        Facts                           14.29
        Ordinary                        57.14

        ------------------------------

        Percentage of Gibberish Sentences were: 7.41
        Percentage of Irrelevant Sentences were: 25.93
        Percentage of Fact Sentences were: 18.52
        Percentage of Ordinary Sentences were: 48.15
        No. of Gibberish Sentences were: 2
        No. of Irrelvant Sentences were: 7
        No. of Facts Sentences were: 5
        No. of Ordinary Sentences were: 13

        gibberish sentences were as follows

        Sentence  It seems like the more we are advanced in technology the 
        more we pack into our schedultes eliminating free time.
        Points  91.16%
        Semantic gibberish

        Sentence  He is the man that solves all the problems and has all the 
        answers for his company.
        Points  90.30%
        Semantic gibberish

        irrelevant sentences were as follows:

        Sentence  Picture this, a family sitting down for breakfast.
        Points  76.86%

        Sentence  Let's take the father of this family who is a well known 
        executive at a prominant accounting firm.
        Points  76.08%

        Sentence  During his lunch hour he sits and calculates numbers instead
        of enjoying himslef and relaxing.
        Points  70.35%

        Sentence  This is a very unhealthy way of live and could be damaging 
        to the raising of his children.
        Points  70.30%

        Sentence  Grwoing up we are trained by our parents subcounciously.
        Points  83.51%

        Sentence  We have been trained at a very young age always to be busy.
        Points  76.97%

        Sentence  When were not working on deadline or have meeting to be at 
        we are often wondering what do we do with ourselves.
        Points  74.89%

        fact sentences were as follows:

        Sentence  The father at the head of the table asking everyone what 
        their agenda is for the day.
        Points  61.44%
        Message  This sentence is a statement of fact

        Nouns: "head" "father" "table" "asking" "day" "everyone" "agenda"
                Subjects: "father "
                Google query: "father " "head" "father" "table" "asking" "day"
                         "everyone" "agenda" site:en.wikipedia.org
                Looking at page: Talk:George W. Bush
                2-gram "what their" : 0
                3-gram "is for the" : 0
                3-gram "everyone what their" : 0
                2-gram "is for" : 3
                1-gram "is" : 765
                3-gram "agenda is for" : 0
                2-gram "their agenda" : 0
                2-gram "agenda is" : 0
                3-gram "what their agenda" : 0
                1-gram "their" : 26
                3-gram "their agenda is" : 0
                Measure: 8.97019694918424 Min: -15.125 Max: 24.0951969491842
                Final score: 61.4356857524281
                I can say this is a fact with 61.44 percent confidence

        Sentence  Five minutes later everyone meets at the family vehicle and 
        files in.
        Points  0.00%
        Message  This sentence is a statement of fact

        Nouns: "files" "minutes" "vehicle" "everyone" "family"
                Subjects: "everyone "
                Google query: "everyone " "files" "minutes" "vehicle" 
                        "everyone" "family" site:en.wikipedia.org
                Looking at page: OJ Simpson
                1-gram "five" : 0
                3-gram "minutes later everyone" : 0
                3-gram "five minutes later" : 0
                1-gram "later" : 0
                2-gram "five minutes" : 0
                3-gram "later everyone meets" : 0
                2-gram "minutes later" : 0
                1-gram "meets" : 0
                2-gram "meets at" : 0
                2-gram "later everyone" : 0
                2-gram "everyone meets" : 0
                3-gram "meets at the" : 0
                3-gram "everyone meets at" : 0
                Measure: -24 Min: -24 Max: 0
                Final score: 0
                I can say this is a fact with 0.00 percent confidence

        Sentence  The car speeds away and everyone is off to their busy filled
        day.
        Points  41.31%
        Message  This sentence is a statement of fact

        Nouns: "car" "day" "everyone" "speeds"
                Subjects: """everyone "
                Google query: """everyone " "car" "day" "everyone" "speeds" 
                        site:en.wikipedia.org
                Looking at page: User talk:Arpingstone
                2-gram "everyone is" : 0
                3-gram "everyone is off" : 0
                3-gram "their busy filled" : 0
                3-gram "and everyone is" : 0
                1-gram "is" : 229
                2-gram "speeds away" : 0
                2-gram "away and" : 0
                3-gram "to their busy" : 0
                3-gram "busy filled day" : 0
                2-gram "filled day" : 0
                3-gram "off to their" : 0
                3-gram "is off to" : 0
                2-gram "is off" : 0
                2-gram "busy filled" : 0
                3-gram "speeds away and" : 0
                2-gram "their busy" : 0
                1-gram "their" : 14
                2-gram "to their" : 2
                3-gram "car speeds away" : 0
                1-gram "away" : 0
                3-gram "away and everyone" : 0
                1-gram "busy" : 3
                1-gram "filled" : 0
                Measure: -10.7581034907267 Min: -36.3157894736842 
                        Max: 25.5576859829575
                Final score: 41.306368834683
                I can say this is a fact with 41.31 percent confidence

        Sentence  "No time for rest" is his motto.
        Points  62.20%
        Message  This sentence is a statement of fact

        Nouns: "rest" "time" "motto"
                Subjects: ""
                Google query: "" "rest" "time" "motto" site:en.wikipedia.org
                Looking at page: Samson Raphael Hirsch
                3-gram "is his motto" : 0
                1-gram "no" : 13
                2-gram "is his" : 1
                1-gram "is" : 178
                2-gram "rest is" : 0
                3-gram "rest is his" : 0
                3-gram "for rest is" : 0
                3-gram "no time for" : 0
                2-gram "no time" : 0
                Measure: 7.74673290775362 Min: -12 Max: 19.7467329077536
                Final score: 62.2008348548231
                I can say this is a fact with 62.20 percent confidence

        Sentence  The children pick up patterns at a very young age.
        Points  27.00%
        Message  This sentence is a statement of fact

        Nouns: "patterns" "children" "age"
                Subjects: ""
                Google query: "" "patterns" "children" "age" 
                        site:en.wikipedia.org
                Looking at page: Language acquisition
                2-gram "a very" : 0
                1-gram "young" : 3
                2-gram "children pick" : 0
                3-gram "a very young" : 0
                3-gram "very young age" : 0
                2-gram "very young" : 0
                3-gram "the children pick" : 0
                2-gram "young age" : 0
                1-gram "very" : 3
                3-gram "at a very" : 0
                1-gram "pick" : 0
                2-gram "pick up" : 0
                3-gram "children pick up" : 0
                3-gram "pick up patterns" : 0
                Measure: -13.9694420893304 Min: -22.1666666666667 
                        Max: 8.19722457733622
                Final score: 26.9966207936384
                I can say this is a fact with 27.00 percent confidence

        ordinary sentences were as follows:

        Sentence  Suddenly he looks at his watch, then with a frantic look on 
        his face, he lets out a bellowing roar of I'm late.
        Message  This sentence is a statement of fact

        Nouns: "face" "roar" "watch" "look"
                Subjects: "he """
                Google query: "he """ "face" "roar" "watch" "look" 
                        site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.

        Sentence  Every one looks at each other and scrambles to get thier 
        belongings for the day.
        Message  This sentence is a statement of fact

        Nouns: "scrambles" "one" "thier" "belongings" "day"
                Subjects: "scrambles """
                Google query: "scrambles """ "scrambles" "one" "thier" 
                        "belongings" "day" site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.

        Sentence  you would think that with today's technology, the family 
        would be able to sit down together and enjoy breakfast without being 
        rushed, but in todays society this is not the case.
        Message  This sentence is in future tense

        Sentence  We are trained as children to work as hard as we can, to 
        advance ourseveles in careers or growth and any relaxation could be 
        viewed as laziness by out parents or peers.
        Message  This sentence is in future tense

        Sentence  Though we do have the technology which could enable us to 
        live stress free lives, we choose to use it to our benefit, but 
        instead of taking advantage of our newly created "spare time", we bog 
        ourselves with more work.
        Message  This sentence has comparitive/superlative adverbs/adjectives

        Sentence  When his boss says we're going to give you a half day today,
        he decides to spend it on the golf course discussin work.
        Message  This sentence is in future tense

        Sentence  He has no time for his family and always seems to be found 
        in his office when at home.
        Message  This sentence contains the value word home

        Sentence  These children from a very young age are taught that leisure
        time is wrong.
        Message  This sentence contains the value word leisure

        Sentence  At a young age that children are subjected to little league 
        and ballet, as a detourant of cutting into their parents time.
        Message  This sentence contains the value word little

        Sentence  In these activities childrn are pushed to their fullest 
        potential, allowing them to accompish the honor roll, class president,
        or even valedictorian for there graduating class.
        Message  This sentence has comparitive/superlative adverbs/adjectives

        Sentence  It is great that the children have such drive, but without 
        relaxation or leisure time it oculd lead to psychological problems or 
        mental breakdowns.
        Message  This sentence contains the value word great

        Sentence  Even though technology has created free or leisure time, we 
        as individuals need to learn to take advantage of it.
        Message  This sentence contains the value word leisure

        Sentence  The fact of the matter is that we do have the technology to 
        make our lives a lot easier, we just need to take advantage of it, if 
        we don't we could end up seriously injured physically, or even more 
        detrminetal psychologically.
        Message  This sentence has comparitive/superlative adverbs/adjectives
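
In the fact diagnostics above, the "Measure", "Min", "Max" and "Final score" lines are consistent with a min-max rescaling of the measure onto a 0 to 100 scale. The sketch below reproduces the five printed final scores from the printed Measure, Min and Max values. It is inferred from the output only: the function name is hypothetical, the handling of an empty range is an assumption, and how EGAL derives Measure, Min and Max from the n-gram counts is not shown here.

```python
def fact_confidence(measure, lowest, highest):
    """Rescale `measure` from the range [lowest, highest] to 0..100."""
    if highest == lowest:
        # Assumption: with an empty range there is no evidence, so report 0.
        return 0.0
    return 100.0 * (measure - lowest) / (highest - lowest)


# (Measure, Min, Max, printed confidence) copied from the diagnostics above.
printed_runs = [
    (8.97019694918424, -15.125, 24.0951969491842, 61.44),
    (-24.0, -24.0, 0.0, 0.00),
    (-10.7581034907267, -36.3157894736842, 25.5576859829575, 41.31),
    (7.74673290775362, -12.0, 19.7467329077536, 62.20),
    (-13.9694420893304, -22.1666666666667, 8.19722457733622, 27.00),
]

for measure, lowest, highest, printed in printed_runs:
    score = fact_confidence(measure, lowest, highest)
    print(f"computed {score:.2f}  printed {printed:.2f}")
```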

The third essay had been graded as a 5.0-point essay, and the EGAL system gave it a score of 4.0.

The topic of the essay was :

        In some countries, television and radio programs are carefully 
        censored for offensive language and behavior. In other countries, 
        there is little or no censorship. In your view, to what extent should 
        government or any other group be able to censor television or radio 
        programs? Explain, giving relevant reasons and/or examples to support 
        your position.

The user response was :

        I beg to differ with the speaker's contention which seems to imply 
        that the goal of technology is not only to increase effciency but also
        our leisure time. Also interwoven in the speaker's statement is the 
        fallacious assumption that they are connected. So we have three points
        which need to be considered - technological advances, efficiency & 
        leisure - and how they are related.
        The aim of technological advance (progress in applied sciences), as 
        far as I know, is to apply scientific data and discoveries toward 
        practical and beneficial use. For instance we've used new knowledge of
        Particle Physics in diagnosing medical conditions - eg. through 
        Magneto Resonance Imagery - and also in treatment - eg., radiotherapy.
        Did this technological advance and the motivation behind it really
        have anything to do with efficiency? Only in that efficiency might be 
        a by-product of a certain technology , but I do not think it was the 
        primary objective.
        
        Of course the by-product of certain new technologies might be 
        "efficiency" but to what extent? Computers are typically cited as a 
        perfect example. Yes they do help us get more work done without 
        expending as much energy. But we need to factor in the time and energy
        required in learning how to efficiently operate one, and then expended
        in keeping our learning up to date with the rapid technological 
        advances in the same. (A person with the energy to compile and 
        critically analyze the data constructively to formulate the answer to 
        that one will definitely need an advanced computer!) So its possible 
        that even computers don't in the end improve the efficiency of our 
        daily lives, in net terms.
        
        And then, there is the question of "leisure". Personally I think it is
         a matter of choice and not time saving ,technologically advanced, 
        efficient tools. The speaker seems to assume that the time "saved" (we
         are still waiting for the verdict on that one) will be spent towards 
        leisure. I do not see the connection. Ulitmately the motivation of a 
        person, personality & lifestyle choices and circumstances determine 
        how the time that is saved is used. It could be towards leisure in one
         person's case; in another's towards putting in more hours to make 
        more money to make ends meet or to buy that new car which he/she 
        absolutely must have.
        
        In the end I think there is no clear connection between the three 
        points under consideration. Hence in the absence of the relationship 
        between technology, efficiency & leisure claimed by the speaker I 
        disagree on whole.
        
        Percentage of Gibberish Sentences were: 13.64
        Percentage of Irrelevant Sentences were: 40.91
        Percentage of Fact Sentences were: 0.00
        Percentage of Ordinary Sentences were: 45.45
        No. of Gibberish Sentences were: 3
        No. of Irrelvant Sentences were: 9
        No. of Facts Sentences were: 0
        No. of Ordinary Sentences were: 10
        
        gibberish sentences were as follows
        
        Sentence  I beg to differ with the speaker's contention which seems to
        simply that the goal of technology is not only to increase effciency 
        but also our leisure time.
        Points  90.07%
        Semantic gibberish
        
        Sentence  Computers are typically cited as a perfect example.
        Points  94.87%
        Semantic gibberish
        
        Sentence  So its possible that even computers don't in the end improve
         the efficiency of our daily lives, in net terms.
        Points  90.68%
        Semantic gibberish
        
        irrelevant sentences were as follows:
        
        Sentence  Also interwoven in the speaker's statement is the fallacious
         assumption that they are connected.
        Points  71.23%
        
        Sentence  So we have three points which need to be considered - 
        technological advances, efficiency & leisure - and how they are 
        related.
        Points  75.51%
        
        Sentence  For instance we've used new knowledge of Particle Physics in
         diagnosing medical conditions - eg.
        Points  70.60%
        
        Sentence  Did this technological advance and the motivation behind it 
        really have anything to do with efficiency?
        Points  73.89%
        
        Sentence  But we need to factor in the time and energy required in 
        learning how to efficiently operate one, and then expended in keeping 
        our learning up to date with the rapid technological advances in the 
        same.
        Points  76.65%
        
        Sentence  And then, there is the question of "leisure".
        Points  76.20%
        
        Sentence  The speaker seems to assume that the time "saved" (we are 
        still waiting for the verdict on that one) will be spent towards 
        leisure.
        Points  70.48%
        
        Sentence  I do not see the connection.
        Points  76.59%
        
        Sentence  In the end I think there is no clear connection between the 
        three points under consideration.
        Points  73.67%
        
        no fact sentences were found
        
        ordinary sentences were as follows:
        
        Sentence  The aim of technological advance (progress in applied 
        sciences), as far as I know, is to apply scientific data and 
        discoveries toward practical and beneficial use.
        Message  This sentence contains the value word progress
        
        Sentence  through Magneto Resonance Imagery - and also in treatment - 
        eg., radiotherapy.
        Message  This sentence is a statement of fact
                
        Nouns: "magneto"
                Subjects: "Magneto Resonance Imagery "
                Google query: "Magneto Resonance Imagery " "magneto" 
                        site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.
                
        
        Sentence  Only in that efficiency might be a by-product of a certain 
        technology , but I do not think it was the primary objective.
        Message  This sentence is in future tense
                
        
        Sentence  Of course the by-product of certain new technologies might 
        be"efficiency" but to what extent?
        Message  This sentence is in future tense
                
        
        Sentence  Yes they do help us get more work done without expending as 
        much energy.
        Message  This sentence has comparitive/superlative adverbs/adjectives
                
        
        Sentence  (A person with the energy to compile and critically analyze 
        the data constructively to formulate the answer to that one will 
        definitely need an advanced computer!)
        Message  This sentence is in future tense
                
        
        Sentence  Personally I think it is a matter of choice and not time 
        saving ,technologically advanced, efficient tools.
        Message  This sentence contains the value word choice
        
        Sentence  Ulitmately the motivation of a person, personality & 
        lifestyle choices and circumstances determine how the time that is 
        saved is used.
        Message  This sentence contains the value word motivation
        
        Sentence  It could be towards leisure in one person's case; in 
        another's towards putting in more hours to make more money to make 
        ends meet or to buy that new car which he/she absolutely must have.
        Message  This sentence has comparitive/superlative adverbs/adjectives
                
        
        Sentence  Hence in the absence of the relationship between technology,
         efficiency & leisure claimed by the speaker I disagree on whole.
        Message  This sentence contains the value word efficiency

The fourth essay combines two 3-point essays written by fellow students in response to a prompt given in class. The EGAL system graded the combined essay at 3.62.

The topic was

        Automated essay scoring is unfair to students, since there are many 
        different ways for a student to express ideas intelligently and 
        coherently. A computer program can not be expected to anticipate all 
        of these possibilities, and will therefore grade students more harshly
         than they deserve. Discuss whether you agree or disagree (partially 
        or totally) with the view expressed providing reasons and examples.

The student response was

        Many students write essays for exam, some write good, some bad, some 
        worst. The compter cannot get this idea of good, bad, and worst. It 
        just tries to find information the ways it is supposed to and scores 
        that way. Thts why automated essay scoring is unfair to students.
        
        For examples, many student know that the computer checks the essays, 
        therefore they learn how to ebat the machine rather then being 
        creative to write their essays and learn something. In this case the 
        computer would just be too good to score and give good score to 
        someone who cant even write a creative sentence nor can write 
        grammatical correct sentece.
        
        Another point here we can see is that the computer learns from essays 
        written by students, now if the sample essays are not that creative 
        then if someone writes creative essay it will just get confused and 
        give some random score, if not then it will give the best possible 
        score which is still not correct. So is it fair to get garbage or 
        grate score from machine?
        
        Last but not least, computer scoring ethically is not good.
        
        To conclude I would say stop this computer scoring, it doesnt make 
        sense when you write an essay thinking hard and someobody who is not 
        real grades it for you and gives you something that you dont 
        confidence about.
        
        Automated essay scoring is used in GMAT and GRE. It uses computers to 
        grade student essays. Using computers makes work faster. It also 
        reduces the work load on human graders.
        
        First of all, the difference between human graded essay and computer 
        graded essay is not very significant. Both tend to be the same. If a 
        good computer is used this difference can be reduced.
        
        Even though each student has his own views the computer is intelligent
         enough to catch the differences and grade accordingly. As a student 
        writes an essay the computer can understand how the student writes and
         this will help it in grading.
        
        Thus, the computer does not grade an essay harshly than humans. It 
        just has some difference but not to a greater extent and such 
        deviations can be ignored as even humans make mistakes.
        
        Finally, I feel that the usage of automated essay scoring is good and 
        should be followed by all including schools and universities. It will 
        help students and teachers a lot.

-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------

        Type of Sentences Identified    Percentage
        Gibberish       17.39
        Irrelevance     17.39
        Facts           26.09
        Ordinary        39.13

        Percentage of Gibberish Sentences were: 17.39
        Percentage of Irrelevant Sentences were: 17.39
        Percentage of Fact Sentences were: 26.09
        Percentage of Ordinary Sentences were: 39.13
        No. of Gibberish Sentences were: 4
        No. of Irrelvant Sentences were: 4
        No. of Facts Sentences were: 6
        No. of Ordinary Sentences were: 9

        gibberish sentences were as follows

        Sentence  Many students write essays for exam, some write good, some 
        bad, some worst.
        Points  91.51%
        Semantic gibberish

        Sentence  The compter cannot get this idea of good, bad, and worst.
        Points  63.64%
        Syntactic gibberish

        Sentence  In this case the computer would just be too good to score 
        and give good score to someone who cant even write a creative 
        sentence nor can write grammatical correct sentece.
        Points  92.21
        Semantic gibberish

        Sentence  Using computers makes work faster.
        Points  94.44%
        Semantic gibberish

        irrelevant sentences were as follows:

        Sentence  Last but not least, computer scoring ethically is not good.
        Points  75.51%

        Sentence  Automated essay scoring is used in GMAT and GRE.
        Points  72.83%

        Sentence  If a good computer is used this difference can be reduced.
        Points  70.31%

        Sentence  It just has some difference but not to a greater extent and 
        such deviations can be ignored as even humans make mistakes.
        Points  75.00%

        fact sentences were as follows:

        Sentence  It just tries to find information the ways it is supposed 
        to and scores that way.
        Points  40.23%
        I can say this is a fact with 40.23 percent confidence

        Sentence  It uses computers to grade student essays.
        Points  0.00%
        I can say this is a fact with 0.00 percent confidence

        Sentence  First of all, the difference between human graded essay and 
        computer graded essay is not very significant.
        Points  22.31%
        I can say this is a fact with 22.31 percent confidence

        Sentence  Both tend to be the same.
        Points  14.48%
        I can say this is a fact with 14.48 percent confidence

        Sentence  Even though each student has his own views the computer is 
        intelligent enough to catch the differences and grade accordingly.
        Points  37.05%
        I can say this is a fact with 37.05 percent confidence

        Sentence  Thus, the computer does not grade an essay harshly than 
        humans.
        Points  13.05%
        I can say this is a fact with 13.05 percent confidence

        ordinary sentences were as follows:

        Sentence  Thts why automated essay scoring is unfair to students.
        Message  This sentence is a statement of fact

        Nouns: "thts"
                Subjects: "scoring "
                Google query: "scoring " "thts" site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.

        Sentence  For examples, many student know that the computer checks the
         essays, therefore they learn how to ebat the machine rather then 
        being creative to write their essays and learn something.
        Message  This sentence is a statement of fact

        Nouns: "checks" "ebat" "essays" "student" "computer" "machine" 
                "examples" "something"
                Subjects: """examples "
                Google query: """examples " "checks" "ebat" "essays" 
                        "student" "computer" "machine" "examples" "something" 
                        site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.

        Sentence  Another point here we can see is that the computer learns 
        from essays written by students, now if the sample essays are not that
         creative then if someone writes creative essay it will just get 
        confused and give some random score, if not then it will give the best
         possible score which is still not correct.
        Message  This sentence has comparitive/superlative adverbs/adjectives

        Sentence  So is it fair to get garbage or grate score from machine?
        Message  This sentence is a statement of fact

        Nouns: "score" "grate" "garbage" "machine"
                Subjects: 
                Google query: "score" "grate" "garbage" "machine" 
                        site:en.wikipedia.org
                System cannot verify this using the existing knowledge source.

        Sentence  To conclude I would say stop this computer scoring, it 
        doesnt make sense when you write an essay thinking hard and someobody 
        who is not real grades it for you and gives you something that you 
        dont confidence about.
        Message  This sentence is in future tense

        Sentence  It also reduces the work load on human graders.
        Message  This sentence contains the value word work

        Sentence  As a student writes an essay the computer can understand how
         the student writes and this will help it in grading.
        Message  This sentence is in future tense

        Sentence  Finally, I feel that the usage of automated essay scoring is
         good and should be followed by all including schools and 
        universities.
        Message  This sentence is in future tense

        Sentence  It will help students and teachers a lot.
        Message  This sentence is in future tense

The fifth sample uses the topic/prompt itself as the response. The EGAL system gave this tricky response a score of 1.96.

The topic/prompt was

        People often complain that products are not made to last. They feel 
        that making products that wear out fairly quickly wastes both natural 
        and human resources. What they fail to see, however, is that such 
        manufacturing practices keep costs down for the consumer and stimulate
         demand. Which do you find more compelling: the complaint about 
        products that do not last or the response to it? Explain your position
         using relevant reasons and/or examples drawn from your own 
        experience, observations, or reading.

The response entered was

        People often complain that products are not made to last. They feel 
        that making products that wear out fairly quickly wastes both natural 
        and human resources. What they fail to see, however, is that such 
        manufacturing practices keep costs down for the consumer and stimulate
         demand. Which do you find more compelling: the complaint about 
        products that do not last or the response to it? Explain your position
         using relevant reasons and/or examples drawn from your own 
        experience, observations, or reading.

This would usually trick a system, since the response seems pseudo-relevant: it repeats the vocabulary of the topic without answering it. Here is how EGAL analyzes it:

-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------

        Type of Sentences Identified    Percentage
        Gibberish       40.00
        Irrelevance     20.00
        Facts           20.00
        Ordinary        20.00
        
        Percentage of Gibberish Sentences were: 40.00
        Percentage of Irrelevant Sentences were: 20.00
        Percentage of Fact Sentences were: 20.00
        Percentage of Ordinary Sentences were: 20.00
        No. of Gibberish Sentences were: 2
        No. of Irrelvant Sentences were: 1
        No. of Facts Sentences were: 1
        No. of Ordinary Sentences were: 1
        
        gibberish sentences were as follows
        
        Sentence  People often complain that products are not made to last.
        Points  92.59%
        Semantic gibberish
        
        Sentence  They feel that making products that wear out fairly quickly 
        wastes both natural and human resources.
        Points  90.64%
        Semantic gibberish
        
        irrelevant sentences were as follows:
        
        Sentence  Explain your position using relevant reasons and/or examples
         drawn from your own experience, observations, or reading.
        Points  73.18%
        
        fact sentences were as follows:
        
        Sentence  What they fail to see, however, is that such manufacturing 
        practices keep costs down for the consumer and stimulate demand.
        Points  36.74%
        I can say this is a fact with 36.74 percent confidence
                
        
        ordinary sentences were as follows:
        
        Sentence  Which do you find more compelling: the complaint about 
        products that do not last or the response to it?
        Message  This sentence has comparitive/superlative adverbs/adjectives
        

For the sixth essay, we considered a 6-point essay, which our EGAL system graded as 4.99. The essay topic was:

        Automated essay scoring is unfair to students, since there are many 
        different ways for a student to express ideas intelligently and 
        coherently. A computer program can not be expected to anticipate all 
        of these possibilities, and will therefore grade students more harshly
         than they deserve. Discuss whether you agree or disagree (partially 
        or totally) with the view expressed providing reasons and examples.

The user response was:

        I strongly disagree with the argument that automated essay scoring is 
        unfair to students. The automated essay scoring systems in use are 
        carefully designed by natural language processing experts. In fact, 
        they are proven to be comparable, if not better, to a human grader. 
        Automated essay scoring systems might grasp the nuances of every witty
         writing, but it certainly does well, the task for which it is 
        assigned - namely grading of analytical writing essays.
        
        Argument claims that a student can express an idea in ways not known 
        to the automated system thus resulting in a poor score. To refute this
         claim I must point out the fact that there are many NLP techniques 
        which look at the general characterisitcs of a good essay, rather than
         a particular way, to decide a score. Even the most ingeniously 
        different essay follows the guidelines of a good essay, otherwise it 
        will not be able to represent the idea coherently.
        
        Second, the issue of harshness of the automated system is really 
        irrelevant. However harsh a system might be, as long as it is the 
        common denominator for all the essays, the relative scores are still 
        the same. Therefore there is no unfair harshness here. In any case, 
        automated grading is proven to be as harsh as a human grader.
        
        Third, we must not ignore the benefits of the automated essay scoring.
         It is cost effective - it is half as expensive as a human grader. 
        Like all machine-based approach, it does not suffer from errors due to
         fatigue and mental state. There is no chance that the system is 
        biased towards any particular student. The The speed of such a system 
        will only increse as more processing power is added and new techniques
         are developed.
        
        In conclusion, I would like to say that automated essay scoring are a 
        very fair way of scoring, if implemented correctly. However, there is,
         and always should be, atleast one human grader in the scoring process
         to take care of the anamolies that might arise in certain rare cases.

-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------

        Type of Sentences Identified    Percentage
        Gibberish       11.11
        Irrelevance     11.11
        Facts   16.67
        Ordinary        61.11
        
        Percentage of Gibberish Sentences were: 11.11
        Percentage of Irrelevant Sentences were: 11.11
        Percentage of Fact Sentences were: 16.67
        Percentage of Ordinary Sentences were: 61.11
        No. of Gibberish Sentences were: 2
        No. of Irrelvant Sentences were: 2
        No. of Facts Sentences were: 3
        No. of Ordinary Sentences were: 11
        
        gibberish sentences were as follows
        
        Sentence  Third, we must not ignore the benefits of the automated 
        essay scoring.
        Points  92.86%
        Semantic gibberish
        
        Sentence  There is no chance that the system is biased towards any 
        particular student.
        Points  92.59%
        Semantic gibberish
        
        irrelevant sentences were as follows:
        
        Sentence  Therefore there is no unfair harshness here.
        Points  82.83%
        
        Sentence  In any case, automated grading is proven to be as harsh as 
        a human grader.
        Points  79.08%
        
        fact sentences were as follows:
        
        Sentence  I strongly disagree with the argument that automated essay 
        scoring is unfair to students.
        Points  37.19%
        I can say this is a fact with 37.19 percent confidence
                
        Sentence  The automated essay scoring systems in use are carefully 
        designed by natural language processing experts.
        Points  40.31%
        I can say this is a fact with 40.31 percent confidence
                
        Sentence  In fact, they are proven to be comparable, if not better, 
        to a human grader.
        Points  49.13%
        I can say this is a fact with 49.13 percent confidence
                
        
        ordinary sentences were as follows:
        Sentence  Automated essay scoring systems might grasp the nuances of 
        every witty writing, but it certainly does well, the task for which it
         is assigned - namely grading of analytical writing essays.
        Message  This sentence is in future tense
                
        
        Sentence  Argument claims that a student can express an idea in ways 
        not known to the automated system thus resulting in a poor score.
        Message  This sentence is in future tense
                
        
        Sentence  To refute this claim I must point out the fact that there 
        are many NLP techniques which look at the general characterisitcs of a
         good essay, rather than a particular way, to decide a score.
        Message  This sentence is in future tense
                
        
        Sentence  Even the most ingeniously different essay follows the 
        guidelines of a good essay, otherwise it will not be able to represent
         the idea coherently.
        Message  This sentence has comparitive/superlative adverbs/adjectives
                
        
        Sentence  Second, the issue of harshness of the automated system is 
        really irrelevant.
        Message  This sentence contains the value word harshness
        
        Sentence  However harsh a system might be, as long as it is the common
         denominator for all the essays, the relative scores are still the 
        same.
        Message  This sentence is in future tense
                
        
        Sentence  It is cost effective - it is half as expensive as a human 
        grader.
        Message  This sentence contains the value word expensive
        
        Sentence  Like all machine-based approach, it does not suffer from 
        errors due to fatigue and mental state.
        Message  This sentence contains the value word fatigue
        
        Sentence  The The speed of such a system will only increse as more 
        processing power is added and new techniques are developed.
        Message  This sentence has comparitive/superlative adverbs/adjectives
                
        
        Sentence  In conclusion, I would like to say that automated essay 
        scoring are a very fair way of scoring, if implemented correctly.
        Message  This sentence is in future tense
                
        
        Sentence  However, there is, and always should be, atleast one human 
        grader in the scoring process to take care of the anamolies that might
         arise in certain rare cases.
        Message  This sentence is in future tense
        

For the seventh essay, we took the same 6-point essay written for the previous topic and evaluated it against a different topic. The EGAL system graded it as a 2.99.

The topic was :

        In some countries, television and radio programs are carefully 
        censored for offensive language and behavior. In other countries, 
        there is little or no censorship. In your view, to what extent should 
        government or any other group be able to censor television or radio 
        programs? Explain, giving relevant reasons and/or examples to support 
        your position.

The response was the same as the one used for the sixth essay.

-------------------------------------------------- DETAILED DIAGNOSTICS OF THE ESSAY: --------------------------------------------------

        Type of Sentences Identified    Percentage
        Gibberish       11.11
        Irrelevance     38.89
        Facts   5.56
        Ordinary        44.44
        
        Percentage of Gibberish Sentences were: 11.11
        Percentage of Irrelevant Sentences were: 38.89
        Percentage of Fact Sentences were: 5.56
        Percentage of Ordinary Sentences were: 44.44
        No. of Gibberish Sentences were: 2
        No. of Irrelvant Sentences were: 7
        No. of Facts Sentences were: 1
        No. of Ordinary Sentences were: 8
        
        gibberish sentences were as follows
        Sentence  Third, we must not ignore the benefits of the automated 
        essay scoring.
        Points  92.86%
        Semantic gibberish
        
        Sentence  There is no chance that the system is biased towards any 
        particular student.
        Points  92.59%
        Semantic gibberish
        
        irrelevant sentences were as follows:
        
        Sentence  I strongly disagree with the argument that automated essay 
        scoring is unfair to students.
        Points  70.55%
        
        Sentence  In fact, they are proven to be comparable, if not better, to
         a human grader.
        Points  74.49%
        
        Sentence  Second, the issue of harshness of the automated system is 
        really irrelevant.
        Points  74.37%
        
        Sentence  Therefore there is no unfair harshness here.
        Points  84.85%
        
        Sentence  In any case, automated grading is proven to be as harsh as a
         human grader.
        Points  72.75%
        
        Sentence  It is cost effective - it is half as expensive as a human 
        grader.
        Points  78.86%
        
        Sentence  In conclusion, I would like to say that automated essay 
        scoring are a very fair way of scoring, if implemented correctly.
        Points  73.77%
        
        fact sentences were as follows:
        
        Sentence  The automated essay scoring systems in use are carefully 
        designed by natural language processing experts.
        Points  40.31%
        
        I can say this is a fact with 40.31 percent confidence
                
        
        ordinary sentences were as follows:
        
        Sentence  Automated essay scoring systems might grasp the nuances of 
        every witty writing, but it certainly does well, the task for which it
        is assigned - namely grading of analytical writing essays.
        Message  This sentence is in future tense
                
        Sentence  Argument claims that a student can express an idea in ways 
        not known to the automated system thus resulting in a poor score.
        Message  This sentence is in future tense
                
        Sentence  To refute this claim I must point out the fact that there 
        are many NLP techniques which look at the general characterisitcs of a
         good essay, rather than a particular way, to decide a score.
        Message  This sentence is in future tense
                
        Sentence  Even the most ingeniously different essay follows the 
        guidelines of a good essay, otherwise it will not be able to represent
         the idea coherently.
        Message  This sentence has comparitive/superlative adverbs/adjectives
                
        Sentence  However harsh a system might be, as long as it is the common
         denominator for all the essays, the relative scores are still the 
        same.
        Message  This sentence is in future tense
                
        Sentence  Like all machine-based approach, it does not suffer from 
        errors due to fatigue and mental state.
        Message  This sentence contains the value word fatigue
        
        Sentence  The The speed of such a system will only increse as more 
        processing power is added and new techniques are developed.
        Message  This sentence has comparitive/superlative adverbs/adjectives
        
        Sentence  However, there is, and always should be, atleast one human 
        grader in the scoring process to take care of the anamolies that might
         arise in certain rare cases.
        Message  This sentence is in future tense
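
The percentage table at the top of each diagnostic report follows directly from the four sentence counts listed beneath it. The short Python sketch below reproduces that arithmetic for the fourth essay (4, 4, 6 and 9 sentences). The function name sentence_breakdown and the dictionary layout are illustrative assumptions, not part of EGAL, and the sketch does not attempt to reproduce how EGAL turns these percentages into a final score, which is not shown here.

```python
def sentence_breakdown(counts):
    """Return each category's share of all sentences, as a percentage
    rounded to two decimals (hypothetical helper, not EGAL code)."""
    total = sum(counts.values())
    if total == 0:
        # An empty essay has no sentences to classify.
        return {label: 0.0 for label in counts}
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


# Sentence counts reported for the fourth essay.
fourth_essay = {"Gibberish": 4, "Irrelevance": 4, "Facts": 6, "Ordinary": 9}

for label, pct in sentence_breakdown(fourth_essay).items():
    print(f"{label:<12} {pct:.2f}")
```

Running the same function on the counts of the sixth and seventh essays (2, 2, 3, 11 and 2, 7, 1, 8) shows where the change of topic registers: the share of irrelevant sentences rises from 11.11 to 38.89 while the gibberish share stays at 11.11.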




RELATION TO PREVIOUS WORK

We would like to acknowledge several systems that we used as components of our own. For gibberish detection, we used Link Grammar, a grammar parser[4]. This parser lets us find unused links and unknown words in a sentence, which is how we identify syntactic gibberish. We used the WordNet::Similarity package, built on WordNet 2.0[10], to find the semantic similarity between two words. This similarity is used both to identify semantic gibberish and to judge the relevance of a sentence to the topic. For identifying statements of fact, we used ideas from online resources [2], [6], [7] and [8]. Each of these references helped us decide the basis on which to distinguish facts from opinions and to identify the properties of a statement of fact. To check the accuracy of these statements of fact, we use the idea of a fact repository from Static Knowledge Sources[9]; in our case the repository is wikipedia.org[3]. We access this online encyclopedia by constructing a Google query, sending it through the Google API[11], and retrieving the wikitext of the first match that Google finds in en.wikipedia.org.
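
The query strings that appear in the diagnostic output (quoted subjects, then quoted nouns, then a site: restriction) can be sketched as follows. This is a minimal illustration in Python and not the EGAL implementation: the function name build_wikipedia_query is hypothetical, and stripping whitespace and skipping empty phrases are assumptions made here for tidiness, whereas the real output keeps trailing spaces and empty subjects. The sketch only builds the string and does not call any search API.

```python
def build_wikipedia_query(subjects, nouns, site="en.wikipedia.org"):
    """Build a phrase query restricted to one site (hypothetical helper).

    Each subject and noun becomes a double-quoted phrase; empty phrases
    are dropped (an assumption, EGAL's own output retains them).
    """
    phrases = [p.strip() for p in list(subjects) + list(nouns)]
    quoted = [f'"{p}"' for p in phrases if p]
    return " ".join(quoted + [f"site:{site}"])


# Subjects and nouns taken from one of the sample sentences in the results.
print(build_wikipedia_query(["Magneto Resonance Imagery "], ["magneto"]))
```

Restricting the search with site:en.wikipedia.org keeps the first match inside the chosen fact repository, so a sentence whose nouns are misspelled (for example "thts" or "ebat") is unlikely to find any page and ends up reported as unverifiable.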

References

[1] Page, E.B., 1994, New computer grading of student prose, using modern concepts and software. Journal of Experimental Education

[2] Identification of a Fact, http://et.sdsu.edu/saeria/671/facts/fact-identification.html

[3] Wikipedia, http://en.wikipedia.org/wiki/Main_Page

[4] Link Grammar, http://www.link.cs.cmu.edu/link/

[5] Peter W. Foltz, Darrell Laham, Thomas K. Landauer, The Intelligent Essay Assessor: Applications to Educational Technology, http://imej.wfu.edu/articles/1999/2/04/index.asp

[6] Langan, John, ``Ten Steps to Improving Reading Skills'', http://www.waycross.edu/ismt/fact.htm

[7] Human Value Project, http://www.uia.org/values/vztab12.htm, Union of International Associations 1997 - 2004

[8] Fact Identification Strategies, http://et.sdsu.edu/saeria/671/facts/fact-identification.html

[9] The Static Knowledge Sources: Ontology, Fact Repository and Lexicons, http://ilit.umbc.edu/Book/sks.htm

[10] WordNet 2.0, ``A lexical database for the English language'', http://www.cogsci.princeton.edu/~wn/wn2.0.shtml

[11] Google API, http://www.google.com/apis/

[12] Valenti, S., Neri, F., and Cucchiarelli, A., 2003, An overview of current research on automated essay grading. Journal of Information Technology Education, Vol. 2

[13] Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., and Harshman, R.A., 1990, Indexing by latent semantic analysis. Journal of the American Society for Information Science

[14] Hearst, M., 2000, The debate on automated essay grading. IEEE Intelligent Systems

[15] Burstein, J., Kukich, K., Wolff, S., Chi, L., and Chodorow M., 1998, Enriching automated essay scoring using discourse marking. Proceedings of the Workshop on Discourse Relations and Discourse Marking, Annual Meeting of the Association of Computational Linguistics, Montreal, Canada.

[16] Burstein, J., Leacock, C., and Swartz, R., 2001, Automated evaluation of essay and short answers. Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK.

[17] Christie, J.R., 1999, Automated essay marking-for both style and content. Proceedings of the Third Annual Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK

[18] Ming, P.Y., Mikhailov, A.A., and Kuan, T.L., 2000, Intelligent essay marking system (IEMS). Learners Together, Feb 2000, Ngee Ann Polytechnic, Singapore, http://ipdweb.np.edu.sg/lt/feb00/intelligent_essay_marking.pdf

[19] Mitchell, T., Russell, T., Broomhead, P., and Aldridge, N., 2002, Towards robust computerized marking of free-text responses. Proceedings of the Sixth International Computer Assisted Assessment Conference, Loughborough University, Loughborough, UK.

[20] Landauer, T.K., Foltz, P.W., and Laham, D., 1998, An introduction to latent semantic analysis. Discourse Processes, http://lsa.colorado.edu/pepers/dp1.LSAintro.pdf

[21] List of Value words, http://www.d.umn.edu/~dodd0036/stoplist.txt

[22] List of Fact words, http://www.d.umn.edu/~dodd0036/factlist.txt