Computer Science 1511
Computer Science I

Programming Assignment 6
Text Files (35 points)
Due Tuesday, November 23, 1999

Introduction

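In this assignment you will read and analyze a text file. The problem is to read a text file character by character and print out a "token map" of the file, where tokens are meaningful objects from the file such as words, numbers, or punctuation marks. In the token map you will print a code of the form Xlength for each word or number, where length is the number of characters in the token and X identifies the kind of token. The example output below uses s for a short word (three or fewer letters in that example), L for a longer word starting with a capital letter, l for a longer word starting with a lower case letter, and n for a number; a punctuation mark is printed as p with no length.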

After each number token you should print out, between parentheses, the value of that number. Also, at the end of the program you should print out the sum of all the number tokens in the file. For example, suppose the text file contains the following:

Constitution of the United States of America
(In Convention, September 17, 1787)

Preamble
   We the people of the United States, in order to form a more
perfect union, establish justice, insure domestic tranquility,
provide for the common defense, promote the general welfare, and
secure the blessing of liberty to ourselves and our posterity, do
ordain and establish the Constitution of the United States of
America.

Your program should produce the following output:
   1: L12 s2 s3 L6 L6 s2 L7
   2: p s2 L10 p L9 n2(17) p n4(1787) p
   3:
   4: L8
   5: s2 s3 l6 s2 s3 L6 L6 p s2 l5 s2 l4 s1 l4
   6: l7 l5 p l9 l7 p l6 l8 l11 p
   7: l7 s3 s3 l6 l7 p l7 s3 l7 l7 p s3
   8: l6 s3 l8 s2 l7 s2 l9 s3 s3 l9 p s2
   9: l6 s3 l9 s3 L12 s2 s3 L6 L6 s2
  10: L7 p
  11:
Sum of numbers in file: 1804
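
The labels in the example output can be reproduced with a small classifier. What follows is a minimal Python sketch, not the required solution. The function name label_token is hypothetical, the three-letter cutoff for short words is inferred from the example output rather than stated, and each token is assumed to be already isolated and made of plain ASCII characters.

```python
def label_token(token: str) -> str:
    """Return the token-map label for one already-isolated token."""
    if token.isdigit():
        # Numbers: n, the digit count, then the value in parentheses.
        return f"n{len(token)}({int(token)})"
    if token.isalpha():
        if len(token) <= 3:
            # Assumed cutoff: the example marks 1- to 3-letter words with s.
            return f"s{len(token)}"
        # Longer words: L if the first letter is a capital, l otherwise.
        prefix = "L" if token[0].isupper() else "l"
        return f"{prefix}{len(token)}"
    # Anything else is treated as a punctuation mark, printed without a length.
    return "p"


tokens = ["In", "Convention", ",", "September", "17", ",", "1787"]
print(" ".join(label_token(t) for t in tokens))
# prints: s2 L10 p L9 n2(17) p n4(1787)
```

Splitting the text into tokens is deliberately left out here, since the assignment asks for that to be done character by character.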

Note that:

How To Proceed

I suggest that you proceed in stages:

While you are free to design your program any way you wish, you must follow good top-down design principles. For example, you might write your program so that each time the start of a word or number is read, a function is called that reads to the end of that word or number.
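
One way to picture the design suggested above is a main loop that reads one character at a time and hands off to a helper whenever a word or number starts. This is only a sketch under stated assumptions: the names read_run, word_label and token_map are hypothetical, io.StringIO stands in for an opened text file, the three-letter cutoff for short words is inferred from the example output, and a final line without a newline is reported only if it contains tokens.

```python
import io


def read_run(first, stream, belongs):
    """Read to the end of a word or number that starts with `first`.

    Returns (token, next_char); next_char is the first character that is
    not part of the token, or '' at end of file.
    """
    token = first
    ch = stream.read(1)
    while ch and belongs(ch):
        token += ch
        ch = stream.read(1)
    return token, ch


def word_label(word):
    """Label a word: s for short (assumed <= 3 letters), else L or l."""
    if len(word) <= 3:
        return f"s{len(word)}"
    return ("L" if word[0].isupper() else "l") + str(len(word))


def token_map(stream):
    """Return (rows, total): one list of labels per line, and the number sum."""
    rows, labels, total = [], [], 0
    ch = stream.read(1)
    while ch:
        if ch.isalpha():
            word, ch = read_run(ch, stream, str.isalpha)
            labels.append(word_label(word))
        elif ch.isdigit():
            digits, ch = read_run(ch, stream, str.isdigit)
            labels.append(f"n{len(digits)}({int(digits)})")
            total += int(digits)
        else:
            if ch == "\n":
                rows.append(labels)      # end of the current line
                labels = []
            elif not ch.isspace():
                labels.append("p")       # single punctuation character
            ch = stream.read(1)
    if labels:
        rows.append(labels)              # last line had no trailing newline
    return rows, total


sample = io.StringIO("(In Convention, September 17, 1787)\n\nPreamble\n")
rows, total = token_map(sample)
for number, labels in enumerate(rows, start=1):
    print(f"{number:4d}: {' '.join(labels)}")
print("Sum of numbers in file:", total)   # Sum of numbers in file: 1804
```

Because read_run returns the character that ended the token, the main loop never needs to push a character back onto the file.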

What To Hand In

Hand in a lab report with a copy of each of your test data files and the output for each. Include a second copy of each test data file. On this copy underline each of the words and the numbers using different colors of ink (for example, you might underline short words in red, long words starting with capital letters in green, long words starting with lower case letters in blue, numbers in black and punctuation in purple). Also write the length of each word or number above it.

EXTRA CREDIT

2 extra points - allow one single quote character to appear in a word (though not as the first character). For example, don't would count as one word of length 5 rather than as a word of length 3, a punctuation mark, and then a word of length 1.
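
A word reader that accepts one single quote might look like the sketch below. The name read_word is hypothetical and io.StringIO stands in for the input file. The sketch also assumes that a quote at the very end of a word is absorbed into it, a case the assignment does not address.

```python
import io


def read_word(first, stream):
    """Read a word that starts with the letter `first`.

    At most one single quote is accepted after the first character.
    Returns (word, next_char); next_char is '' at end of file.
    """
    word = first
    seen_quote = False
    ch = stream.read(1)
    while ch and (ch.isalpha() or (ch == "'" and not seen_quote)):
        if ch == "'":
            seen_quote = True            # a second quote will end the word
        word += ch
        ch = stream.read(1)
    return word, ch


word, following = read_word("d", io.StringIO("on't stop"))
print(word, len(word))                   # prints: don't 5
```

Since the caller only invokes read_word after seeing a letter, a quote can never be the first character of a word.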

2 extra points - allow a run of consecutive dashes (and ONLY dashes), such as -- (two consecutive dashes) or --- (three consecutive dashes), to be treated as a single punctuation mark. When a punctuation mark has more than one character, print not only p but also the number of characters in it; a single-character punctuation mark is still printed as just p.
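
The dash rule can be handled by a punctuation reader along the following lines. This is a sketch only: read_punctuation is a hypothetical name, io.StringIO stands in for the input file, and the output form p3 (p followed directly by the count) is an assumption modeled on the labels used for words and numbers.

```python
import io


def read_punctuation(first, stream):
    """Read one punctuation mark that starts with `first`.

    Only dashes are merged into a multi-character mark. Returns
    (label, next_char); next_char is '' at end of file.
    """
    count = 1
    ch = stream.read(1)
    if first == "-":
        while ch == "-":                 # absorb the rest of the dash run
            count += 1
            ch = stream.read(1)
    # Assumed format: bare p for one character, p plus the count otherwise.
    label = "p" if count == 1 else f"p{count}"
    return label, ch


print(read_punctuation("-", io.StringIO("-- end")))   # ('p3', ' ')
print(read_punctuation(",", io.StringIO(", x")))      # ('p', ',')
```

In the second call the two commas stay separate marks, because only dashes are allowed to combine.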