Computer Science 1621
Computer Science I

Programming Assignment 5
Text Files (40 points)
Due Friday, November 13, 1998 (** NO LATE PROGRAMS **)

Introduction

In this assignment you will read and analyze a text file. The problem is to read a text file character by character and print a "token map" of the file, where tokens are the meaningful units in the file, such as words, numbers, and punctuation marks. In the token map you will print a code of the form Xlength for each word or number, where length is the number of characters in the token and X is 'C' if the token is a word that starts with a capital letter, 'L' if it is a word that starts with a lowercase letter, and 'N' if the token is a number. For each punctuation character you will print a 'P'. For example, suppose the text file contains the following:

Constitution of the United States of America
(In Convention, September 17, 1787)

Preamble
   We the people of the United States, in order to form a more
perfect union, establish justice, insure domestic tranquility,
provide for the common defense, promote the general welfare, and
secure the blessing of liberty to ourselves and our posterity, do
ordain and establish the Constitution of the United States of
America.

Your program should produce the following output:
   1: C12 L2 L3 C6 C6 L2 C7
   2: P C2 C10 P C9 N2 P N4 P
   3:
   4: C8
   5: C2 L3 L6 L2 L3 C6 C6 P L2 L5 L2 L4 L1 L4
   6: L7 L5 P L9 L7 P L6 L8 L11 P
   7: L7 L3 L3 L6 L7 P L7 L3 L7 L7 P L3
   8: L6 L3 L8 L2 L7 L2 L9 L3 L3 L9 P L2
   9: L6 L3 L9 L3 C12 L2 L3 C6 C6 L2
  10: C7 P
  11:
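The classification rule can be written as one small function. The following is a minimal sketch in Python. The assignment does not prescribe a language, and the function name `token_code` is hypothetical. It assumes that the token has already been cut out of the text and that the input is plain ASCII.

```python
import string


def token_code(token: str) -> str:
    """Map one complete, non-empty token to its token-map code."""
    first = token[0]
    if first in string.digits:
        return "N" + str(len(token))   # number
    if first in string.ascii_uppercase:
        return "C" + str(len(token))   # word starting with a capital letter
    if first in string.ascii_lowercase:
        return "L" + str(len(token))   # word starting with a lowercase letter
    return "P"                         # anything else is one punctuation mark


if __name__ == "__main__":
    # Tokens of sample line 2: "(In Convention, September 17, 1787)"
    tokens = ["(", "In", "Convention", ",", "September", "17", ",", "1787", ")"]
    print(" ".join(token_code(t) for t in tokens))  # P C2 C10 P C9 N2 P N4 P
```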

Note that:

How To Proceed

I suggest that you proceed in stages:

While you are free to design your program any way you wish, you must follow good top-down design principles. For example, you might structure your program so that whenever it reads the first character of a word or number, it calls a function (or functions) that reads the rest of that word or number.
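One possible way to follow this top-down suggestion is sketched below in Python. All names are hypothetical, and the input is assumed to be plain ASCII. The main loop, `token_map`, reads one character at a time. When it sees the first letter of a word or the first digit of a number, it hands off to `read_word` or `read_number`. Each helper reads to the end of its token and returns the token's code together with the first character after the token. The main loop therefore never has to "un-read" a character. Line numbering matches the sample output: a file that ends in a newline produces an empty final line.

```python
import io
import string

LETTERS = string.ascii_letters  # assumption: plain ASCII input
DIGITS = string.digits


def read_word(f, first):
    """Read the rest of a word whose first letter `first` was already read.

    Returns (code, next_char); next_char is the first character after the
    word, or "" at end of file.
    """
    length = 1
    ch = f.read(1)
    while ch and ch in LETTERS:
        length += 1
        ch = f.read(1)
    kind = "C" if first.isupper() else "L"
    return kind + str(length), ch


def read_number(f, first):
    """Read the rest of a number; same return convention as read_word."""
    length = 1
    ch = f.read(1)
    while ch and ch in DIGITS:
        length += 1
        ch = f.read(1)
    return "N" + str(length), ch


def token_map(f):
    """Scan f one character at a time; return one formatted line per input line."""
    lines = []
    codes = []
    ch = f.read(1)
    while True:
        if ch == "" or ch == "\n":
            # End of a line (or of the file): emit its numbered token map.
            lines.append(f"{len(lines) + 1:4d}:" + "".join(" " + c for c in codes))
            if ch == "":
                return lines
            codes = []
            ch = f.read(1)
        elif ch in LETTERS:
            # Start of a word: read_word consumes the rest of it.
            code, ch = read_word(f, ch)
            codes.append(code)
        elif ch in DIGITS:
            # Start of a number: read_number consumes the rest of it.
            code, ch = read_number(f, ch)
            codes.append(code)
        else:
            # Blanks and tabs are skipped; anything else is punctuation.
            if not ch.isspace():
                codes.append("P")
            ch = f.read(1)


if __name__ == "__main__":
    for line in token_map(io.StringIO("America.\n")):
        print(line)
```

To process a real file, open it in text mode and pass the file object, for example `with open("preamble.txt") as f:` followed by printing each string returned by `token_map(f)`. The file name is only an example.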

What To Hand In

Hand in a lab report with a copy of each of your test data files and the output for each. Include a second copy of each test data file. On this copy, mark each word and each number with a value indicating its length, using one color of ink for words that start with capital letters, a second color for other words, and a third color for numbers.

EXTRA CREDIT

2 extra points - allow a word to contain at most one single-quote (apostrophe) character, though not as its first character. For example, don't would then count as one word of length 5 rather than as a word of length 3, a punctuation mark, and a word of length 1.
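For this extra credit, only the routine that reads a word needs to change. Below is a sketch in Python. The name `read_word` is hypothetical. It is called after the first letter of a word has been read and returns the word's code together with the first character after the word. The quote can never be the first character, because the routine is only entered on a letter. A flag allows at most one quote inside the word.

```python
import io
import string


def read_word(f, first):
    """Read the rest of a word whose first letter `first` was already read.

    Extra-credit rule: at most one single quote may appear in the word.
    Returns (code, next_char); next_char is "" at end of file.
    """
    word = first
    seen_quote = False
    ch = f.read(1)
    while ch and (ch in string.ascii_letters or (ch == "'" and not seen_quote)):
        if ch == "'":
            seen_quote = True  # a second quote will end the word
        word += ch
        ch = f.read(1)
    kind = "C" if first.isupper() else "L"
    return kind + str(len(word)), ch


if __name__ == "__main__":
    # The caller has already read the "d" of "don't".
    print(read_word(io.StringIO("on't stop"), "d"))  # ('L5', ' ')
```

Read literally, the rule also lets a word end in a quote, so dogs' counts as L5. If a closing quote should remain punctuation, one option is to accept the quote only when the next character is a letter.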

2 extra points - allow a run of consecutive dashes (and ONLY dashes), such as -- (two dashes) or --- (three dashes), to be treated as a single punctuation mark. When a punctuation mark is longer than one character, print the number of characters in it along with the P (for example, P3 for ---). A one-character punctuation mark is still printed as just P.
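Below is a sketch of a punctuation reader for this extra credit, in Python. The name `read_punct` is hypothetical. The sketch also assumes that a multi-character mark is written in the same Xlength style as the other codes (P2, P3, and so on). The routine is called after the first punctuation character has been read and returns the code together with the first character after the mark. Only a leading dash starts a run, so a period followed by dashes still produces P for the period and then a separate dash mark.

```python
import io


def read_punct(f, first):
    """Read a punctuation mark whose first character `first` was already read.

    Extra-credit rule: consecutive dashes form one mark, printed as P plus
    its length when it is longer than one character.
    Returns (code, next_char); next_char is "" at end of file.
    """
    length = 1
    ch = f.read(1)
    if first == "-":
        while ch == "-":  # only dashes are grouped together
            length += 1
            ch = f.read(1)
    code = "P" + str(length) if length > 1 else "P"
    return code, ch


if __name__ == "__main__":
    # The caller has already read the first "-" of "--- end".
    print(read_punct(io.StringIO("-- end"), "-"))  # ('P3', ' ')
```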