Computer Science 1511
Computer Science I

Laboratory Assignment 9
Opening, Closing, Reading From and Writing to Text Files
Due at the end of Lab

Introduction

A text file is a sequence of bits, stored on a secondary storage device such as a floppy disk or hard disk, that is interpreted in a particular way. In a text file, the sequence of bits is divided into byte-sized pieces (8 bits to a byte), where each byte holds the code for an ASCII character (see Appendix A of your textbook for a table of the ASCII values).

For many applications, it is inconvenient for a user to type all of the information a program needs to run every time that program runs. For example, an airline reservation program needs to be told significant information to work (what flights are available, which seats are taken so far, etc.), but if a user had to type all of this information before the program could be used the program would be impractical. Files provide a mechanism for programs to use information multiple times. We store the information needed by the program in a file that is then re-read every time the program runs. Thus the user needs to type this information only once.

Another useful aspect of files is that a program can not only read files but also write new files and update existing ones. For example, we might have an airline reservation program add reservations to a file as they are made so that future users will not try to reserve seats already taken. This information would then be available to later runs of the program, which read it back from the file.

Manipulating files

To make it possible to manipulate files, C provides mechanisms for creating connections to files (using fopen) and for closing those connections (using fclose). Details on how to use these routines are given in the textbook and class notes.

Once a connection is made to a file, we can read information from that file (if the connection is a read connection) or write information to that file (if the connection is a write or append connection). To read information from a file we can use the fscanf routine, which is a form of the scanf command that applies to files (scanf applies to information typed at the keyboard). To write information to a file we can use the fprintf routine, which is a form of the printf command that applies to files (printf applies to information printed to the terminal). We also have other routines that let us read a single character from a file (getc, fgetc) and write a single character to a file (putc, fputc). These routines allow us to do more low-level reading and writing of a file (though fscanf and fprintf can also be used to read or write a single character).

In this lab we are going to get some experience with opening and closing files and with reading information from and writing information to files.

The Program

The following program attempts to open a file named getadr.htm and to show that file to the user by printing it to the text window. In this lab we are going to make a few changes to the program to make it able to read and show us the text from HTML (web) documents. We are then going to write the resulting text to a new file.


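#include <stdio.h>
#include <stdlib.h>

int main(void) {
  FILE *connectfile;   /* connection to the input file */
  int ch;              /* int, not char, so EOF can be told apart from data */

  /* Open a read connection; fopen returns NULL if the file cannot be opened */
  if ((connectfile = fopen("C:\\getadr.htm","r")) == NULL) {
    printf("Unable to open file \"getadr.htm\"\n");
    exit(EXIT_FAILURE);
  }

  /* Copy the file to the screen one character at a time */
  while ((ch = fgetc(connectfile)) != EOF) {
    printf("%c",ch);
  }

  fclose(connectfile);

  /* Wait for the user so that the text window stays open.
     (fflush is only defined for output streams, so stdin is not flushed.) */
  printf("Press return to finish.\n");
  getchar();
  return 0;
}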

What to do

First, copy the code to a directory on your disk. Next, make sure that no file named C:\getadr.htm exists. Now compile and run the code. The program should terminate immediately with a message indicating that it was unable to open getadr.htm.

Access the file at this link through your web browser and use the FILE/SAVE AS command in your web browser to save the file as C:\getadr.htm. Now rerun your program and see what it produces.

Add statements to the code printing out a line number before each line in the form:

nnnn:
as in
   1:

for line 1. The easiest way to do this is to print the line number for line 1 before the loop that reads the characters starts, and then add code to print a new line number after each newline character is read.

Next, create variables uppercase and lowercase to count the total number of uppercase and lowercase characters in the input file. To do this, check the range of the character's ASCII value. If the value lies in the range 65 to 90 ('A' to 'Z'), the character is uppercase and the uppercase count is incremented by 1. If the value lies in the range 97 to 122 ('a' to 'z'), the character is lowercase and the lowercase count is incremented by 1. Print the total number of uppercase and lowercase characters. Use the following statement to check for uppercase characters:

		 if ((ch >= 'A') && (ch <= 'Z')) uppercase++;

A similar statement can be added to check for lowercase characters.

Next, change your code so that any piece of text between a pair of angle brackets (<>) does not appear on the screen. To do this, check each character just after it is read and before it is written. If the character is a left angle bracket (<), do not print it out. Instead, set a flag (an integer variable) to indicate that you are currently reading an HTML command (something between angle brackets). As long as this flag is set, do not print out any of the characters read. When a right angle bracket (>) is read, set the flag back to its initial state (not reading an HTML command); the right angle bracket itself is not printed either.

Finally, set up the program so that every character that appears between angle brackets, including the angle brackets themselves, is written to the screen, and the rest (the text that is not inside angle brackets) is written to a file named getadr.txt. To do this you will need to add a new FILE * variable that is set up using an fopen on getadr.txt. Use the flag method described above to print the text between the angle brackets on the screen. Use an fprintf command to write each character not enclosed in angle brackets to the file getadr.txt. Do not forget to add a corresponding fclose command at the end of your program.

What to turn in

Turn in a hard copy of your final program. Also turn in a copy of your input file and of the file produced by your program, along with the output that appears on the screen.