
Separate Compilation and the UNIX make Program


Design of large programs

The design of a large program requires a great deal more care than that of a small program. To make the complexity of a large program manageable, the program is broken down into components, each of which is responsible for a small, well-defined part of the program's overall function. Most languages provide procedure and/or function constructs for dividing a program into components.

But for larger programs, higher-level components are needed. Consider a calculator program that calculates the values of C-like real expressions involving variables and can assign values to variables. A program designer can come up with a reasonable decomposition of this program into high-level components without having to consider details about how those components are implemented.

First of all, since the expressions involve variables, there is a need to be able to save and retrieve values of variables. The need for this kind of functionality arises often in programs, and there are well-known data structures called tables that provide that kind of functionality.

Second, file input is in terms of characters, but expressions and their evaluation are easiest to understand in terms of units called tokens. A token is a unit of text, such as a number, an identifier, or an operator, and it may contain more than one character. For a calculator program, it is useful to have a high-level component that breaks a stream of characters up into these tokens and classifies them.

With these two components, the work of the main program is much easier to accomplish, with the responsibility of token classification and variable value storage and retrieval delegated to other components.

Both components involve several functions that work together in a coordinated way. For example, the table component will require one function for assigning a value to a variable and another for retrieving the value of a variable, and it will also need some body of data (perhaps an array) for storing the values. Since these components have distinct responsibilities, it is desirable to put them into separate code files, table.C and tokens.C, and to compile and test them separately. The main program is then written in a third file, calculator.C, and it is compiled and tested after the others have been thoroughly tested.

Separate compilation

Separate compilation is an integral part of the standard for the C programming language. When a C source code file is compiled, the compiler performs two tasks. First, the file is compiled into a format called an object file. Then the object file is linked with a library of standard functions to produce an executable file.

An object file contains the source code translated into instructions that the machine understands. It is incomplete in that it may contain calls to functions that are defined in other files, and it need not contain a main function. When Unix compilers produce an object file, it has the same name as the source code file except that the suffix is changed to .o.

Suppose you want to compile prog.C to produce the executable program prog. Normally, you would give the command

g++ -Wall -o prog prog.C
First the g++ program compiles prog.C into object code. Then it links that object code with the standard libraries and creates the executable file prog. In this one-step form the object file is only temporary: g++ removes it after linking, so no prog.o is left behind.

The compilation and linking steps can be executed separately as follows:

g++ -Wall -c prog.C
g++ -o prog prog.o
The first g++ command just compiles prog.C into prog.o, and the second links it to create prog. The g++ program recognizes the .o suffix so that it doesn't try to compile prog.o.

Now suppose you have the code files described above for the calculator program. Then the following commands will create the corresponding object files:

g++ -Wall -c calculator.C
g++ -Wall -c table.C
g++ -Wall -c tokens.C
When the g++ program is called with a list of .o files, it links them all together to form an executable program. So the following command creates the calculator program:
g++ -o calculator calculator.o table.o tokens.o

Header files and #includes

If the main program calls a function that is defined in another file, then the compiler needs to see a declaration of the function before the call so that it can check the types of the arguments and of the returned value. In addition, the main program may need to declare variables whose types are declared elsewhere. Both needs are met with header files, in which shared types and functions are declared. Conventionally, the name of a header file ends in a .h suffix.

In the example calculator program, the files table.C and tokens.C should have header files table.h and tokens.h. These header files define the types and declare the functions that are shared with other files. A code file together with its associated header file is referred to as a module. A shared type or function is declared in a header file exactly as it would be declared in a C file that is not separately compiled.

Including header files
A header file should be included in its own code file (for example, table.h in table.C), as well as in any file that uses its types and functions. A header file is included in another file by a line of one of the following forms:
#include <header-file>
or
#include "header-file"
The first form is used for standard library header files and other header files that are provided with the computer system. The second form is used for all other header files. In either case, after the compiler reads the include line it will start reading from the header file. When it finishes reading the header file, it resumes reading where it left off in the original file. Header file includes should appear near the beginning of any file which uses the information in the header file, before any code for or calls to functions declared in the header file.

The file calculator.C should include both header files, since it uses types and functions from both of the other modules. The file tokens.C does not need any types or functions from the table module, so it should include only tokens.h. Finally, the file table.C does not need any types or functions from the tokens module, so it should include only table.h.

Ordering include files
For complex programs, some care is needed to get the includes in a correct order. The important thing to remember is that the compiler reads the header files in the order in which they are included. If one header file uses a type that is declared in another header file, then the header file with the type declaration should be included first. For C programs, this often arises with header files that declare standard library functions, types, and constants. These system header files never use programmer-defined types, so they can always safely appear before includes of other header files. The order of system header file includes is usually not important; exceptions are documented in the on-line man pages.

The Unix make program

Several difficulties arise when you do separate compilation. First of all, you must give many different calls to g++ to compile all of your pieces. This means you have a lot of opportunities for mistyping commands. Secondly, you must remember which files are up-to-date, and which need to be recompiled. Forgetting to recompile something which has been corrected means the mistake doesn't go away, even though you think you've fixed it. This can lead to terrible confusion. The solution to these difficulties is to use the Unix tool make to manage your compilations for you.

When you type the Unix command make, the make program looks in the current working directory for a file named either makefile or Makefile. The make program then uses the makefile to determine how to update your program files to produce a completely compiled package.

Makefile entries

A makefile is a sequence of entries, each having the following form:
targets : prerequisites
	commands
Targets and prerequisites are both lists of filenames separated by blanks. The targets and prerequisites specify dependencies between files, with the target files being dependent on the prerequisites. This tells the make program that every target file should be remade whenever a prerequisite file is modified. An entry always has at least one target, but it can have no prerequisites.

The commands part of each entry consists of zero or more lines, each starting with a tab character, followed by a command to be executed. These commands should tell the make program how to remake target files. The tab is crucial; if it's not there then the make program does not know that it is reading a command line.

When the make program starts up, it determines the age of each file in the current working directory by looking at its modification time stamp, which Unix records for every file. If a target file for an entry does not exist, or if it is older than any of its prerequisite files, then the make program issues all of the commands for that entry.

In simple makefiles there will be two kinds of entries. One kind creates an object file by compiling a C source code file. The second kind links object files into an executable program.

Makefile entries to create object files

Here is a typical makefile entry to create an object file:
table.o : table.C table.h
	g++ -Wall -c table.C
This entry says that the object file table.o is dependent on the files table.C and table.h. When the make program starts up, it determines the age of the files table.o, table.C, and table.h. If table.o does not exist, or if it is older than either of the files table.C or table.h, then the make program issues the command g++ -Wall -c table.C to make or remake table.o.

When writing a program in pieces, each .o file should be made dependent on all of the header files that are included in the corresponding .C file. The reason for this is that if the header file is changed, the .o file needs to be remade. If the .o file is not remade, it is likely that different versions of the header file will be used in different parts of the program. The symptoms of this problem are unpredictable, so take care in setting up dependencies.

The make program should not be responsible for making .C and .h files. These are made by the programmer using an editor. Thus .C files and .h files should not be target files. The make program also should not be responsible for making .o files if their .C file is not available. For example, if you are working on a team and you are using a .o file provided by someone else, then your makefile should not contain an entry for that .o file.

Makefile entries to create executable programs

Here is a typical makefile entry to create an executable program:
calculator : calculator.o tokens.o table.o
	g++ -o calculator calculator.o tokens.o table.o
This entry tells the make program that calculator should be remade whenever it is older than any of the object files calculator.o, tokens.o or table.o. The compiler, when given a list of object files, will just link them to form the executable program. If the makefile also contains entries in which calculator.o, tokens.o, or table.o appear as targets then the make program will try to make them first.

A complete makefile example

Suppose you are working on the calculator program described earlier and that you are responsible for the main program module (calculator.C) and the table module (table.C and table.h), but someone else has provided you with the object file tokens.o and the header file tokens.h. Then the following makefile could be used.
calculator : calculator.o tokens.o table.o
	g++ -o calculator calculator.o tokens.o table.o

calculator.o : calculator.C tokens.h table.h
	g++ -Wall -c calculator.C

table.o : table.C table.h
	g++ -Wall -c table.C
Note that there is no entry with tokens.o as a target since it is provided by another person; there is no way that the make program can recreate it. It is best to leave tokens.o as a prerequisite for calculator in case you get an updated version.

Some advanced make program features

If a dependency or a command is too long to fit on one line, it can be continued onto the next line by using a backslash. At the point where you want to break the line, type a backslash followed immediately by a newline (press Return right after the backslash), then the rest of the line. This will not work if there are any characters, including spaces, between the backslash and the end of the line.

Usually the command make is given by itself in response to the Unix prompt. When the make program is run this way, it tries to make the first target appearing before the colon in the first entry of the makefile. It checks the files after the colon to see whether they are up to date (as determined by other entries in the makefile) and, if necessary, updates them. Thus with the given makefile, the make program will first check the object files to see if they are up to date. If they are not, or if they don't exist, they will be created using the commands given in the later entries. Once these files are made, the executable program calculator can be made using the command in the first entry of the makefile.

The make program can also be run with the command

make target-name
where target-name is a target of some entry in the makefile. When used this way, the make program only updates files necessary to make target-name. Thus the command
make table.o
will make table.o without going on to make the rest of the package.
