Program Files, Linking, and Loading

Programming involves creating files called source code files. Most programming also involves two other important types of files: object files and executable files.

The formats for these file types and the memory image are defined by an operating system. An operating system also provides support software called loaders and linkers for handling these file types. In modern operating systems, part of this software runs dynamically; that is, while the program is executing.

There are some executable files, not considered here, whose formats are not defined by the operating system. These files are handled by interpreters for languages such as Java, Perl, and Ruby.

File Types

The format of object and executable files depends on the operating system. Compilers and assemblers have to be adapted for different operating systems in order to generate output that conforms to the appropriate format.

On Microsoft Windows® platforms, object file names have a .obj suffix. Compiled and assembled executable file names have a .exe suffix.

On Unix-based platforms, object file names have a .o suffix. There is no conventional file name suffix for compiled and assembled executable files. A Unix operating system can, however, recognize these file types by special 4-byte codes (often called magic numbers) at the start of the file.
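
As a minimal sketch of this kind of recognition, the following Python code checks for one such code: the 4-byte signature that begins files in ELF, the object and executable format used by Linux and many other Unix-like systems. The function names are hypothetical, chosen only for illustration.

```python
ELF_MAGIC = b"\x7fELF"  # the 4 bytes that begin every ELF file


def looks_like_elf(header: bytes) -> bool:
    """Return True if the leading bytes of a file carry the ELF signature."""
    return header[:4] == ELF_MAGIC


def file_looks_like_elf(path: str) -> bool:
    """Read only the first 4 bytes of the named file and test them."""
    with open(path, "rb") as f:
        return looks_like_elf(f.read(4))
```

The check looks only at the file's content, which is why no file name suffix is needed.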

Loading

A loader takes an executable file and copies its sections into memory. Then it produces a process control block to control program execution. Finally, it starts executing the code, usually by jumping to its main address.

Requirements

A loader must be able to

  • set up text and initialized data in memory
  • initialize register copies in the process control block
  • initialize the PC copy in the process control block

Relevant information must be included in the executable file format.
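
The three requirements above can be sketched as a toy loader in Python. Everything here is an assumption made for illustration: the `Executable` and `ProcessControlBlock` classes, their fields, the 4-byte word size, and the use of a dictionary as memory are hypothetical and do not correspond to any real operating system's format.

```python
from dataclasses import dataclass, field

WORD = 4  # bytes per word in this toy machine (an assumption)


@dataclass
class Executable:
    """Hypothetical executable file: two sections plus a start address."""
    text_base: int    # address where the text section belongs
    text: list[int]   # instruction words
    data_base: int    # address where initialized data belongs
    data: list[int]   # initialized data words
    entry: int        # address of the first instruction to run


@dataclass
class ProcessControlBlock:
    """Hypothetical PCB holding only the saved PC and register copies."""
    pc: int
    registers: dict[str, int] = field(default_factory=dict)


def load(exe: Executable, memory: dict[int, int], stack_top: int) -> ProcessControlBlock:
    """Copy both sections into memory and build the initial PCB."""
    # Set up text and initialized data in memory, one word at a time.
    for i, word in enumerate(exe.text):
        memory[exe.text_base + i * WORD] = word
    for i, word in enumerate(exe.data):
        memory[exe.data_base + i * WORD] = word
    # Initialize the register copies; here only the stack pointer is set.
    registers = {"sp": stack_top}
    # Initialize the PC copy with the entry address from the file.
    return ProcessControlBlock(pc=exe.entry, registers=registers)
```

The sketch shows why the executable file format must carry the section contents, their addresses, and the entry address: the loader has no other source for them.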

Other Considerations

Some other needs have to be considered in the design of executable file formats:

  • For security, it is desirable to keep data and instructions separate:

    the executable file uses separate data and text sections

  • Programmers want symbolic debugging capability:

    the executable file includes sections for the symbol table and for associating executed code with source code

Separate Compilation

The ability to break up source code for a program into smaller code units — separate compilation — has two important advantages:

  • It reduces compilation time for incremental changes.
  • It simplifies source code navigation for maintenance changes.

But separate compilation introduces two problems:

  • Relocation

    When a compiler allocates memory locations for a source code file, it starts with addresses just above the addresses set aside for the operating system. Only one of the source code files can use these addresses. The others have to be relocated, and their address references have to be changed accordingly.

  • External references

    To be useful, separately compiled files must have references to each other. For example, one file will call subprograms in another file. The addresses of these external references are not known when the files are compiled separately.

Linking, Linkers, and Object Files

Supporting separate compilation requires operating system software to combine the code from multiple compilation steps. This software is called a link editor or, more simply, a linker.

  • It produces an executable file from several object files.
  • It relocates separately compiled code segments.
  • It resolves external references.

The object files are the result of compiling single source code files.

  • They contain data and text sections like executable files.
  • The start address is omitted from all object files except the one containing the main program.
  • They contain symbol tables for resolution of external references.
  • They contain relocation tables for code addresses that need to be relocated.
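
The following Python sketch illustrates how a linker might use these tables. It is a simplification under stated assumptions, not a real object file format: the `ObjectFile` class and the `link` function are hypothetical, each relocation entry names the symbol whose address is needed, and a word that needs patching holds nothing but the address (a real linker patches an address field inside an instruction).

```python
from dataclasses import dataclass, field

WORD = 4  # bytes per word in this toy machine (an assumption)


@dataclass
class ObjectFile:
    """Hypothetical object file; offsets are in bytes from the start of its text."""
    text: list[int]  # words; words to be patched hold a placeholder
    symbols: dict[str, int] = field(default_factory=dict)  # defined name -> offset
    relocations: list[tuple[int, str]] = field(default_factory=list)  # (offset, symbol)


def link(objects: list[ObjectFile], base: int) -> tuple[list[int], dict[str, int]]:
    """Combine object files into one text image that starts at address `base`."""
    # Pass 1: relocate. Give each file its own address range and record
    # the final address of every symbol it defines.
    starts, table, address = [], {}, base
    for obj in objects:
        starts.append(address)
        for name, offset in obj.symbols.items():
            if name in table:
                raise ValueError(f"duplicate symbol: {name}")
            table[name] = address + offset
        address += len(obj.text) * WORD

    # Pass 2: resolve. Overwrite each placeholder word with the final
    # address of the symbol it refers to.
    image = []
    for obj in objects:
        words = list(obj.text)
        for offset, name in obj.relocations:
            if name not in table:
                raise ValueError(f"undefined symbol: {name}")
            words[offset // WORD] = table[name]
        image.extend(words)
    return image, table
```

Two passes are used because a file may refer to a symbol defined in a file that appears later in the list, so no reference can be patched until every file has been assigned its addresses.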

Dynamic Linking and Loading

As is often the case in computer science, the static/dynamic distinction has the following meaning:

  • Static describes something done before program execution. For example, assemblers statically allocate memory for variables declared with .word, .float, .double, and .asciiz directives.
  • Dynamic describes something done during program execution. For example, memory allocated by a sbrk syscall is allocated dynamically.

In keeping with this common terminology, the linking and loading described earlier is called static linking and loading. Dynamic linking and loading refers to linking and loading done during program execution. Modern operating systems typically use dynamic linking and loading for programming language library functions.

Benefits

Dynamic linking and loading has three important benefits:

  • Software always uses the latest versions of shared libraries.
  • Executable files are smaller. They do not include the shared libraries.
  • The total memory footprint for multiple processes is reduced. With virtual memory, different programs using the same library function only need a single copy in physical memory. If designed carefully, the shared library subprograms can have different logical addresses in different programs.

Implementation Concept: Jump Tables

A jump table implementation of dynamic linking and loading is lazy: it defers loading and linking of each subprogram until it is needed. However, the loading and linking is done only once per subprogram. After it is loaded and linked, a subprogram can be called again as many times as needed with negligible overhead.

  • A jump table contains an entry for each dynamically loaded subprogram.
  • Each entry contains executable code.
  • Each call to a dynamically linked subprogram is coded as a jump and link whose target address is the appropriate entry in the jump table.
  • Before its subprogram is loaded, an entry just contains a system call and its setup instructions for loading the required subprogram.
  • After a dynamically loaded subprogram is loaded, its jump table entry is replaced by a jump to the entry address for the subprogram.
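
The behavior described in the list above can be imitated in Python, with functions standing in for executable jump table entries. This is only a sketch of the concept: the `LIBRARY` dictionary is a hypothetical stand-in for a library file, the `load_count` dictionary exists only to make the loading observable, and a dictionary lookup replaces the system call that a real entry would make.

```python
from typing import Callable

# Hypothetical stand-in for a shared library on disk: name -> subprogram.
LIBRARY: dict[str, Callable[[int], int]] = {
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

load_count = {"square": 0, "negate": 0}  # times each subprogram was loaded

jump_table: dict[str, Callable[[int], int]] = {}


def make_stub(name: str) -> Callable[[int], int]:
    """Build the initial jump table entry for one subprogram."""
    def stub(arg: int) -> int:
        # Stands in for the system call that loads and links the subprogram.
        load_count[name] += 1
        target = LIBRARY[name]
        # Replace this entry so later calls go straight to the subprogram.
        jump_table[name] = target
        return target(arg)
    return stub


# Before anything is loaded, every entry holds only its loading stub.
for name in LIBRARY:
    jump_table[name] = make_stub(name)


def call(name: str, arg: int) -> int:
    """Every call site goes through the jump table entry."""
    return jump_table[name](arg)
```

Calling `call("square", 3)` and then `call("square", 4)` triggers the loading step only on the first call, and `negate` is never loaded unless it is called.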

Multiple dynamically linked subprograms are typically gathered into library files called dynamic link libraries. These files typically use a .dll suffix in the Windows operating system and a .so suffix on Unix-based operating systems.