In a system with virtual memory the main memory can be viewed as a cache for the disk, which serves as the lower-level store. Due to the enormous difference between memory access times and disk access times, a fully-associative caching scheme is used. That is, the entire main memory is a single set - any page can be placed anywhere in main memory. This makes the set field of the address vanish. All that remains is a tag and an offset.

Since the tag field just identifies a page it is usually called the page number field.

Logical address layout: page number (20 bits), offset (12 bits)

With a virtual memory system, main memory acts as a cache whose lower level is the disk. Since it is fully associative there is no need for a set field. The address just decomposes into an offset field and a page number field. The number of bits in the offset field is determined by the page size. The remaining bits are the page number.

An Example

A computer uses 32-bit byte addressing. The computer uses paged virtual memory with 4KB pages. Calculate the number of bits in the page number and offset fields of a logical address.

Answer

Since there are 4K bytes in a page, the offset field must contain 12 bits (2^12 = 4K). The remaining 20 bits are page number bits.

Thus a logical address decomposes into a 20-bit page number followed by a 12-bit offset.
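The decomposition in this example can be checked with a few lines of Python. This is only a sketch: the names `split_address`, `OFFSET_BITS`, and `PAGE_NUMBER_BITS` are illustrative and do not come from any particular processor or library.

```python
ADDRESS_BITS = 32                    # 32-bit byte addressing, as in the example
PAGE_SIZE = 4 * 1024                 # 4KB pages, as in the example

# log2 of the page size gives the number of offset bits (4096 -> 12).
OFFSET_BITS = PAGE_SIZE.bit_length() - 1
PAGE_NUMBER_BITS = ADDRESS_BITS - OFFSET_BITS


def split_address(logical_address):
    """Return (page_number, offset) for a 32-bit logical address."""
    page_number = logical_address >> OFFSET_BITS
    offset = logical_address & (PAGE_SIZE - 1)
    return page_number, offset


page_number, offset = split_address(0x12345678)
print(hex(page_number), hex(offset))
```

For the address 0x12345678 the upper five hex digits form the page number and the lower three form the offset, because 12 bits is exactly three hex digits.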

Virtual memory address translation uses a page table. In the diagram to the left, a page table is represented by the large box. It is a structured array in memory. It is indexed by page number.

Each page table entry contains information about a single page. Part of this information is a frame number (green) — where the page is located in physical memory. In addition there are control bits (blue) for controlling the translation process. Address translation concatenates the frame number with the offset part of a logical address to form a physical address.
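The concatenation step can be sketched as follows. The page table here is modeled as a Python dictionary from page number to frame number, which is an assumption made for brevity; a real page table is an array in memory, and the control bits are ignored in this sketch. The name `translate` is illustrative.

```python
OFFSET_BITS = 12  # assumes 4KB pages


def translate(page_table, logical_address):
    """Form a physical address by concatenating the frame number and the offset."""
    page_number = logical_address >> OFFSET_BITS
    offset = logical_address & ((1 << OFFSET_BITS) - 1)
    frame_number = page_table[page_number]       # index the table by page number
    return (frame_number << OFFSET_BITS) | offset


# Hypothetical mapping: page 0x12345 lives in frame 0xABC.
page_table = {0x12345: 0xABC}
print(hex(translate(page_table, 0x12345678)))
```

The offset passes through unchanged; only the page number is replaced by a frame number.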

A page table base register (PTBR) holds the base address for the page table of the current process. It is a processor register that is managed by the operating system.

Each process running on a processor needs its own logical address space. This can only be realized if each process has its own page table. To support this, a processor that supports virtual memory must have a page table base register that is accessible by the operating system. For operating system security, this register is only accessible when the processor is in system mode.

The operating system maintains information about each process in a process control block. The page table base address for the process is stored there. The operating system loads this address into the PTBR whenever a process is dispatched.

A page table entry contains information about an individual page in a process's logical address space. It typically has a size of 4 bytes (32 bits). It contains two kinds of information: a frame number and control bits.

With 32 bit physical addresses and 4KB pages the frame number only requires 20 bits. Then a 4B page table entry has 12 bits that can be used for controlling virtual memory.
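As a rough illustration of how 20 frame number bits and 12 control bits fit into one 4B entry, the sketch below packs and unpacks an entry. Placing the frame number in the upper bits is an assumption; actual layouts are defined by each processor. The function names are hypothetical.

```python
FRAME_BITS = 20      # 32-bit physical address minus 12 offset bits
CONTROL_BITS = 12    # what is left over in a 32-bit entry


def pack_pte(frame_number, control):
    """Build a 32-bit entry; frame number in the upper bits (assumed layout)."""
    if not 0 <= frame_number < (1 << FRAME_BITS):
        raise ValueError("frame number does not fit in 20 bits")
    if not 0 <= control < (1 << CONTROL_BITS):
        raise ValueError("control field does not fit in 12 bits")
    return (frame_number << CONTROL_BITS) | control


def unpack_pte(pte):
    """Return (frame_number, control) from a 32-bit entry."""
    return pte >> CONTROL_BITS, pte & ((1 << CONTROL_BITS) - 1)


pte = pack_pte(0xABCDE, 0x007)
print(hex(pte), unpack_pte(pte))
```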

For good performance, the processor must normally handle the translation process without having to bring up the operating system. However, the operating system

The bits that control access legality are defined by the processor but written by the operating system. The processor does not modify these bits. It just reads them to decide if it should attempt an access.

Access is allowed only if the valid bit is 1 and the appropriate access permission bit is 1.

For some processors there is no execute access bit. Then the read access bit controls instruction fetch. Having all three access permission bits gives the operating system better control over security.
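The legality check can be sketched as a pair of bit tests. The bit positions below are assumptions chosen for illustration, as is the `has_execute_bit` parameter, which models processors where the read access bit also controls instruction fetch.

```python
# Hypothetical bit positions within the control field.
VALID = 0x1
READ = 0x2
WRITE = 0x4
EXECUTE = 0x8


def access_allowed(control_bits, access_bit, has_execute_bit=True):
    """Allow an access only if the valid bit and the matching permission bit are 1."""
    if access_bit == EXECUTE and not has_execute_bit:
        access_bit = READ          # instruction fetch falls back to the read bit
    valid = bool(control_bits & VALID)
    permitted = bool(control_bits & access_bit)
    return valid and permitted


read_only_page = VALID | READ
print(access_allowed(read_only_page, READ))
print(access_allowed(read_only_page, WRITE))
```

With a separate execute bit, a read-only data page cannot be executed; without one, any readable page can also be fetched as instructions.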

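The operating system sets the valid bit to 0 for two purposes: to mark pages that are not part of the process's logical address space, and to mark pages that are not currently in main memory.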

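Two replacement information bits are defined by the processor. They are both set to 1 by the processor and reset to 0 by the operating system: a referenced bit, set when the page is accessed, and a dirty (modified) bit, set when the page is written.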

There are also bits that are not defined by the processor. These bits are generally used to estimate how long it has been since the page was last referenced. The operating system periodically checks the referenced bit, using it to update its usage control bits, then resets the referenced bit to 0. It can use its usage control bits to make replacement decisions.

For example, the usage control bits can be a simple counter. When the operating system makes a periodic check it resets the counter to 0 if the referenced bit is 1 and increments the counter otherwise.

The counter just counts the number of periodic checks since the page was last accessed. When the operating system needs to replace a page to make room for another one, its best choices are pages with large counter values.
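A minimal sketch of such a counter scheme follows. It assumes that a periodic check clears the counter when the referenced bit is 1, increments it otherwise, and then resets the referenced bit. Pages are modeled as dictionaries, and the names `periodic_check` and `choose_victim` are hypothetical.

```python
def periodic_check(pages):
    """Update each page's counter from its referenced bit, then clear the bit."""
    for page in pages:
        if page["referenced"]:
            page["counter"] = 0        # accessed since the last check
        else:
            page["counter"] += 1       # one more check without an access
        page["referenced"] = False     # the operating system resets the bit


def choose_victim(pages):
    """Pick the page that has gone the most checks without being accessed."""
    return max(pages, key=lambda page: page["counter"])


pages = [
    {"name": "A", "referenced": True, "counter": 3},
    {"name": "B", "referenced": False, "counter": 1},
]
periodic_check(pages)
print(choose_victim(pages)["name"])
```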

If an access is not allowed based on the access legality bits then the processor just traps to the operating system. The processor must be designed to record a code indicating that the failure was a memory access failure in a special processor register that is only accessible in system mode. The operating system then invokes its handler for memory access failures.

The handler determines whether the error was due to an illegal address in the currently executing program or a page that is not currently in memory. In the latter case the operating system had earlier reset the valid bit to 0 to ensure that the processor would trap to the operating system. When it resets the bit, it can use all of the page table entry bits except the access legality bits to record the disk location for the page.

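Using virtual memory involves some design issues: the size of page tables, the extra memory accesses needed for address translation, and where to perform the translation relative to the caches.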

All of these issues are solved in modern processors.

 
The size of a page table is given by the following equation.

$$\text{page table size} = \frac{\text{logical address space size}}{\text{page size}} \times \text{page table entry size}$$

For example, many processors have 32-bit logical addresses, which results in a 4GB logical address space. The page table entry size is usually 4B. If the page size is 4KB then the page table size is

$$\text{page table size} = \frac{4\,\text{GB}}{4\,\text{KB}} \times 4\,\text{B} = 4\,\text{MB}$$

This is excessive, especially on a processor that is running hundreds or even thousands of processes. About 20 years ago, many introductory programming classes had all students running their programs on a mainframe computer. Imagine perhaps 400 students running "Hello, World!" programs, each using 4MB just for page tables.
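The arithmetic behind these figures can be reproduced directly. This is just a worked calculation; the function name `page_table_size` is illustrative.

```python
KB, MB, GB = 2**10, 2**20, 2**30


def page_table_size(address_space_size, page_size, entry_size):
    """Number of pages times the size of one page table entry, in bytes."""
    number_of_pages = address_space_size // page_size
    return number_of_pages * entry_size


one_table = page_table_size(4 * GB, 4 * KB, 4)
print(one_table // MB, "MB per process")
print(400 * one_table // MB, "MB for 400 processes")
```

With single-level tables, 400 small processes would need 1600MB for page tables alone.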

Most processors use multilevel page tables to reduce the size of page tables for small programs.

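Multilevel page tables are split into two or more levels. For example, a 2-level table is organized as follows. The logical address space is divided into large chunks. A high-level table has one entry per chunk, and each valid entry points to a low-level table that holds the page table entries for the pages in that chunk.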

For chunks that have not been allocated to the process, the false valid bit triggers a hardware exception. Since the chunk is not allocated by the operating system there is no need for a low-level table. If a process only needs a small amount of memory it will have one high-level table and only a small number of low-level tables. Each of these tables is typically only a few KB.
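A two-level lookup can be sketched as below. The 10/10/12 split of the 32-bit address is an assumption made for illustration; the text does not specify one. Tables are modeled as dictionaries, a missing key stands in for a false valid bit, and `PageFault` is a hypothetical stand-in for the hardware exception.

```python
OFFSET_BITS = 12      # 4KB pages
LOW_INDEX_BITS = 10   # assumed: 1024 entries per low-level table


class PageFault(Exception):
    """Stand-in for the hardware exception raised on an invalid entry."""


def translate_two_level(high_level_table, logical_address):
    offset = logical_address & ((1 << OFFSET_BITS) - 1)
    low_index = (logical_address >> OFFSET_BITS) & ((1 << LOW_INDEX_BITS) - 1)
    high_index = logical_address >> (OFFSET_BITS + LOW_INDEX_BITS)

    # First access: the high-level entry for the chunk.
    low_level_table = high_level_table.get(high_index)
    if low_level_table is None:
        raise PageFault("chunk %d is not allocated" % high_index)

    # Second access: the page table entry within the chunk's low-level table.
    frame_number = low_level_table.get(low_index)
    if frame_number is None:
        raise PageFault("page is not valid")

    return (frame_number << OFFSET_BITS) | offset


# A small process: one high-level table and a single low-level table.
high_level_table = {1: {3: 0x2A}}
print(hex(translate_two_level(high_level_table, 0x00403004)))
```

Only chunk 1 has a low-level table here, so every other chunk costs nothing beyond its absent high-level entry.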

A logical memory access (instruction fetch, load, or store) involves two physical memory accesses: one to retrieve the page table entry and one to perform the access itself.

If nothing is done about it, the use of virtual memory at least doubles the latency for logical memory accesses. It triples the latency if the processor uses two-level page tables.

All but the very earliest processors that supported virtual memory have used a translation lookaside buffer (TLB) to eliminate most of the page table entry retrievals. The TLB is just a cache that holds recently accessed page table entries. It can achieve very small miss rates with just a few thousand entries.
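The idea of a TLB as a small cache of page table entries can be sketched as follows. The least-recently-used replacement policy and the `TLB` class itself are assumptions made for illustration. Hardware TLBs are associative memories with their own replacement schemes.

```python
from collections import OrderedDict


class TLB:
    """Toy TLB: caches page number -> frame number with LRU replacement (assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, page_number, page_table):
        if page_number in self.entries:
            self.hits += 1
            self.entries.move_to_end(page_number)   # mark as most recently used
            return self.entries[page_number]
        # Miss: pay for a page table access, then cache the result.
        self.misses += 1
        frame_number = page_table[page_number]
        self.entries[page_number] = frame_number
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)        # evict least recently used
        return frame_number


tlb = TLB(capacity=2)
page_table = {0: 10, 1: 11, 2: 12}
for page in [0, 0, 1, 2, 0]:
    tlb.lookup(page, page_table)
print(tlb.hits, tlb.misses)
```

Each hit avoids a page table access, which is how the TLB removes most of the extra latency described above.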

When virtual memory is used as part of a primary storage hierarchy along with ordinary caches, an important issue arises: where to do the virtual to physical address translation. For early RISC processors that had caches incorporated into the chip, the easiest thing to do was to add on the address translation after the cache.

This meant that virtual addresses were used in the cache. Since virtual addresses are unique to a process, the cache had to be invalidated when the operating system did a process switch. Consequently, every time a process resumed execution there was a flurry of cache misses. The additional time dealing with these misses could easily exceed the amount of time that the operating system was spending in restoring the process state.

Modern processors have moved the address translation so that it is done before sending addresses to the cache. This means that physical addresses are used in the cache. This results in extra pipeline stages for instruction fetch and data memory access. With large enough caches, the cached instructions and data for a process will mostly be retained in the cache while other processes are using the processor. This has significantly reduced the time for a process switch.