Virtual Memory

Virtual memory dates back to 1962, when it was used in the Atlas computer. Initially it provided assembly language programmers and compilers with a large virtual (or logical) address space whose addresses were translated into either addresses in a small physical memory or disk locations. In effect, the physical memory served as a cache, with a disk used as the lower-level store.

In modern systems, physical memory is large enough that virtual memory is not often needed to get a large address space. However, it provides several additional benefits that simplify operating system management of multiple processes.

The Major Consideration

The most important consideration in the design of virtual memory is the huge difference between main memory access times (~40 ns) and disk access times (~4 ms). Two consequences of this difference are:

A significant part of disk access time is spent waiting for the desired data to rotate underneath the read/write heads, which on average takes one half of the disk rotation time. Once that has happened, a large amount of data can be read in a relatively short time. Thus the difference between access times for large blocks and small blocks is small.
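The arithmetic behind these figures can be sketched as follows. The 40 ns and 4 ms latencies come from the text; the rotation speed (7200 RPM), the transfer rate (100 MB/s), and the two block sizes are illustrative assumptions, and seek time is ignored.

```python
# Latencies quoted in the text, both expressed in nanoseconds.
MEMORY_ACCESS_NS = 40
DISK_ACCESS_NS = 4_000_000  # 4 ms

# Assumed disk parameters (illustrative values, not from the text).
RPM = 7200
TRANSFER_BYTES_PER_SEC = 100_000_000  # 100 MB/s


def rotational_latency_s(rpm: float) -> float:
    """Average rotational wait: one half of a full rotation."""
    seconds_per_rotation = 60.0 / rpm
    return seconds_per_rotation / 2


def block_read_time_s(block_bytes: int) -> float:
    """Rotational wait plus transfer time (seek time is ignored)."""
    return rotational_latency_s(RPM) + block_bytes / TRANSFER_BYTES_PER_SEC


ratio = DISK_ACCESS_NS // MEMORY_ACCESS_NS
small = block_read_time_s(512)    # assumed small block: 512 bytes
large = block_read_time_s(8192)   # assumed large block: 8 KB

print("disk/memory latency ratio:", ratio)  # 100000
print(f"512-byte block: {small * 1e3:.3f} ms")
print(f"8 KB block:     {large * 1e3:.3f} ms")
```

Under these assumptions the average rotational wait alone is about 4.2 ms, and reading sixteen times as much data adds only about 2% to the total time.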

Terminology

Although virtual memory is now viewed as a cache mechanism, for a variety of reasons virtual memory terminology is different from cache terminology.

Benefits

The initial purpose of virtual memory was to create a transparent view of a large address space supported by a small physical memory. Today that matters less than the advantages virtual memory gives the operating system in process and memory management. For example, today there are desktop systems with 32-bit addressing and 4 GB of main memory. Unless addresses are made larger than 32 bits, virtual memory cannot increase the logical address space beyond 4 GB.
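The 4 GB limit follows directly from the address width, as this small sketch shows. The function name is hypothetical, and memory is assumed to be byte-addressable.

```python
GIB = 2 ** 30  # bytes per gigabyte (binary)


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB)  # 4
```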

But virtual memory is still thriving.

A Hierarchical Design Issue

When virtual memory is used as part of a primary storage hierarchy along with ordinary caches, an important issue arises: where to do the virtual to physical address translation. For early RISC processors that had caches incorporated into the chip, the easiest thing to do was to add on the address translation after the cache.

This meant that virtual addresses were used in the cache. Since a virtual address is meaningful only within its own process, the cache had to be invalidated when the operating system did a process switch. Consequently, every time a process resumed execution there was a flurry of cache misses. The additional time spent dealing with these misses could easily exceed the amount of time that the operating system was spending in restoring the process state.

Modern processors have moved the address translation so that it is done before any cache. This has resulted in extra pipeline stages for instruction fetch and data memory access. With large enough caches, the cached instructions and data for a process will mostly survive the time that it is waiting while other processes are using the processor.
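The effect of where translation happens can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real processor: the fully associative LRU cache, the `translate` mapping, the working-set size, and the round-robin schedule are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache of line addresses with LRU replacement."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line_addr: int) -> None:
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # mark as most recently used
            return
        self.misses += 1
        self.lines[line_addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line

    def flush(self) -> None:
        self.lines.clear()


def translate(pid: int, virtual_line: int) -> int:
    """Hypothetical translation: each process gets its own physical region."""
    return pid * 1_000_000 + virtual_line


def run(translate_before_cache: bool, capacity_lines: int = 64,
        working_set: int = 16, processes: int = 2,
        time_slices: int = 10, passes_per_slice: int = 4) -> int:
    """Return total cache misses for a round-robin schedule of processes."""
    cache = LRUCache(capacity_lines)
    for slice_index in range(time_slices):
        pid = slice_index % processes
        if not translate_before_cache:
            # Virtual tags mean nothing to the next process, so invalidate.
            cache.flush()
        for _ in range(passes_per_slice):
            for virtual_line in range(working_set):
                if translate_before_cache:
                    cache.access(translate(pid, virtual_line))
                else:
                    cache.access(virtual_line)
    return cache.misses


print("translation after cache: ", run(False))  # 160
print("translation before cache:", run(True))   # 32
```

With translation after the cache, every time slice starts cold and each process reloads its whole working set. With translation before the cache, both working sets fit in the cache together, so each line misses only once.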