Caching is a very general technique for improving computer system performance. Based on the principle of locality of reference, it is used in a computer's primary storage hierarchy, its operating system, networks, and databases. Caching improves access time and reduces data traffic to data sources that have limited throughput.

For most of this web page only reads are considered. A section at the end deals with issues that arise when writes are considered.

In addition, caching can be used in dynamic programming algorithms for computed data, and in just-in-time compilation for machine code translations of functions or methods.

Modern computer primary storage hierarchies use multiple levels of caching in order to bridge the gulf between processor speeds and main memory and disc speeds. Servers generally use more levels than desktop systems, but some recent desktop systems have as many levels as server systems.

Older Desktop Hierarchy
Level Controller Storage Technology
L1 Cache on-chip on-chip SRAM
L2 Cache motherboard motherboard SRAM
Main Memory motherboard motherboard DRAM
Virtual Memory operating system disk magnetic
Server Hierarchy
Level Controller Storage Technology
L1 Cache on-chip on-chip SRAM
L2 Cache on-chip on-chip SRAM
L3 Cache on-chip motherboard SRAM
Main Memory motherboard motherboard DRAM
Virtual Memory operating system disk magnetic

The principles of locality are facts about data and instruction accesses that arise from patterns that we frequently use for storing and accessing information. It has two aspects: temporal locality and spatial locality.

Recently accessed data and instructions are likely to be accessed in the near future.

We expect temporal locality with regard to instructions because a large part of execution time is spent in repetitive code such as loops and recursions. We expect temporal locality with regard to data for a variety of reasons. One example is accessing control and summary variables inside loops such as i and sum in the following loop.

      sum = 0;
      for (int i = 0; i < 10; i ++) {
          sum = sum + A[i];
      }
      

Data and instructions close to recently accessed data and instructions are likely to be accessed in the near future.

The phrase "close to" has different meanings in different contexts. In the case of primary memory caches, "close to" refers to the proximity of memory addresses. For disk caching, "close to" refers to proximity of data location on a disk.

We expect spatial locality with regard to instructions because the normal execution of instructions is sequential. We expect spatial locality with regard to data because in programs, especially scientific programs where execution time is critical, much of the execution time is spent in sequential processing of arrays. For example, consider the access to the array A in the following loop.

      sum = 0;
      for (int i = 0; i < 10; i ++) {
          sum = sum + A[i];
      }
      

Recently accessed data and instructions and nearby data and instructions are likely to be accessed in the near future.

A cache level sits between a data requester and a large, slow data source (the lower level). It uses a small but fast local store.

A Cache Level

Cache Operation

The following algorithm is used by a cache level in order to take advantage of both temporal and spatial locality.

      if the requested data is in the local store (a cache hit)
        return the requested data (fast)
      else (a cache miss)
        get the requested and nearby data from the lower level (slow)
        save it in the local store
        return the requested data
      

Cache operation is transparent: the request semantics through the cache is the same as direct access to the lower level.

Cache Benefits

When writes are considered there are two additional questions to be considered in cache design.

Also, the simplest cache designs make it impossible to satisfy a new request until a cache write completes. More modern caches use a small write queue in front of the lower-level store, allowing the cache to proceed with processing hits without waiting for writes to complete.