Caching

Caching is a very general technique for improving computer system performance. Based on the principle of locality of reference, it is used in a computer's primary storage hierarchy, its operating system, networks, and databases. Caching improves access time and reduces data traffic to data sources that have limited throughput.

For most of this web page, only reads are considered. The section at the end deals with the issues that arise when writes are considered.

Examples

In addition to these system-level uses, caching can be used in dynamic programming algorithms to store computed data, and in just-in-time compilation to store machine code translations of functions or methods.
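The dynamic-programming use is easy to sketch: a memo table caches computed data so that each value is computed only once. The function and table names below are illustrative, not from this page.

```c
#define FIB_MAX 64

long long memo[FIB_MAX];   /* 0 marks "not yet computed" (stored values are > 0) */

/* Fibonacci with a cache of previously computed results. */
long long fib(int n) {
    if (n < 2) return n;                 /* base cases */
    if (memo[n] != 0) return memo[n];    /* cache hit: no recomputation */
    memo[n] = fib(n - 1) + fib(n - 2);   /* cache miss: compute and save */
    return memo[n];
}
```

Without the memo table this recursion takes exponential time; with it, each subproblem is computed once and then served from the cache.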

Primary Storage Hierarchies

Modern computer primary storage hierarchies use multiple levels of caching in order to bridge the gulf between processor speeds and main memory and disk speeds. Older desktop and laptop systems generally had two cache levels. Servers and recent desktops and laptops have three cache levels.

Older Primary Storage Hierarchies

Level           Controller        Storage      Technology
L1 Cache        on-chip           on-chip      SRAM
L2 Cache        motherboard       motherboard  SRAM
Main Memory     motherboard       motherboard  DRAM
Virtual Memory  operating system  disk         magnetic

Recent Primary Storage Hierarchies

Level           Controller        Storage      Technology
L1 Cache        on-chip           on-chip      SRAM
L2 Cache        on-chip           on-chip      SRAM
L3 Cache        on-chip           motherboard  SRAM
Main Memory     motherboard       motherboard  DRAM
Virtual Memory  operating system  disk         magnetic

Locality of Reference

The principle of locality is a fact about data and instruction accesses that arises from the patterns we frequently use for storing and accessing information. It has two aspects: temporal locality and spatial locality.

The Principle of Temporal Locality

Recently accessed data and instructions are likely to be accessed in the near future.

We expect temporal locality with regard to instructions because a large part of execution time is spent in repetitive code such as loops and recursions. We expect temporal locality with regard to data for a variety of reasons. One example is accessing control and summary variables inside loops such as i and sum in the following loop.

      sum = 0;
      for (int i = 0; i < 10; i++) {
          sum = sum + A[i];    // i and sum are reused on every iteration
      }
      

The Principle of Spatial Locality

Data and instructions close to recently accessed data and instructions are likely to be accessed in the near future.

The phrase "close to" has different meanings in different contexts. In the case of primary memory caches, "close to" refers to the proximity of memory addresses. For disk caching, "close to" refers to proximity of data location on a disk.

We expect spatial locality with regard to instructions because the normal execution of instructions is sequential. We expect spatial locality with regard to data because in programs, especially scientific programs where execution time is critical, much of the execution time is spent in sequential processing of arrays. For example, consider the access to the array A in the following loop.

      sum = 0;
      for (int i = 0; i < 10; i++) {
          sum = sum + A[i];    // consecutive elements of A are accessed
      }
      

The Combined Principle of Locality

Recently accessed data and instructions and nearby data and instructions are likely to be accessed in the near future.

Cache Organization

A cache level sits between a data requester and a large, slow data source (the lower level). It uses a small but fast local store.

Cache Operation

The following algorithm is used by a cache level in order to take advantage of both temporal and spatial locality.

      if the requested data is in the local store (a cache hit)
        return the requested data (fast)
      else (a cache miss)
        get the requested and nearby data from the lower level (slow)
        save it in the local store
        return the requested data

Cache operation is transparent: the semantics of a request made through the cache are the same as those of direct access to the lower level.
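The algorithm above can be sketched as a small direct-mapped cache in C. The line size, cache size, and names here are illustrative assumptions, not part of this page; fetching a whole line on a miss is what brings in the "nearby" data.

```c
#include <stdbool.h>

#define LINE_WORDS 4    /* words fetched together on a miss (illustrative) */
#define NUM_LINES  8    /* lines in the local store (illustrative)         */
#define MEM_WORDS  256  /* size of the slow lower-level store              */

int lower_level[MEM_WORDS];        /* stands in for the large, slow source */

typedef struct {
    bool valid;
    int  tag;                      /* which memory block the line holds */
    int  data[LINE_WORDS];
} Line;

Line cache[NUM_LINES];
int hits, misses;

/* Read one word through the cache, following the hit/miss algorithm. */
int cache_read(int addr) {
    int block = addr / LINE_WORDS;            /* block number in memory  */
    int index = block % NUM_LINES;            /* direct-mapped placement */
    Line *line = &cache[index];

    if (line->valid && line->tag == block) {  /* cache hit (fast) */
        hits++;
    } else {                                  /* cache miss (slow) */
        misses++;
        line->valid = true;
        line->tag = block;
        for (int i = 0; i < LINE_WORDS; i++)  /* requested + nearby data */
            line->data[i] = lower_level[block * LINE_WORDS + i];
    }
    return line->data[addr % LINE_WORDS];
}
```

Reading addresses 0 through 7 in order misses only twice, once per line; the other six reads are hits, which is spatial locality at work.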

Cache Benefits

  • A cache reduces the effective access time to data in the lower-level store. If most of the data accesses are cache hits then the effective access time is close to the access time of the cache's local store.
  • A cache reduces the traffic to the lower-level store. Cache hits are not passed on to the lower-level store. This is important for dealing with throughput limitations of the lower-level store.
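The first benefit is often quantified as an effective (average) access time. A minimal sketch, with illustrative timings rather than measured ones:

```c
/* Effective (average) access time for a cache in front of a slow store:
 *   t_effective = t_hit + miss_rate * t_miss_penalty
 * With a high hit rate, t_effective stays close to t_hit. */
double effective_access_time(double t_hit, double miss_rate,
                             double t_miss_penalty) {
    return t_hit + miss_rate * t_miss_penalty;
}
```

For example, a 1 ns hit time, a 5% miss rate, and a 100 ns miss penalty give an effective access time of about 6 ns, still close to the cache's own speed.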

Cache Writes

When writes are considered, two additional questions arise in cache design.

  • On a write miss, should the cache allocate a cache line?

    Most modern caches do allocate a cache line.

  • On a write hit, should the cache immediately update the lower level (write-through) or just mark the line in the local store as "dirty" and update the lower level when the line is replaced (write-back)?

    Write-through is simpler to implement, but write-back reduces lower-level traffic. Over the years, the trend has been towards more use of write-back. This is especially true with modern multi-core processors.
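The difference between the two policies can be sketched with a dirty bit on a cache line. The structure and names below are illustrative, not a real cache controller:

```c
#include <stdbool.h>

/* One cache line with a dirty bit, as used by a write-back policy. */
typedef struct {
    bool valid;
    bool dirty;    /* set on a write hit; lower level updated lazily */
    int  tag;
    int  data;
} WLine;

int lower;             /* stands in for the lower-level store       */
int lower_writes;      /* counts write traffic to the lower level   */

/* Write hit under write-back: update only the local store, mark dirty. */
void write_hit_write_back(WLine *line, int value) {
    line->data = value;
    line->dirty = true;            /* lower level is now stale */
}

/* Write hit under write-through: update the lower level immediately. */
void write_hit_write_through(WLine *line, int value) {
    line->data = value;
    lower = value;
    lower_writes++;
}

/* On replacement, a write-back cache flushes a dirty line. */
void evict(WLine *line) {
    if (line->valid && line->dirty) {
        lower = line->data;
        lower_writes++;
        line->dirty = false;
    }
    line->valid = false;
}
```

Three write-back hits to the same line followed by one eviction cost a single lower-level write; the same three hits under write-through cost three. That is the traffic reduction in miniature.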

Also, the simplest cache designs cannot accept a new request until a pending cache write completes. More modern caches place a small write queue in front of the lower-level store, allowing the cache to keep serving hits without waiting for writes to complete.
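Such a write queue can be sketched as a small ring buffer; the capacity and names here are illustrative assumptions:

```c
#include <stdbool.h>

#define WQ_CAP 4   /* pending lower-level writes held at once (illustrative) */

typedef struct { int addr, value; } Write;

typedef struct {
    Write buf[WQ_CAP];
    int head, tail, count;
} WriteQueue;

/* Queue a pending lower-level write; returns false if the queue is
   full, in which case the cache would have to stall. */
bool wq_push(WriteQueue *q, int addr, int value) {
    if (q->count == WQ_CAP) return false;
    q->buf[q->tail] = (Write){ addr, value };
    q->tail = (q->tail + 1) % WQ_CAP;
    q->count++;
    return true;
}

/* Drain the oldest write to the lower-level store when it is ready. */
bool wq_pop(WriteQueue *q, Write *out) {
    if (q->count == 0) return false;
    *out = q->buf[q->head];
    q->head = (q->head + 1) % WQ_CAP;
    q->count--;
    return true;
}
```

While writes sit in the queue, the cache can continue answering hits; the queue is drained to the lower level in the background, oldest write first.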