Finally, a 64K main memory requires 16 bit addresses, so the TAG field contains 16 - 3 - 3 = 10 bits. Thus a main memory address is decomposed as shown below.
| TAG (10) | SET (3) | WORD (3) |
In a system without cache, the sequential block access time is B*MMAT. For cache systems, the sequential block access time depends on whether or not the desired block is in cache; that is, whether there is a cache hit or a cache miss. In the case of a cache hit, the sequential block access time is B*CAT.
To determine the sequential block access time for a cache miss, we need to answer two questions. First, is the main memory interleaved? And second, does the cache support load-through? We will consider all four possibilities. To simplify analysis for memory interleaving, we assume that the number of memory modules is the same as the number of words in a block.
In the simplest case the memory is not interleaved and the cache does not support load-through. Then the sequential block access time is B*MMAT + B*CAT. The B*MMAT term is the time to read the block from main memory into the cache. Since there is no load-through, the CPU must wait until the block is entirely loaded into the cache before it gets any data. The B*CAT term is then the time required to tranfer the data from the cache to the CPU.
Using load-through but no interleaving, the sequential block access time is B*MMAT. The analysis is the same as for no load-through except that data can be transferred to the CPU at the same time that it is loaded into the cache, eliminating the second term.
Using memory interleaving only, the sequential block access time is reduced to MMAT + 2*B*CAT. With memory interleaving, the address can be sent to all memory modules simultaneously, so that they are all ready after one main memory access time. Then multiplexing circuitry is used to send the data from each module to the cache sequentially. Each transfer to the cache requires one cache access time. After the data is in the cache, B*CAT time is required to transfer the data to the CPU.
Using memory interleaving and load-through, the sequential block access time is MMAT + B*CAT. The analysis is the same as for no load-through except that data can be transferred to the CPU at the same time that it is loaded into the cache.
The results of the above analysis are summarized in the table below.
| load-through? | interleaved? | sequential block access time | |
|---|---|---|---|
| no cache | --- | --- | B*MMAT |
| cache hit | --- | --- | B*CAT |
| cache miss | no | no | B*MMAT + B*CAT |
| cache miss | yes | no | B*MMAT |
| cache miss | no | yes | MMAT + 2*B*CAT |
| cache miss | yes | yes | MMAT + B*CAT |
Suppose that a program accesses memory locations 0 through 159 sequentially and then repeats this access pattern four more times. It is straighforward to compute that, without a cache, it will take 800*MMAT = 8000*CAT to do all of the fetches.
With a cache, since each block is accessed sequentially, our previous analysis applies. We need only determine how many times the access hits a block in the cache and how many times it misses. In the example cache B = 8, so memory locations 0 through 159 correspond to main memory blocks 0 through 19. The cache has 8 sets, each with two blocks. Thus main memory blocks 0, 8, and 16 may reside in cache set 0. For this set, there will be cache misses since three main memory blocks are competing for the two cache blocks in the set. A similar problem arises with sets 1, 2, and 3. In fact, using the LRU replacement algorithm, every block access involves a cache miss. Since there are 4*3*5 = 60 accesses from sets 0 through 3, the total time for these sets is 60*88*CAT = 5280*CAT.
However, for sets 4, 5, 6, and 7, there are only two main memory blocks in the access patern that map to those sets. Thus there are no cache misses except for the initial one for each main memory block. For example, set 4 is used for main memory blocks 4 and 12. For each of these blocks, there will be one access involving a cache miss (time 88*CAT) and 4 accesses without a miss (time 8*CAT each) for a total time of 120*CAT. Since there are 8 main memory blocks that map into sets 4 through 7, the total time for these sets is 8*120*CAT = 960*CAT.
Adding the time for sets 0 through 3, we arrive at a grand total of 6240*CAT.