CS 2521 Study Questions: Final Exam

Fall 2009

Revisions:

12/16/09; 8:44am; Edit of typo in question 20.
12/16/09; 9:08am; Added the term "set" to associative in question 27. Also added KB=1024 to this question.

Material covered:

The final exam is cumulative. Also make sure to study the earlier reviews (review 1 and review 2) in addition to the readings and programming materials.

Chapter 4.6-4.9: Pipelining
Chapter 5.1 through the last section of Chapter 5 covered in class on December 14.

Homework questions due: Wednesday, 16 December 2009 [15 pts]:

Homework is due at the start of class on the due date. Late homework will not be accepted. Homework must be typewritten, not handwritten. Keep a copy of your homework and bring it with you on the day of the quiz/exam review (i.e., the day the homework is due); it should help you in the review session.

Questions 3, 4, 5, 6, 8, 11, 14, 19, 20, 22, 25, 26, 27, 28, 38 (1 point each).


    Chapter 4: The Processor

  1. In Figure 4.10 (p. 314), if we have all zeros on the five lines coming into "Read register 1", what does this mean? When would this occur in instruction processing?
  2. Again with Figure 4.10 (p. 314), how do you cause the register file to write the value 0xf to register $s0? What specific inputs do you need to give to the register file?
  3. How can a compiler optimize assembly code to make use of CPU pipelining? Give at least two specific examples.
  4. A pipeline improves instruction throughput in a CPU (relative to a single cycle design). Yes or No. Circle one. Explain.
  5. Additional registers need to be introduced in a pipelined CPU design (e.g., relative to a single cycle CPU design). What is the purpose of these registers? Please be as specific as you can in your answer. Can these registers be directly accessed in assembly language code?
  6. Generally, in a pipelined CPU, how often does the PC get incremented?
  7. True or False: All functional units are operating (or can be operating) at each time step in a pipelined CPU. Explain your answer.
  8. In general, without consideration of hazards, how many clock cycles (on average) does it take to complete an additional CPU instruction with a pipelined CPU design?
  9. Give an example of a data hazard in a pipelined CPU. Explain your example. How can data hazards be resolved in the CPU hardware?
  10. What is a load-use hazard? How can load-use hazards be resolved in the CPU hardware?
  11. Give an assembly language example of a load/use hazard. Give another example (of an lw instruction) without a load/use hazard. [[Added "(of a lw instruction)" on 12/15/09 at 3:28pm]].
  12. Data hazards can occur across R-type instructions. How do load/use hazards differ from data hazards across R-type instructions?
  13. What is a stall in a pipelined CPU?
  14. In terms of the general type of diagram given in Figure 4.36 (p. 349), explain how an "add R1, R2, R3" instruction is processed by the pipeline. Give as much detail as you can regarding the processing of this instruction at each stage of the pipeline.
  15. What is a branch delay slot?
  16. Consider the following code. First, assume that it executes with no branch delay slot. Now suppose that a branch delay slot is available. Show how the code can be reordered to make use of the branch delay slot while preserving its meaning. Explain your work.
        sll $s1, $s2, 3
        beq $t0, $t1, L1
        andi $t2, $t3, 0xff
    L1: add $t4, $t5, $t6
  17. Compare the average instruction time performance for our single cycle and pipelined CPU designs. Use the instruction mix: 30% loads; 15% stores; 10% branches; 5% jumps; 40% ALU instructions. For pipelined execution, assume: (a) half of the load instructions have a load/use hazard, (b) branch delay on misprediction is 1 clock cycle, and ¼ of branches are mispredicted, (c) jumps always pay 1 full clock cycle delay (giving them an average time of 2 clock cycles), (d) you can ignore other hazards. For the single cycle processor, use a clock cycle of 500 ps. For the pipelined processor, use a clock cycle of 100 ps.
  18. Generally, how are exceptions handled with a pipelined CPU?
  19. You see that your assembly language program is not working and the problem appears to be due to a conditional branch. Do you think the problem is arising because of a branch delay slot after that conditional branch? Explain. How can you ensure that the problem is not caused by the branch delay slot?
  20. Consider Figure 4.46 (p. 359) of the textbook. Relative to the start of processing the conditional branch instruction, at what clock cycle is the result of "beq" processing available? That is, at what clock cycle has the beq branch decision been made, with all the information needed to take the branch available? Explain. If the branch is taken, at what clock cycle is the instruction that should actually execute immediately after the beq instruction fetched from RAM? Explain.
  21. Does Amdahl's law suggest that the extra optimization involved in branch prediction techniques is really worthwhile? Explain.
  22. In the top of Figure 4.36 (p. 349), in the Instruction Fetch (IF) stage, the incremented program counter (PC) is placed into the pipeline register. Why?
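
For questions 1 and 2, the behavior of the register file in Figure 4.10 can be sketched in Python. This is a behavioral model only. It assumes 32 registers selected by 5-bit register numbers, writes that take effect at a clock edge only when RegWrite is asserted, and register 0 ($zero) always reading as 0. The class and method names are hypothetical, not the textbook's.

```python
class RegisterFile:
    """Behavioral sketch (not the textbook circuit) of a MIPS register file:
    two read ports and one write port gated by a RegWrite control signal.
    """
    NUM_REGS = 32

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS

    def read(self, read_reg1: int, read_reg2: int) -> tuple:
        """Return (Read data 1, Read data 2) for two 5-bit register numbers."""
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def clock_edge(self, write_reg: int, write_data: int, reg_write: bool) -> None:
        """Apply the write port at a clock edge.

        Nothing changes unless reg_write is asserted; writes to register 0
        are ignored so that $zero always reads as 0.
        """
        if reg_write and (write_reg & 0x1F) != 0:
            self.regs[write_reg & 0x1F] = write_data & 0xFFFFFFFF


if __name__ == "__main__":
    S0 = 16  # $s0 is register number 16 (binary 10000)
    rf = RegisterFile()
    rf.clock_edge(write_reg=S0, write_data=0xF, reg_write=True)
    print(rf.read(S0, 0))  # read $s0 on port 1 and $zero on port 2
```

In this model a write needs all three write-port inputs together. With reg_write left false, clock_edge changes nothing.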
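
For questions 4 and 8, here is a Python sketch of cycle counting for an ideal pipeline. It assumes a hypothetical k-stage pipeline (k = 5 by default, matching the five MIPS stages) that accepts one instruction per clock cycle, with no hazards or stalls. The first instruction needs k cycles to fill the pipeline. Each later instruction completes one cycle after the one before it.

```python
def pipeline_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Total clock cycles to run n instructions on an ideal pipeline.

    Assumes one instruction enters per cycle and no hazards or stalls:
    the first instruction takes n_stages cycles, and every later
    instruction finishes one cycle after its predecessor.
    """
    if n_instructions <= 0:
        return 0
    return n_stages + (n_instructions - 1)


def average_cycles_per_instruction(n_instructions: int, n_stages: int = 5) -> float:
    """Average cycles per instruction; approaches 1 as n grows."""
    return pipeline_cycles(n_instructions, n_stages) / n_instructions


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, pipeline_cycles(n), round(average_cycles_per_instruction(n), 3))
```

Printing the averages for 1, 10, and 1000 instructions shows the fill cost of k - 1 cycles being spread over more and more instructions.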
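
For questions 10 through 12, here is a Python sketch of the load/use check made by a hazard detection unit. It is modeled loosely on the textbook's condition: the instruction in EX is a load, and its destination register matches a source register of the instruction being decoded. The Instr encoding and the function name find_load_use_hazards are hypothetical. The sketch looks only at adjacent instruction pairs, on the assumption that forwarding covers an lw result used two or more instructions later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Simplified instruction: only the registers matter for this check."""
    text: str                 # assembly text, for reporting
    op: str                   # mnemonic, e.g. "lw", "add"
    dest: Optional[str]       # register written, or None (e.g. sw, beq)
    sources: Tuple[str, ...]  # registers read


def find_load_use_hazards(program: List[Instr]) -> List[Tuple[int, int]]:
    """Return (load_index, use_index) pairs that need a one-cycle stall.

    A load/use hazard exists when an lw is immediately followed by an
    instruction that reads the register the lw writes; forwarding alone
    cannot deliver the loaded value in time, so the pipeline must stall.
    """
    hazards = []
    for i in range(len(program) - 1):
        load, nxt = program[i], program[i + 1]
        if load.op == "lw" and load.dest is not None and load.dest in nxt.sources:
            hazards.append((i, i + 1))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("lw  $t0, 0($s1)", "lw", "$t0", ("$s1",)),
        Instr("add $t2, $t0, $t1", "add", "$t2", ("$t0", "$t1")),  # uses $t0 at once
        Instr("lw  $t3, 4($s1)", "lw", "$t3", ("$s1",)),
        Instr("sub $t4, $t5, $t6", "sub", "$t4", ("$t5", "$t6")),  # independent
        Instr("or  $t7, $t3, $t4", "or", "$t7", ("$t3", "$t4")),   # two later: forwarded
    ]
    for load_i, use_i in find_load_use_hazards(program):
        print(f"stall: {program[load_i].text!r} -> {program[use_i].text!r}")
```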
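
For question 17, the average instruction time is CPI times the clock cycle time. The pipelined CPI is a weighted average over the instruction mix. The question does not give the load/use penalty, so the Python sketch below assumes each load/use hazard costs one stall cycle, as with the textbook's forwarding pipeline. It also assumes the single-cycle CPU has a CPI of exactly 1. The other penalties come from the question, and the names are illustrative.

```python
# Instruction mix from the question: fraction of instructions in each class.
MIX = {"load": 0.30, "store": 0.15, "branch": 0.10, "jump": 0.05, "alu": 0.40}

# Average clock cycles per instruction class on the pipelined CPU.
PIPELINED_CYCLES = {
    "load": 1 + 0.5 * 1,      # assumption: half the loads stall for 1 cycle
    "store": 1,
    "branch": 1 + 0.25 * 1,   # 1/4 mispredicted, 1-cycle penalty each
    "jump": 2,                # always pays 1 extra cycle
    "alu": 1,
}


def pipelined_cpi(mix, cycles):
    """Weighted-average CPI: sum of (fraction * cycles) over classes."""
    return sum(mix[k] * cycles[k] for k in mix)


def average_instruction_time_ps(cpi, clock_ps):
    """Average time per instruction = CPI * clock cycle time."""
    return cpi * clock_ps


if __name__ == "__main__":
    cpi = pipelined_cpi(MIX, PIPELINED_CYCLES)
    single = average_instruction_time_ps(1.0, 500)  # single cycle: CPI = 1
    pipe = average_instruction_time_ps(cpi, 100)
    print(f"pipelined CPI = {cpi:.3f}")
    print(f"single cycle: {single:.1f} ps, pipelined: {pipe:.1f} ps")
    print(f"speedup = {single / pipe:.2f}")
```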
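
For question 21, Amdahl's law gives overall speedup = 1 / ((1 - f) + f / s). Here f is the fraction of execution time affected by an improvement and s is the speedup of that fraction. The Python sketch below evaluates the formula. The example fractions are hypothetical, not measured values.

```python
def amdahl_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Overall speedup when a fraction of execution time is sped up.

    fraction_improved: share of original execution time affected (0..1).
    local_speedup: factor by which that share gets faster (> 0).
    """
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


if __name__ == "__main__":
    # Hypothetical: branch stalls take 10% of execution time, and better
    # prediction removes half of that time (local speedup of 2).
    print(round(amdahl_speedup(0.10, 2.0), 4))
    # Even eliminating those stalls entirely caps the gain at 1 / 0.9.
    print(round(amdahl_speedup(0.10, float("inf")), 4))
```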

    Chapter 5: Memory hierarchy & cache

  23. How do loops and other forms of iteration relate to spatial locality?
  24. How does a cache take advantage of locality?
  25. What would happen if a pipelined CPU had no RAM cache, but had to access RAM directly (i.e., without a cache) for every RAM access? Give a general estimate of the CPI for such a pipelined processor assuming no hazards, and making the same kinds of assumptions we did in class for instruction processing rates in pipelined processors, and for the amount of time it takes to access main RAM.
  26. Consider a direct-mapped cache in which 16 bits of the RAM address are used to index into the cache, RAM addresses are 64 bits, and the block size is 16 bytes. How many entries (rows) should the cache have? What specific bits of the address are used to form the cache index? What specific bits of the address are used for the "tag"?
  27. A computer uses 32-bit addressing, where each address references a byte of RAM. The computer uses a 4-way set associative cache with a capacity of 128KB (KB=1024 bytes). Each cache block contains 16 bytes. Calculate the number of bits in the TAG, SET, and OFFSET fields of a main memory address.
  28. A cache has a local store (higher level) with an access time of 5 ns. The lower-level store has an access time of 50 ns. If the miss rate is 5%, what is the effective access time?
  29. A virtual memory system has single-level page tables. The processor uses 32-bit virtual addresses. It has 16KB pages and 4B page table entries. What is the size of a page table?
  30. Exercise 5.3, p. 550.
  31. Does a direct-mapped cache store bits in each entry for the index? Yes or No. Explain.
  32. What is the miss rate for a cache write done using a write-through strategy? Explain.
  33. Following on the last question, would the miss rate be the same for cache reads and writes? Yes or No. Explain.
  34. What is a block address for RAM in primary memory? Carefully distinguish between a primary RAM block address and a cache index.
  35. Given that a program generates the following sequence of RAM accesses (given as block addresses), does the program exhibit locality of reference? Explain your answer. Here is the sequence of RAM accesses: 3, 3, 3, 1, 1, 1, 0, 0
  36. Show a sequence of RAM accesses (as block addresses) that does not exhibit locality of reference.
  37. What does a pipelined CPU do differently when it encounters a RAM cache hit versus a RAM cache miss? Make sure your answer describes the process of handling a hit and the process of handling a miss.

    Lab related questions

  38. For software interrupts such as "break", we need to increment the EPC by four, but for hardware interrupts, we should not increment the EPC. Why?
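
Returning to the Chapter 5 cache-addressing questions (the direct-mapped and set-associative address breakdowns), a short Python sketch can compute the field widths. It assumes byte-addressed memory and power-of-two cache, block, and set sizes, and it treats a direct-mapped cache as 1-way set associative. The function names are hypothetical. When the number of index bits is given directly, the tag is simply the address width minus the index and offset widths.

```python
def log2_exact(x: int) -> int:
    """Return log2(x) for a positive power of two, else raise ValueError."""
    if x <= 0 or x & (x - 1):
        raise ValueError(f"{x} is not a positive power of two")
    return x.bit_length() - 1


def address_fields(address_bits: int, cache_bytes: int,
                   block_bytes: int, ways: int = 1):
    """Split a byte address into (tag, index/set, offset) bit widths.

    offset bits select a byte within a block,
    index bits select a set (a single row when ways == 1),
    and the remaining high-order bits form the tag.
    """
    offset = log2_exact(block_bytes)
    n_blocks = cache_bytes // block_bytes
    n_sets = n_blocks // ways
    index = log2_exact(n_sets)
    tag = address_bits - index - offset
    if tag < 0:
        raise ValueError("cache is larger than the address space")
    return tag, index, offset


if __name__ == "__main__":
    # Hypothetical example: 32-bit addresses, 64 KB cache, 32-byte blocks.
    print(address_fields(32, 64 * 1024, 32))          # direct-mapped
    print(address_fields(32, 64 * 1024, 32, ways=2))  # 2-way set associative
```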
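
For the Chapter 5 effective-access-time question, here is a Python sketch of two common models. The first, hit time + miss rate * miss penalty, assumes that every access checks the higher-level store and that a miss additionally pays the lower-level access time. The second weights the two access times by the hit and miss rates. Which model applies depends on the convention used in class, so treat both as assumptions. The numbers in the example are hypothetical.

```python
def effective_access_time(hit_time_ns: float, miss_rate: float,
                          miss_penalty_ns: float) -> float:
    """Average access time = hit time + miss rate * miss penalty.

    Assumes every access pays hit_time_ns to check the upper level, and a
    miss additionally pays miss_penalty_ns to reach the lower level.
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


def weighted_access_time(hit_time_ns: float, miss_rate: float,
                         lower_time_ns: float) -> float:
    """Alternative model: a miss costs only the lower-level access time."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return (1.0 - miss_rate) * hit_time_ns + miss_rate * lower_time_ns


if __name__ == "__main__":
    # Hypothetical: 2 ns upper level, 100 ns lower level, 10% miss rate.
    print(round(effective_access_time(2.0, 0.10, 100.0), 3))
    print(round(weighted_access_time(2.0, 0.10, 100.0), 3))
```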
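
For the Chapter 5 virtual memory question, a single-level page table holds one entry per virtual page. Its size is therefore (2^(virtual address bits) / page size) * bytes per entry. The Python sketch below assumes a power-of-two page size and a hypothetical function name.

```python
def page_table_bytes(virtual_address_bits: int, page_bytes: int,
                     entry_bytes: int) -> int:
    """Size of a single-level page table with one entry per virtual page."""
    if page_bytes <= 0 or page_bytes & (page_bytes - 1):
        raise ValueError("page size must be a positive power of two")
    offset_bits = page_bytes.bit_length() - 1     # bits of page offset
    n_pages = 1 << (virtual_address_bits - offset_bits)
    return n_pages * entry_bytes


if __name__ == "__main__":
    # Hypothetical example: 32-bit virtual addresses, 4 KB pages, 4-byte
    # page table entries -> 2**20 entries.
    size = page_table_bytes(32, 4 * 1024, 4)
    print(size, "bytes =", size // (1024 * 1024), "MB")
```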
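
For the Chapter 5 locality-of-reference questions, a small Python simulation of a direct-mapped cache driven by block addresses makes the payoff of locality visible. It assumes a hypothetical 4-row cache that starts empty, with index = block address mod rows and tag = block address div rows. A block that is reused hits after its first miss, while a sequence that never reuses a block misses on every access. The sequences in the example are hypothetical.

```python
from typing import List, Optional, Tuple


def simulate_direct_mapped(block_addresses: List[int],
                           n_rows: int = 4) -> Tuple[int, int]:
    """Return (hits, misses) for a direct-mapped cache that starts empty.

    Each row holds one block; index = block % n_rows, tag = block // n_rows.
    On a miss the new block replaces whatever the row held.
    """
    rows: List[Optional[int]] = [None] * n_rows  # stored tag per row
    hits = misses = 0
    for block in block_addresses:
        index, tag = block % n_rows, block // n_rows
        if rows[index] == tag:
            hits += 1
        else:
            misses += 1
            rows[index] = tag
    return hits, misses


if __name__ == "__main__":
    print(simulate_direct_mapped([2, 2, 2, 5, 5, 0]))  # blocks are reused
    print(simulate_direct_mapped([7, 2, 13, 5, 10]))   # no block is reused
```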