Register renaming is a form of pipelining that deals with data dependences between instructions by renaming their register operands. An assembly language programmer or a compiler specifies these operands using architectural registers - the registers that are explicit in the instruction set architecture. Renaming replaces architectural register names by, in effect, value names, with a new value name for each instruction destination operand. This eliminates the name dependences (output dependences and antidependences) between instructions and automatically recognizes true dependences.

The recognition of true data dependences between instructions permits a more flexible life cycle for instructions. By maintaining a status bit for each value indicating whether or not it has been computed yet, it allows the execution phase of two instruction operations to be performed out of order when there are no true data dependences between them. This is called out-of-order execution.

After looking at the process of renaming operands we will look at the life cycle of an instruction in a register renaming architecture. Then we will look at a generic hardware organization for it and some possible performance enhancements. Finally, we will look at a brief history of the register renaming concept.

We can understand data dependences by being more explicit about the action specified by an instruction. The action specified by an instruction is clearer if we describe instructions in terms of values rather than registers. We need to name the values in a way that captures changes in register contents over time.

We can capture the intent of a sequence of instructions by replacing register names in all operands with value names. This is accomplished with the following algorithm.

  1. Replace each source operand by the most recent value name in the designated register column.

  2. Replace the destination operand by a new name and place the new name in the designated register column.

It is important that step 1 is done first. Otherwise, when the same register is used both as a source operand and a destination operand we are indicating that the instruction execution phase cannot begin until its result is ready. This makes it impossible for the execution phase to begin.

We will start with the following instructions.

      mul.d  $f6, $f0, $f2
      div.d  $f4, $f2, $f0
      add.d  $f0, $f6, $f2
      
Instruction $f0 $f2 $f4 $f6 Renamed Instruction
initial values V0 V1 V2 V3
mul.d $f6, $f0, $f2 V4 mul.d V4, V0, V1
div.d $f4, $f2, $f0 V5 div.d V5, V1, V0
add.d $f0, $f6, $f2 V6 add.d V6, V4, V1

It can be verified that we have correctly renamed these instructions but first we add the following instruction and examine the renaming process in detail.

      sub.d  $f2, $f2, $f6
      
  1. Replace each source operand by the most recent value name in the designated register column.

  2. Replace the destination operand by a new name and place the new name in the designated register column.

First replace $f2 by the most recent value name in the $f2 column. The table below shows the renamed operand in light blue. The values that it depends on are shown in light green.

Instruction $f0 $f2 $f4 $f6 Renamed Instruction
initial values V0 V1 V2 V3
mul.d $f6, $f0, $f2 V4 mul.d V4, V0, V1
div.d $f4, $f2, $f0 V5 div.d V5, V1, V0
add.d $f0, $f6, $f2 V6 add.d V6, V4, V1
sub.d $f2, $f2, $f6 sub.d ??, V1, ??

Then replace $f6 by the most recent value name in the $f6 column.

Instruction $f0 $f2 $f4 $f6 Renamed Instruction
initial values V0 V1 V2 V3
mul.d $f6, $f0, $f2 V4 mul.d V4, V0, V1
div.d $f4, $f2, $f0 V5 div.d V5, V1, V0
add.d $f0, $f6, $f2 V6 add.d V6, V4, V1
sub.d $f2, $f2, $f6 sub.d ??, V1, V4
  1. Replace each source operand by the most recent value name in the designated register column.

  2. Replace the destination operand by a new name and place the new name in the designated register column.

Replace the $f2 destination operand by a new value name V7 and also put V7 into the column for $f2.

Instruction $f0 $f2 $f4 $f6 Renamed Instruction
initial values V0 V1 V2 V3
mul.d $f6, $f0, $f2 V4 mul.d V4, V0, V1
div.d $f4, $f2, $f0 V5 div.d V5, V1, V0
add.d $f0, $f6, $f2 V6 add.d V6, V4, V1
sub.d $f2, $f2, $f6 V7 sub.d V7, V1, V4

Note that the instructions with value names capture the intent of a sequence of instructions by specifying relationships between register values, rather than just registers. This simplies the determination of when we can begin the execution of an instruction can begin. We only need to check if its source operand values have been computed. The name dependences no longer complicate the picture.

An instruction goes through 6 stages in its execution.

The instruction is fetched from memory or an instruction cache and the program counter is incremented.

When the instruction is issued it is assigned a rename register number, which identifies the instruction and is used as a destination value name. Source operands architectural register numbers are also translated to value names. Issue circuitry forms a recoded instruction which contains a code for the operation, the assigned rename register number, and information about each source operand. This information includes the status of the rename register for the operand and either its value or its value name, depending on the status. From the time they are issued until the time they are committed, instructions are considered to be "in flight". The destination rename registers for in flight instructions are linked together in issue order to form an "in flight" list in the values circuitry.

If both source operands in the recoded instruction have "ready" status, and a functional unit is available to perform the operation, then the recoded instruction can be sent immediately to the functional unit. Otherwise, it is assigned to a reservation slot to wait for source operands to become available and/or for a suitable functional unit to become available.

When all source operands are ready and a suitable functional unit is available then execution begins. The functional unit receives an operation code, source values, and the destination value name.

The result is a destination value and the destination value name is identifies the value register in which it will be placed. During the last cycle of execution they are sent to the value registers, where the value will be written to the appropriate value register one cycle later (write back).

If register forwarding is implemented the value and value name are also sent to reservation stations. A reservation station that is waiting for that value name can then capture the value. If other source operands are ready and a functional unit is available it instruction can start execution at the beginning of the next cycle. This is one cycle earlier than waiting for the value register to have a valid value.

The destination operand value and its name to both the rename register in the values circuitry and to the reservation stations. The status bit for the named value register is updated to indicate that the value is ready.

If register forwarding is implemented the value is also distributed to the reservation stations where it is captured by any source register that is waiting for that value name.

The instruction begins commitment when two conditions are met:

At that time, the rename register that was previously assigned to the instruction's destination architectural register can be freed for reuse by another instruction.

The following block diagram and block descriptions are one of several possible implementations of register renaming. The description is not intended to represent how any particular processor implements register renaming. It is mainly intended to make implementation believable. For more information about design options see [Sima00].

renaming

renaming

Instruction fetch circuitry fetches instructions from memory or an instruction cache, incrementing the program counter after each fetch. Instruction fetch circuitry often also has a branch history table for predicting whether or not conditional branches will be taken and a branch target buffer for quick lookup of branch targets.

renaming

Issue circuitry renames instruction operands, sending the recoded instruction to the reservation stations. It uses an issue table - a small set of registers, one for each architectural register. This table is essentially hardware for the register columns in the earlier textual renaming tables. The destination register values in the table are value register numbers obtained from the values circuitry.

In some variants of the renaming architecture, issue circuitry may also fetch source values if they are available.

renaming

Reservation circuitry holds recoded instructions until they are ready to be executed. An instruction is ready when all of its source operands are valid and there is a suitable free functional unit to perform the operation.

renaming

Execution circuitry performs operations required for instructions. Execution circuitry is usually grouped into functional units, each performing a group of operations that involve shared circuity. Most processors, for example, use a load-store functional unit for all memory load and store instructions. For high performance, the functional units are often pipelined to permit starting the execution phase of a new instruction every cycle.

renaming

Values circuitry manages a numbered set of rename registers. The register numbers are used by issue circuitry for renaming operands. In addition to its value, a rename register has a status bit that indicates whether or not its value is valid, a link field for building lists, and a field indicating which architectural register it represents.

Values circuitry uses the link field to build to lists.

Values circuitry also has small registers for keeping track of the start and end of these lists. The start of the free list is a value register number that can be passed to the issue circuitry for naming new values. The start of the in-flight list is the register number for the value register that is to be committed next. When necessary, value registers can be added to the end of these lists.

renaming

Commit circuitry frees rename registers that are no longer needed. A rename register is no longer needed when its architectural register gets assigned a new value. Commit circuitry can be considered as a part of the values circuitry since it has access to the free and in-flight lists. In addition to accessing the free and in-flight lists, commit circuitry maintains a commit table that maps architectural registers to the value registers that contain their most recently committed values. This table is similar to the issue table in the issue circuitry.

Commit circuitry works with the value register at the head of the in-flight list. The "architectural register" field of that register is used to select an entry from the commit table. This entry specifies the number of the value register that was committed earlier to the architectural register. That value is no longer needed. Its register is moved into the free list. Meanwhile the value register at the head head of the in-flight list is removed from the in-flight list and its value register number replaces the old entry in the commit table.

The operation of commit circuitry can be improved by maintaing separate lists for each architectural register. That allows multiple instructions to be committed per cycle. However, it only works correctly if reservation stations capture source operand values as soon as they are ready. The importance of multiple commits per cycle is brought out in Register Renaming Performance.

Modern processors that have a renaming architecture have an organization similar to that described in the previous sections. There are some important changes for improving performance.

Speculative execution involves predicting whether or not conditional branches will be taken and speculative issue of either the instruction at the branch target or the instruction following the branch, depending on the prediction. With speculative execution CPI can be reduced to nearly 1.0.

Speculative execution requires adding circuitry to the instruction fetch circuitry for keeping track of past history of branches. Usually a cache for recent target addresses is added to reduce the time spent on computing branch target addresses. Speculative execution also requires small modifications elsewhere to allow cancelation of instructions that were issued as a result of incorrect predictions.

Superscalar execution involves issuing batch of instructions in each cycle. For the most part, this uses the same basic architectural components, with widening of paths from the instruction cache to support more instruction fetches and some addition to the issue circuitry to support parallel issue. Modern processors are currently capable of a CPI of issuing in batches of 4 or 5 instructions each cycle. It is difficult to increase the size of an issue batch beyond this point due to the inherent sequential nature of renaming.

Processors usually cannot issue a full batch of instructions on all cycles. Most implementations will not make any attempt to include instructions after a branch in the same issue batch. Thus the average number of issues per cycle is about 2.5, giving a CPI of about 0.4.

Modern operating systems support time sharing between multiple threads, inreasing the utilization of the processor. For example, a web server will typically have one thread that just listens for new connection requests. After establishing a connection, that thread passes it to a worker thread. Each worker thread handles only one connection at a time. If one of the worker threads is blocked, for example waiting for information from the file system, other threads can get time on the processor.

Recent processors have added hardware support for multiple threads. The most important hardware change is replication of the issue table - each hardware thread needs it own table. Any hardware fields that record architectural register numbers also have to be expanded to allow including a thread identifier.

Multithreaded execution can be used to simplify and improve handling of branches. Suppose you had a processor that could issue 4 instructions per cycle and supported 6 threads. Then it has only 4 of the threads active at a time. If a branch instruction is fetched for one of the threads it can be issued, but then the thread made inactive until the branch condition is resolved. That allows the processor to keep issuing 4 instructions per cycle except when at least 3 threads are inactive due to branches.

It is not clear how useful this idea is. Currently, processor implementation is up against a power wall. Increasing the amount of work that gets done per cycle just increases the power consumption and the heat that results.

The first renaming architecture was used for floating point instructions in late high performance models of the IBM 360 family of of computers [Tomasulo]. At that time it was not called register renaming. Tomasulo's architecture did not have value registers. Instead it used reservation station numbers for naming values.

The term "register renaming" was used by compiler writers for software renaming. The term was later adopted by computer engineers.

Register renaming was used in many processors starting in the mid 1990s. Today it is used in almost every desktop and server processor.