Pipelining Obstacles

Pipelining is a powerful technique for improving the performance of processors. Pipelining obstacles are complications arising from the fact that instructions in a pipeline are not independent of each other. In the past, these problems have been attacked by both computer architects and compiler writers. This has led to two different kinds of terminology. This web presentation mostly uses the hazard terminology of the computer architect.

There are three types of pipeline hazards:

Structural hazards occur when two instructions in a pipeline need the same hardware resource at the same time.
Control hazards occur when conditional branches interfere with instruction fetches in a pipeline.
Data hazards occur when two instructions in a pipeline refer to the same register and at least one of them writes to the register.

The control and data problems have been addressed by both computer architects and compiler writers. They use different terminology to address the same kinds of problems from different perspectives.

Computer architects are typically in a context where they are modifying an existing implementation to produce an improved implementation. They use the term hazard to refer to a pipelining obstacle in the existing hardware implementation on a particular sequence of instructions. A hazard indicates something that needs to be fixed.

Compiler writers try to optimize code without knowing the details of the hardware. They use the term dependence to refer to a relationship between instructions. Depending on the hardware, these dependences could potentially result in hazards.

Here is a comparison of their terminology.

Computer Architects                Compiler Writers
structural hazards                 no equivalent
control hazards                    control dependences
data hazards    RAW hazards        true dependences
                WAR hazards        name dependences    antidependences
                WAW hazards        name dependences    output dependences

Structural hazards occur when two instructions in a pipeline need the same hardware resource at the same time. Structural hazards can be avoided by stalling, duplicating the resource, or pipelining the resource.

For example, suppose the processor only has a single port to memory used for both data and instructions. Then there is a structural hazard between the MEM phase of a load or store instruction and the IF phase of the instruction that needs to be fetched at that time. This hazard can be avoided by either stalling the instruction fetch or by having two memory ports. Most modern processors have separate data and instruction caches which, in effect, gives them two memory ports.
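The single-port conflict can be located mechanically. The following is a minimal sketch, assuming a classic five-stage pipeline (IF, ID, EX, MEM, WB), one instruction fetched per cycle and no stalls. The function name `port_conflicts` and the sample program are hypothetical and used only for illustration.

```python
def port_conflicts(program):
    """Find cycles where a load/store in MEM and a fetch need the one port.

    Assumes instruction i (0-based) is fetched in cycle i + 1 and, with no
    stalls, reaches MEM three cycles later. Returns a list of
    (cycle, index of instruction in MEM, index of instruction in IF).
    """
    conflicts = []
    for i, op in enumerate(program):
        if op not in ("lw", "sw"):      # only loads and stores use MEM
            continue
        mem_cycle = (i + 1) + 3         # IF, ID, EX, then MEM
        fetched = mem_cycle - 1         # index of the instruction in IF
        if fetched < len(program):      # something is actually being fetched
            conflicts.append((mem_cycle, i, fetched))
    return conflicts


program = ["lw", "add", "sub", "and", "or"]
print(port_conflicts(program))
```

For this program the load's MEM stage collides with the fetch of the fourth instruction. With separate instruction and data ports the same check would report nothing, because the two accesses no longer share a resource.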

In addition, real processors have some long latency instructions - instructions whose execution step cannot be completed in a single cycle. One example is an integer multiply. With unmodified multiply circuitry, you cannot handle two successive multiply instructions without stalling the second.

To deal with long latency instructions, the execute circuitry is generally divided up into functional units, each handling a small number of similar instructions. These functional units can be pipelined by adding pipeline registers. This lets you start long latency instructions every cycle.
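The benefit of pipelining a functional unit can be estimated with simple arithmetic. This sketch assumes a run of independent multiplies issued back to back and a hypothetical latency of 4 cycles; the function names are illustrative only.

```python
def cycles_unpipelined(n_ops, latency):
    # Each operation must finish before the next one can start.
    return n_ops * latency


def cycles_pipelined(n_ops, latency):
    # A new operation starts every cycle; the last one finishes
    # `latency` cycles after it starts.
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


print(cycles_unpipelined(10, 4), cycles_pipelined(10, 4))
```

Under these assumptions, ten multiplies occupy the unit for 40 cycles without pipelining and 13 cycles with it.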

Control hazards occur when conditional branches interfere with instruction fetches in a pipeline. The problem is that it is not known whether or not a conditional branch will be taken until some time after the cycle for fetching the next instruction. Also, the branch target address needs to be computed if the branch is taken.

A control hazard could be handled by stalling the next instruction fetch. However, that has a significant impact on performance, especially in tight loops, where many programs spend much of their time. A common technique for reducing the stalls associated with control hazards is speculative execution: guess whether or not the branch will be taken and fetch the next instruction based on the guess. To do this, the machine needs two tables containing information about recent branches:

A branch history table records bits about recent branch history, that is, whether or not a branch was taken. The processor uses these bits to guess whether or not a branch will be taken.
A branch target table holds target addresses for recent branches. This table reduces the time needed to determine the branch target address.

Speculative execution also requires a mechanism for backing out of instructions executed based on incorrect guesses and resuming execution of the correct instruction sequence.
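The two tables can be modeled in a few lines. The text does not say how the history bits are used, so this sketch assumes one common scheme, a 2-bit saturating counter per branch. The class name, the addresses and the loop pattern are all hypothetical.

```python
class BranchPredictor:
    def __init__(self):
        self.history = {}   # branch address -> 2-bit counter (0..3)
        self.targets = {}   # branch address -> last seen target address

    def predict(self, pc, fall_through):
        """Return the predicted address of the next instruction to fetch."""
        counter = self.history.get(pc, 1)       # unseen: weakly not taken
        if counter >= 2 and pc in self.targets:
            return self.targets[pc]             # guess "taken"
        return fall_through                     # guess "not taken"

    def update(self, pc, taken, target):
        """Record the real outcome once the branch has been resolved."""
        counter = self.history.get(pc, 1)
        if taken:
            self.history[pc] = min(counter + 1, 3)
            self.targets[pc] = target
        else:
            self.history[pc] = max(counter - 1, 0)


# A loop branch at 0x40 that jumps back to 0x20 nine times, then falls through.
bp = BranchPredictor()
mispredictions = 0
for taken in [True] * 9 + [False]:
    guess = bp.predict(0x40, fall_through=0x44)
    actual = 0x20 if taken else 0x44
    if guess != actual:
        mispredictions += 1     # real hardware would squash the wrong fetches
    bp.update(0x40, taken, target=0x20)
print(mispredictions)
```

In this run the predictor is wrong on the first iteration, before it has any history, and on the loop exit, so two of the ten fetches after the branch would have to be backed out.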

Data hazards occur when two instructions in a pipeline refer to the same register and at least one of them writes to the register. Compiler writers use the phrase "data dependences" to cover the same kind of problem, but their terminology refers to what you can see in an instruction stream without considering the pipeline.

Also, execution circuitry is usually broken up into multiple functional units, each performing different types of operations. These functional units can be performing operations in parallel. More complex operations may take several cycles to complete. To illustrate the difficulties that result, consider the following MIPS code snippet.

        div.d   $f0, $f2, $f4
        mul.d   $f6, $f8, $f0
        add.d   $f0, $f10, $f12

The use of register $f0 can give rise to three different kinds of problems in this code.

Read after Write (RAW) hazards, also known as true dependences
Write after Write (WAW) hazards, also known as output dependences
Write after Read (WAR) hazards, also known as antidependences

The naming of these hazards is based on what is supposed to happen. For example, a WAR hazard occurs when an instruction that writes to a register follows soon after an instruction that reads from the same register. If the write actually happens before the read, then the first instruction (the reader) works with the wrong data value.
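The three kinds of relationship can be read off the operands alone, which is the compiler writer's view. The sketch below classifies each pair from the div.d/mul.d/add.d snippet above. It assumes every instruction has the form (destination, source, source) and, for brevity, ignores intervening writes; the function name `classify` is hypothetical.

```python
def classify(earlier, later):
    """Return the dependence kinds from `earlier` to `later`.

    Each instruction is a tuple (dest, src1, src2) of register names.
    """
    e_dest, *e_srcs = earlier
    l_dest, *l_srcs = later
    kinds = set()
    if e_dest in l_srcs:
        kinds.add("RAW")    # later reads what earlier writes
    if e_dest == l_dest:
        kinds.add("WAW")    # both write the same register
    if l_dest in e_srcs:
        kinds.add("WAR")    # later writes what earlier reads
    return kinds


div = ("$f0", "$f2", "$f4")     # div.d $f0, $f2, $f4
mul = ("$f6", "$f8", "$f0")     # mul.d $f6, $f8, $f0
add = ("$f0", "$f10", "$f12")   # add.d $f0, $f10, $f12

print(classify(div, mul), classify(div, add), classify(mul, add))
```

The pairs come out as RAW (div.d then mul.d), WAW (div.d then add.d) and WAR (mul.d then add.d), all on $f0. Whether any of them becomes a hazard depends on the pipeline timing discussed next.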

A true dependence arises when one instruction computes a value that is used by a later instruction. More precisely, the same operand register is a destination operand in the earlier instruction and a source operand in the later instruction, and there are no instructions between them that write a different value to the register.

If the two instructions can be in the pipeline at the same time and there is a possibility that the value will not be ready when the second instruction reads its source operands, the condition is called a read after write (RAW) hazard.

For example, consider the following code.

        mul.d   $f0, $f2, $f4
        add.d   $f6, $f8, $f0

Here, the value produced by the mul.d instruction in $f0 is used as a source operand by the add.d instruction. Even if the execution phase of the mul.d instruction takes a single cycle, there is a hazard because the value is produced late but it is needed early as a source operand. Suppose the mul.d instruction takes 5 cycles for its EX stage and the add.d takes 3 cycles. The following chart indicates the timing of the two instructions with the RAW hazard ignored.


cycle   1    2    3    4    5    6    7    8    9
mul.d   IF   ID   EX   EX   EX   EX   EX   MEM  WB
add.d        IF   ID   EX   EX   EX   MEM  WB

In order to handle the RAW hazard correctly, the ID stage of the add.d instruction should be stalled until $f0 has the value written by the mul.d instruction.

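[Timing chart not recoverable: mul.d runs IF, ID, EX, MEM, WB; add.d is stalled in ID until mul.d has written $f0, then completes EX, MEM, WB.]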

Register forwarding is a technique for faster handling of RAW hazards. It involves adding and controlling direct data paths from functional unit outputs to functional unit inputs. With register forwarding, the ID stage of the add.d instruction can be started during the last cycle of the EX stage of the mul.d instruction, as shown below.

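[Timing chart not recoverable: with forwarding, add.d is stalled only until its ID stage can overlap the last EX cycle of mul.d, then completes EX, MEM, WB.]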
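The stall lengths in the two cases can be worked out from the stated latencies. This is a sketch under assumptions: one cycle each for IF, ID, MEM and WB, mul.d fetched in cycle 1 and add.d in cycle 2. For the case without forwarding it further assumes that add.d can read $f0 only in the cycle after the WB of mul.d; a register file that writes and reads in the same cycle would save one cycle. The helper name `stage_cycles` is hypothetical.

```python
def stage_cycles(if_cycle, ex_latency, id_cycle=None):
    """Return the cycle(s) each stage occupies for one instruction.

    `id_cycle` lets the caller delay ID to model a stall; by default
    ID directly follows IF.
    """
    id_c = if_cycle + 1 if id_cycle is None else id_cycle
    ex_first = id_c + 1
    ex_last = ex_first + ex_latency - 1
    return {"IF": if_cycle, "ID": id_c, "EX": (ex_first, ex_last),
            "MEM": ex_last + 1, "WB": ex_last + 2}


mul = stage_cycles(if_cycle=1, ex_latency=5)
add_ignored = stage_cycles(if_cycle=2, ex_latency=3)

# No forwarding (assumption: read $f0 the cycle after mul.d's WB).
add_stalled = stage_cycles(2, 3, id_cycle=mul["WB"] + 1)

# Forwarding: ID overlaps the last EX cycle of mul.d.
add_forwarded = stage_cycles(2, 3, id_cycle=mul["EX"][1])

for name, s in [("mul.d", mul), ("add.d stalled", add_stalled),
                ("add.d forwarded", add_forwarded)]:
    print(name, s)
print("stall cycles:", add_stalled["ID"] - add_ignored["ID"],
      "vs", add_forwarded["ID"] - add_ignored["ID"])
```

Under these assumptions the add.d waits 7 cycles without forwarding and 4 cycles with it.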

An output dependence arises when an earlier instruction writes a result to the same place that a later instruction writes to. More precisely, the same operand register is a destination operand in both the earlier instruction and the later instruction, and there are no instructions between them that write a different value to the register.

If the two instructions can be in the pipeline at the same time and there is a possibility that the second instruction will write its result before the first instruction writes its result, then the condition is called a write after write (WAW) hazard.

For example, consider the following code.

        mul.d   $f0, $f2, $f4
        add.d   $f0, $f10, $f12

Here, the mul.d and add.d instructions both write to $f0. Suppose the mul.d instruction takes 5 cycles for its EX stage and the add.d takes 3 cycles. The following chart indicates the timing of the two instructions with the WAW hazard ignored.


cycle   1    2    3    4    5    6    7    8    9
mul.d   IF   ID   EX   EX   EX   EX   EX   MEM  WB
add.d        IF   ID   EX   EX   EX   MEM  WB

This will result in later instructions seeing the wrong value in $f0 - the result from the mul.d instruction rather than the result of the add.d instruction. To remedy this problem, the WB stage of the add.d instruction should be stalled until after the WB stage of the mul.d instruction, as shown below.


cycle   1    2    3    4    5    6    7    8        9        10
mul.d   IF   ID   EX   EX   EX   EX   EX   MEM      WB
add.d        IF   ID   EX   EX   EX   MEM  stalled  stalled  WB
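The length of the WB stall follows from the write-back cycles of the two instructions. This sketch assumes one cycle each for IF, ID, MEM and WB and the EX latencies given above; both function names are hypothetical.

```python
def writeback_cycle(if_cycle, ex_latency):
    # IF, then ID, then `ex_latency` cycles of EX, then MEM, then WB.
    return if_cycle + 1 + ex_latency + 2


def stalled_writeback(own_wb, earlier_wb):
    # The later write must land strictly after the earlier write.
    return max(own_wb, earlier_wb + 1)


mul_wb = writeback_cycle(if_cycle=1, ex_latency=5)
add_wb = writeback_cycle(if_cycle=2, ex_latency=3)
fixed_wb = stalled_writeback(add_wb, mul_wb)
print(mul_wb, add_wb, fixed_wb, fixed_wb - add_wb)
```

With these numbers the add.d would write in cycle 8, one cycle ahead of the mul.d, so its WB is pushed to cycle 10, a stall of two cycles.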

An antidependence arises when an earlier instruction reads a value from the same place that will be written by a later instruction. That is, the same operand register is a source operand in the earlier instruction and a destination operand in the later instruction, and there are no instructions between them that write a different value to the register.

If the two instructions can be in the pipeline at the same time and there is a possibility that the later instruction will write its result before the earlier instruction has read its source operand value, then the condition is called a write after read (WAR) hazard.

For example, consider the following code.

        div.d   $f2, $f4, $f6
        add.d   $f8, $f2, $f0
        sub.d   $f0, $f10, $f12

Here, the value in $f0 is a source operand for the add.d instruction and a destination operand for the sub.d instruction. The add.d instruction executes incorrectly if it reads $f0 after the sub.d instruction has written to $f0. The following chart seems to indicate that this cannot happen. It assumes that the EX stage for a div.d instruction takes 8 cycles and the EX stage for an add.d or a sub.d instruction takes 3 cycles.

[Timing chart not recoverable: div.d, add.d and sub.d each run IF, ID, EX, MEM, WB with no stalls.]

However, the WAR hazard arises when the register read for the add.d instruction is delayed due to its RAW hazard with the div.d instruction regarding $f2. The following chart shows the real situation, assuming that register forwarding is used.

[Timing chart not recoverable: div.d runs without stalls; add.d is stalled waiting for $f2; sub.d runs without stalls.]

Now the write from the sub.d instruction changes the value in $f0 before the add.d has had a chance to read the original value. The problem arises here because the add.d has a source operand, $f2, that must be read late due to a RAW hazard, while another source operand, $f0, should be read early to avoid a WAR hazard. If the two reads can be done at different times then the WAR hazard disappears.
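The two situations can be compared numerically. This is a sketch under assumptions: one cycle each for IF, ID, MEM and WB, registers read in ID and written in WB, the three instructions fetched in cycles 1, 2 and 3, forwarding that lets the add.d run its ID stage in the last EX cycle of the div.d, and a sub.d that is not held up behind the stalled add.d. The function names are hypothetical.

```python
def id_and_wb(if_cycle, ex_latency, id_cycle=None):
    """Return (ID cycle, WB cycle) for one instruction.

    Registers are read in ID and written in WB. `id_cycle` delays ID
    to model a stall.
    """
    id_c = if_cycle + 1 if id_cycle is None else id_cycle
    return id_c, id_c + ex_latency + 2


def war_hazard(read_cycle, write_cycle):
    # The later instruction's write must not land before the earlier read.
    return write_cycle < read_cycle


div_id, div_wb = id_and_wb(if_cycle=1, ex_latency=8)
div_last_ex = div_id + 8                  # last EX cycle of div.d

add_read_early, _ = id_and_wb(2, 3)       # RAW on $f2 ignored
add_read_late, _ = id_and_wb(2, 3, id_cycle=div_last_ex)   # RAW stall
_, sub_write = id_and_wb(3, 3)            # sub.d writes $f0 in WB

print(war_hazard(add_read_early, sub_write))
print(war_hazard(add_read_late, sub_write))
```

Under these assumptions the sub.d writes $f0 in cycle 9. An unstalled add.d reads it in cycle 3, so there is no hazard, while the stalled add.d reads in cycle 10 and sees the overwritten value.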