CPU pipelining increases instruction throughput and can allow a higher clock frequency, but it does not come for free. Because multiple instructions are in flight at the same time, CPU designers need to take care of three kinds of hazards: structural hazards, data hazards, and control hazards.
In this post, we will discuss these hazards in detail, using the MIPS 5-stage pipeline (IF, ID, EX, MEM, WB) as a case study.
A structural hazard happens when hardware resources are limited: multiple instructions try to use the same resource in the same cycle and conflict. In the 5-stage pipeline, the resources in question are the register file and memory.
In the same cycle, one instruction may be in the ID stage, which requires register file read access, while another may be in the WB stage, which requires register file write access. If the register file has only one port, a structural hazard can occur. This can be resolved by splitting the cycle between the two accesses; the classic MIPS design writes in the first half of the cycle and reads in the second half, so the instruction in ID also sees the value just written by the instruction in WB. A more common solution is to give the register file separate ports for read and write access.
In the same cycle, one instruction may be in the IF stage, which requires a memory read for instruction fetch, while another may be in the MEM stage, which requires a data load or store. This can be resolved by separating instruction memory from data memory. In modern CPUs this is more or less a given, since the L1 cache is typically split into an instruction cache and a data cache.
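The IF/MEM conflict can be made concrete with a small sketch. It assumes an ideal pipeline that issues one instruction per cycle with no stalls; the names `stage_at` and `memory_conflicts` are hypothetical helpers for illustration only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` in `cycle` (both 0-based), or None."""
    offset = cycle - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None

def memory_conflicts(num_instrs, uses_mem):
    """List (cycle, instr_in_MEM, instr_in_IF) where a single-ported memory would conflict.

    uses_mem[i] is True when instruction i is a load or store.
    """
    conflicts = []
    for cycle in range(num_instrs + len(STAGES) - 1):
        fetching = [i for i in range(num_instrs) if stage_at(i, cycle) == "IF"]
        in_mem = [i for i in range(num_instrs)
                  if stage_at(i, cycle) == "MEM" and uses_mem[i]]
        if fetching and in_mem:
            conflicts.append((cycle, in_mem[0], fetching[0]))
    return conflicts

# Instruction 0 is a load; it reaches MEM in the cycle instruction 3 is fetched.
print(memory_conflicts(5, [True, False, False, False, False]))
```

With separate instruction and data memories, the two accesses go to different resources and the list of conflicts no longer matters.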
A data hazard happens when an instruction tries to access data that is not yet available. There are three types of data hazards:
Read After Write / RAW
Write After Read / WAR
Write After Write / WAW
Among these data hazards, only RAW reflects a true data dependency; the other two are merely name dependencies. In the MIPS 5-stage pipeline, instructions proceed in order, registers are read only in the ID stage and written only in the later WB stage, so WAR and WAW hazards never occur. WAR and WAW will be covered in CPU out-of-order scheduling.
A RAW hazard can happen in the following program:
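The three dependency types can be told apart by comparing the registers two instructions read and write. The following is a minimal sketch, assuming each instruction is described by a hypothetical `(dest, sources)` pair and that `first` comes before `second` in program order.

```python
def classify_hazards(first, second):
    """Return the set of potential hazards between two instructions.

    Each instruction is (dest_register, set_of_source_registers);
    dest_register may be None for instructions that write nothing.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    hazards = set()
    if dest1 is not None and dest1 in srcs2:
        hazards.add("RAW")  # second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        hazards.add("WAR")  # second writes what first reads
    if dest1 is not None and dest1 == dest2:
        hazards.add("WAW")  # both write the same register
    return hazards

print(classify_hazards(("R3", {"R1", "R2"}), ("R5", {"R3", "R4"})))
```

Whether a potential hazard turns into a real one depends on the pipeline; this sketch only reports the dependencies themselves.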
R3 <- R1 + R2
R5 <- R3 + R4
The result of the first instruction has not yet been written to the register file when the instruction following it needs it. If the second instruction reads R3 from the register file, it gets a stale value and its result will be incorrect.
A technique called “Forwarding” (also known as bypassing) can help in this case. Instead of waiting for R3 to reach the register file, the ALU output is fed directly back to the ALU input, so that the second instruction can use the updated R3 value immediately.
One thing to remember is that “Forwarding” cannot resolve all RAW hazards. For example:
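The selection logic behind forwarding can be sketched as follows. This is a simplified illustration, not a full forwarding unit: `forward_source` is a hypothetical name, the pipeline registers are reduced to the destination register they carry, and MIPS details such as the hard-wired zero register are ignored.

```python
def forward_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Decide where the ALU operand for `src_reg` should come from.

    ex_mem_dest: register written by the instruction one ahead (now in MEM), or None.
    mem_wb_dest: register written by the instruction two ahead (now in WB), or None.
    """
    if ex_mem_dest is not None and ex_mem_dest == src_reg:
        return "EX/MEM"   # newest value: result produced in the previous cycle
    if mem_wb_dest is not None and mem_wb_dest == src_reg:
        return "MEM/WB"   # older value, not yet written to the register file
    return "REGFILE"      # no pending writer, the register file is up to date

# R5 <- R3 + R4 is in EX while R3 <- R1 + R2 is in MEM.
print(forward_source("R3", ex_mem_dest="R3", mem_wb_dest=None))
print(forward_source("R4", ex_mem_dest="R3", mem_wb_dest=None))
```

The EX/MEM check comes first so that, when both earlier instructions write the same register, the most recent value wins.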
R3 <- MEM[addr]
R5 <- R3 + R4
For a load, R3 is not available until the end of the MEM stage, which is too late for the second instruction's EX stage, so feeding the ALU output back to its input does not help. The second instruction needs to stall for one cycle, after which R3 can be forwarded from the MEM stage.
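The load-use case can be detected by checking whether an instruction reads the destination of a load that directly precedes it. The sketch below assumes full forwarding is otherwise in place; the `(op, dest, sources)` tuple format and the function name are hypothetical.

```python
def count_load_use_stalls(program):
    """Count 1-cycle stalls caused by a load followed directly by a consumer.

    program: list of (op, dest, sources) tuples in program order.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "load" and prev_dest in curr_sources:
            stalls += 1  # forwarding alone cannot cover this pair
    return stalls

load_then_add = [("load", "R3", ("addr",)), ("add", "R5", ("R3", "R4"))]
add_then_add = [("add", "R3", ("R1", "R2")), ("add", "R5", ("R3", "R4"))]
print(count_load_use_stalls(load_then_add), count_load_use_stalls(add_then_add))
```

If an independent instruction sits between the load and its consumer, the pair is no longer adjacent and no stall is counted.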
A control hazard happens when the pipeline must decide which instruction to fetch after a branch before the branch condition has been calculated. There are a couple of ways to deal with branches and control hazards.
First of all, we can stall the instruction following the branch until the condition is determined. With the condition computed in the EX stage, the next instruction cannot be fetched until the branch reaches the MEM stage, which introduces a 2-cycle stall.
Secondly, if we can get the branch resolved in the ID stage instead of the EX stage, we only lose 1 cycle. This requires an additional ALU in the ID stage so that the condition test completes early without competing for the ALU in the EX stage, which would be a structural hazard. However, this approach may lead to a new RAW hazard. See the example below:
R3 <- R1 + R2
When a branch immediately after this instruction tests R3, it needs the value in its ID stage, but R3 is not available until the end of the preceding instruction's EX stage. A 1-cycle stall before the branch therefore cannot be avoided.
Thirdly, instead of leaving the cycles after a branch empty until it is resolved, a technique called “Delay Slot” can be used. The instructions in the “Delay Slot” are always executed, whether or not the branch is taken, but this requires more sophisticated compiler technology: the compiler needs to understand the program context in order to fill the “Delay Slot” without changing what the programmer intended.
Lastly, hardware-based branch prediction can be used to execute instructions speculatively. However, this requires the CPU to be able to restore itself, discarding the speculated instructions and their effects, when a prediction turns out to be incorrect.
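As one illustration of hardware prediction, the sketch below models a 2-bit saturating counter, a well-known simple scheme. The post does not say which predictor is meant, so this is an assumed example rather than the design discussed here; the class name and the initial state are arbitrary choices.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 1  # assumed starting point: weakly not taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def count_mispredictions(outcomes):
    """Run one predictor over a sequence of branch outcomes (True = taken)."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # here the CPU would have to discard speculated work
        predictor.update(taken)
    return misses

# A loop branch taken 9 times, then falling through once.
print(count_mispredictions([True] * 9 + [False]))
```

Each misprediction in this model corresponds to a point where the speculated instructions must be thrown away and fetch restarted on the correct path.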
The most straightforward way to resolve a hazard is to wait, but this is not preferred: stalls degrade performance, which defeats the purpose of CPU pipelining in the first place.
This post covered several techniques to address each type of hazard above. More advanced techniques will be discussed in CPU out-of-order scheduling.
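The cost of stalling can be estimated with a back-of-the-envelope calculation. The sketch below assumes an ideal CPI of 1, perfectly balanced stages, and a hypothetical workload in which 20% of instructions are branches that always pay the full penalty; the function names are illustrative.

```python
def effective_cpi(stall_fraction, stall_cycles, base_cpi=1.0):
    """Average cycles per instruction when a fraction of instructions stall."""
    return base_cpi + stall_fraction * stall_cycles

def pipeline_speedup(depth, cpi):
    """Speedup over an unpipelined design, assuming perfectly balanced stages."""
    return depth / cpi

BRANCH_FRACTION = 0.2  # hypothetical: 1 in 5 instructions is a branch

for penalty in (2, 1):  # branch resolved in EX vs. resolved in ID
    cpi = effective_cpi(BRANCH_FRACTION, penalty)
    speedup = pipeline_speedup(5, cpi)
    print(f"penalty={penalty} cycles: CPI={cpi:.2f}, speedup={speedup:.2f}x")
```

Under these assumptions, the 5-stage pipeline delivers roughly 3.6x instead of the ideal 5x with a 2-cycle branch penalty, and roughly 4.2x with a 1-cycle penalty.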