The CPU runs programs in a virtual address space, and each virtual address must be translated to a physical address before system memory is accessed. Virtual memory is a frequently asked question in hardware interviews, and it reflects the interviewee’s overall understanding of modern computer systems.
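The translation step can be sketched with a single-level page table. This is only an illustration: the 4 KiB page size, the dictionary-based `page_table`, and the `PageFault` exception are assumptions made for the example, not details from the post, and real hardware uses multi-level tables and a TLB.

```python
OFFSET_BITS = 12                 # assumed 4 KiB pages
PAGE_SIZE = 1 << OFFSET_BITS


class PageFault(Exception):
    """Raised when a virtual page has no mapping (hypothetical name)."""


def translate(vaddr, page_table):
    """Translate a virtual address with a single-level page table.

    page_table maps a virtual page number (VPN) to a physical frame number (PFN).
    """
    vpn = vaddr >> OFFSET_BITS           # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)     # lower bits are kept unchanged
    if vpn not in page_table:
        raise PageFault(f"no mapping for virtual address {vaddr:#x}")
    return (page_table[vpn] << OFFSET_BITS) | offset


page_table = {0x0: 0x5, 0x1: 0x9}        # made-up mappings
print(hex(translate(0x1234, page_table)))  # VPN 0x1 -> PFN 0x9, offset 0x234
```

Only the page number is translated; the offset within the page passes through untouched.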
- Calculate the effective address for each memory access
- Check the addresses of all active loads and stores, and ensure that conflicting loads and stores cannot execute out of order
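The two steps above can be sketched as follows. This is a minimal model under stated assumptions: the `MemOp` class and the `can_bypass` helper are hypothetical names, addresses are compared for exact equality (no partial overlap), and the addresses of all older operations are assumed to be already known.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    kind: str     # "load" or "store"
    base: int     # value read from the base register
    offset: int   # immediate offset from the instruction

    @property
    def address(self):
        # Step 1: effective address = base register value + offset
        return self.base + self.offset


def conflicts(a, b):
    """Two accesses conflict if they touch the same address and at least one is a store."""
    return a.address == b.address and "store" in (a.kind, b.kind)


def can_bypass(op, older_ops):
    """Step 2: op may execute ahead of older in-flight ops only if it conflicts with none."""
    return not any(conflicts(op, older) for older in older_ops)


older = [MemOp("store", 0x1000, 8)]
print(can_bypass(MemOp("load", 0x1000, 16), older))  # different address
print(can_bypass(MemOp("load", 0x1000, 8), older))   # same address as the older store
```

Two loads to the same address never conflict in this model, because neither of them changes memory.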
J. P. Shen and M. H. Lipasti give an excellent treatment of this topic in their book, Modern Processor Design. In this post, we will briefly discuss their approach to load / store processing.
The RAW, WAW and WAR hazards we discussed in previous posts are with respect to registers. For memory accesses, there exist subtle data dependencies that are rarely mentioned. These data dependencies include RAW, WAW and WAR as well. Note:
- Memory data dependencies only exist for accesses to the same memory address; loads and stores to different addresses can safely execute out of order
- In the MIPS 5-stage pipeline, all memory accesses are performed in the MEM stage and are therefore serialized, so these data dependencies cannot be violated there. In this post, we will focus on CPU out-of-order scheduling.
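As a small illustration of the first point, the dependency between two memory accesses in program order can be classified like this. The function name `memory_hazard` and the `(kind, address)` tuple encoding are assumptions made for the example.

```python
def memory_hazard(first, second):
    """Classify the dependency between two memory accesses given in program order.

    Each access is a (kind, address) tuple, where kind is "load" or "store".
    Returns "RAW", "WAR", "WAW", or None when reordering is safe.
    """
    (kind1, addr1), (kind2, addr2) = first, second
    if addr1 != addr2:
        return None                      # different addresses: no dependency
    if kind1 == "store" and kind2 == "load":
        return "RAW"                     # the load must see the stored value
    if kind1 == "load" and kind2 == "store":
        return "WAR"                     # the load must read the old value first
    if kind1 == "store" and kind2 == "store":
        return "WAW"                     # the later store must win
    return None                          # two loads never conflict


print(memory_hazard(("store", 0x100), ("load", 0x100)))
print(memory_hazard(("store", 0x100), ("load", 0x200)))
```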
A fundamental restriction of the MIPS 5-stage pipeline is that if one instruction stalls, all instructions following it stall as well, even if they do not depend on it at all. If the stalled instruction takes a long time to finish, the throughput of the CPU pipeline suffers.
In modern CPUs, out-of-order scheduling is an important technique for addressing the restriction above. Tomasulo’s algorithm is a classic approach to CPU out-of-order scheduling.
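The cost of the in-order restriction can be seen in a toy timing model. This is not Tomasulo’s algorithm itself, only a sketch under simplifying assumptions: one instruction enters per cycle, there are unlimited functional units, and register renaming is assumed to remove WAR and WAW hazards. The `schedule` function and the instruction tuples are made up for the example.

```python
def schedule(program, in_order):
    """Return the cycle in which each instruction starts executing.

    program is a list of (dest, srcs, latency) tuples in program order.
    """
    ready = {}        # register -> cycle when its latest value becomes available
    starts = []
    prev_start = -1
    for i, (dest, srcs, latency) in enumerate(program):
        operands_ready = max((ready.get(r, 0) for r in srcs), default=0)
        start = max(i, operands_ready)           # instruction i enters no earlier than cycle i
        if in_order:
            start = max(start, prev_start + 1)   # cannot pass an older instruction
        starts.append(start)
        ready[dest] = start + latency
        prev_start = start
    return starts


program = [
    ("r1", [], 10),            # long-latency load
    ("r2", ["r1", "r1"], 1),   # depends on the load
    ("r3", ["r4", "r5"], 1),   # independent of the load
    ("r6", ["r3", "r4"], 1),   # depends only on the previous instruction
]
print(schedule(program, in_order=True))   # [0, 10, 11, 12]
print(schedule(program, in_order=False))  # [0, 10, 2, 3]
```

In the in-order case the two independent instructions wait behind the stalled one; in the out-of-order case they start as soon as their operands are ready.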
Hardware-based branch prediction is an important CPU technique for mitigating control hazards. Branch prediction has two aspects: branch condition prediction and branch target prediction. The branch condition decides whether the branch is taken or not, and the branch target decides the target address. Both aspects matter: a taken prediction only helps if the predicted target address is also correct.
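Both aspects can be sketched together in a few lines. The scheme below, a 2-bit saturating counter per branch plus a table of last-seen targets, is one common textbook choice picked for illustration; the post does not say which predictor it covers, and the `BranchPredictor` class is a hypothetical name.

```python
class BranchPredictor:
    """Toy predictor: 2-bit saturating counters plus a branch target table."""

    def __init__(self):
        self.counters = {}   # branch PC -> counter in 0..3; 2 or 3 predicts taken
        self.targets = {}    # branch PC -> last target seen when the branch was taken

    def predict(self, pc):
        """Return (predicted_taken, predicted_target or None)."""
        taken = self.counters.get(pc, 1) >= 2    # unseen branches start weakly not taken
        target = self.targets.get(pc) if taken else None
        return taken, target

    def update(self, pc, taken, target=None):
        """Train with the actual outcome once the branch resolves."""
        counter = self.counters.get(pc, 1)
        self.counters[pc] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        if taken:
            self.targets[pc] = target


bp = BranchPredictor()
bp.update(0x40, True, 0x80)
print(bp.predict(0x40))   # predicted taken, with target 0x80
```

The two-bit counter means a single mispredicted iteration, such as a loop exit, does not immediately flip the prediction.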
CPU pipelining increases throughput and can allow a higher clock frequency, but it does not come for free. Because multiple instructions execute in parallel, CPU designers need to take care of the following hazards:
In this post, we will discuss these hazards in detail and use the MIPS 5-stage pipeline as a case study.
The MIPS 5-stage pipeline is a classic way to illustrate CPU pipelining, and it is a common interview question for new grads and junior engineers. The 5-stage pipeline consists of the following stages:
IF – instruction fetch
ID – instruction decode and operand fetch
EX – instruction execution
MEM – memory access
WB – write back
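To see how the five stages overlap, the sketch below builds the pipeline diagram for an ideal case with no stalls or hazards, where one instruction enters the pipeline per cycle. The `pipeline_diagram` function is a made-up helper for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction, listing its stage in each cycle (None if idle)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [None] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage       # instruction i enters IF in cycle i
        rows.append(row)
    return rows


for row in pipeline_diagram(3):
    print(" ".join(f"{(stage or '.'):<4}" for stage in row))
```

With N instructions, the ideal pipeline finishes in N + 4 cycles instead of the 5N cycles needed without overlap.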