How to implement hardware-based speculation in Tomasulo’s Algorithm to minimize control hazards?

We have covered how Tomasulo’s algorithm minimize performance impact of data hazards, we can also improve on control hazards. The key idea is to execute instructions based on branch prediction / speculation. Now the concern is, how to restore the program if the speculation is incorrect.

Continue reading → How to implement hardware-based speculation in Tomasulo’s Algorithm to minimize control hazards?

How to handle data dependencies through memory?

We covered different types of data dependencies through memory in previous post. To handle data memory dependencies in CPU out-of-order scheduling, there are usually two steps involved:

  1. Calculate effective address for memory access
  2. Check addresses of all active load / store and ensure conflicting load / store cannot execute out of order

J. P. Shen and M. H. Lipasti have excellent elaboration on this topic in their book, Modern Processor Design. In this post, we will briefly discuss their idea about load / store processing.

Continue reading → How to handle data dependencies through memory?

What are data dependencies through memory in Tomasulo’s Algorithm?

The RAW, WAW and WAR hazards we discussed in previous posts are with respect to registers. For memory accesses, there exists subtle data dependencies that are barely mentioned. These data dependencies include RAW, WAW and WAR as well. Note,

  1. Memory data dependencies only exist for accesses to the same memory address; for memory accesses to different addresses, load and store can be safely done out of order
  2. In MIPS 5-stage pipeline, all memory accesses are performed in MEM stage and thus are serialized, so these data dependencies will not happen in MIPS 5-stage pipeline. In this post, we will focus on CPU out-of-order scheduling.

Continue reading → What are data dependencies through memory in Tomasulo’s Algorithm?

How does Tomasulo’s Algorithm work?

A fundamental restriction in MIPS 5-stage pipeline is, if one instruction stalls, all instructions following it stall as well, even if they have no dependencies at all. If the stalled instruction take a long time to finish, the throughput of the CPU pipeline will suffer.

In modern CPU technology, out-of-order scheduling is an important technique to address the restriction above. Tomasulo’s algorithm is a typical solution to CPU out-of-order scheduling.

Continue reading → How does Tomasulo’s Algorithm work?