Hardware based branch prediction is an important CPU technology to improve control hazards. Branch prediction has two aspects: branch condition prediction and branch target prediction. Branch condition decides whether the branch is taken or not, and branch target decides the target address. Both aspects are equally important.
Branch Condition Prediction
Branch condition prediction consists of static prediction and dynamic prediction. Static prediction means branch either always-taken or always-not-taken. In modern CPU technology, static prediction is out of date, so we will focus on dynamic prediction in this post.
The simplest solution is to keep a 1-bit state machine. When the bit is 1, the prediction will be taken, otherwise the prediction will be not taken. If the prediction is incorrect, the bit will be flipped.
To further improve prediction, the state can be kept with 2 bits instead of 1. A commonly used 2-bit implementation is shown below.
When the state is 11 or 10, the prediction will be taken, otherwise it will be not taken. The result can only be changed with 2 failed predictions. This is particularly useful in for-for loop. Using 1-bit predictor, the last iteration of inner loop will always fail, which will not happen with 2-bit predictor.
The predictors we discussed so far only consider “local” branch history, i.e., the previous behavior of the branch itself. The behavior of a conditional branch also depends on the path taken by the program to that branch. This means the behavior of the last few conditional branches or the “global” history also impacts the prediction accuracy. A common solution is to have (m, n) correlation predictor, where m stands for “global” history of m recently happened branches, and n stands for n-bit “local” predictor. A (2, 2) correlating predictor is shown below.
The 2-bit global branch history keeps track of behavior of most recent 2 branches, and it used to index which set of 2-bit predictors to use. Each set of 2-bit predictors is indexed by last 4 bits of branch address, which means each set has 16 2-bit predictors. Final prediction result is derived from both global branch history and branch address.
The “ultimate” solution is called Tournament Predictor. As suggested by the name, two predictors are working simultaneously and competing with each other. A 2-bit predictor is used to choose which predictor provides the result. If the predictor that is chosen has incorrect result twice while the other predicts correctly, the second predictor gets picked for branch prediction. Relatively speaking, Tournament Predictor has better accuracy over other types of predictors.
Branch Target Prediction
Branch target prediction involves getting branch target address as soon as possible. Without branch target prediction, even though we know branch should be taken in advance, we would not know where to jump.
A common implementation is to keep a branch target buffer or BTB, to store the predicted branch target address. The BTB is indexed by branch instruction’s PC or program count, and the predicted target address will be available in one cycle.
Branch prediction can only work well when both condition and target prediction co-exist. When being asked how to improve control hazards in CPU pipelining, remember to mention both condition and target prediction.