## How Computers Work?

What if you got ask the very basic question: How Computers Work?

A brief answer:

A more comprehensive answer:

Skip to content
## Category: High Frequency Trading (FPGA Engineer)

##
How Computers Work?

##
What are the differences between sync and async reset?

##
How to implement pseudo LRU?

##
How to implement true LRU? (II)

**Linked List Implementation**

##
How to implement true LRU? (I)

##
Design A Programmable Sequence Detector

##
Determine Whether An Infinite Sequence Is A Multiple of 5

What if you got ask the very basic question: How Computers Work?

A brief answer:

A more comprehensive answer:

Synchronous reset has the following characteristics:

- It will not work if clock path has failures
- It tends to immune to glitches, since sync reset requires clock toggling
- It makes STA simpler and the whole design synchronous, since sync reset has no difference from regular data path
- Sync reset implementation will add a MUX at the input of each flop

Continue reading → What are the differences between sync and async reset?

We discussed true LRU implementations, but they are usually costly. In this post, we will explain how to implement pseudo LRU.

An example is shown below. The data structure is similar to a binary tree. For N ways in a cache set, we need to keep (N – 1) bits. If there are 8 sets in a way, then we will need 7 bits.

Upon a cache access, all tree nodes point to that cache way will be flipped. For example, if set 3 is accessed, then h0, h1 and h4 will be flipped in next cycle.

To find LRU, we can perform a depth-first-search starting from h0, and traverse nodes in lower levels. If the node is 0, then we traverse the left sub-tree; otherwise, we traverse the right sub-tree. In the diagram above, the LRU is set 3.

Once we understand the basic idea of pseudo LRU, it is easy to write code for pseudo LRU.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
module lru_pseudo ( input clk, input rst_n, input acc_en, // access indication input [2:0] acc_idx, // index to accessed entry output logic [2:0] lru_idx // index to LRU entry ); // tree node logic h0, h1, h2, h3, h4, h5, h6; always_ff @(posedge clk or negedge rst_n) if (~rst_n) h0 <= '0; else if (acc_en) h0 <= ~h0; // always flip h0 if there's an access always_ff @(posedge clk or negedge rst_n) if (~rst_n) h1 <= '0; else if (acc_en & ~acc_idx[2]) h1 <= ~h1; // flip h1 if acc_idx == 0/1/2/3 always_ff @(posedge clk or negedge rst_n) if (~rst_n) h2 <= '0; else if (acc_en & acc_idx[2]) h2 <= ~h2; // flip h2 if acc_idx == 4/5/6/7 always_ff @(posedge clk or negedge rst_n) if (~rst_n) h3 <= '0; else if (acc_en & (acc_idx[2:1] == 2'b00)) h3 <= ~h3; // flip h3 if acc_idx == 0/1 always_ff @(posedge clk or negedge rst_n) if (~rst_n) h4 <= '0; else if (acc_en & (acc_idx[2:1] == 2'b01)) h4 <= ~h4; // flip h4 if acc_idx == 2/3 always_ff @(posedge clk or negedge rst_n) if (~rst_n) h5 <= '0; else if (acc_en & (acc_idx[2:1] == 2'b10)) h5 <= ~h5; // flip h5 if acc_idx == 4/5 always_ff @(posedge clk or negedge rst_n) if (~rst_n) h6 <= '0; else if (acc_en & (acc_idx[2:1] == 2'b11)) h6 <= ~h6; // flip h6 if acc_idx == 6/7 // lru_idx assignment assign lru_idx[2] = h0; assign lru_idx[1] = ~h0 & h1 | h0 & h2; assign lru_idx[0] = (~h0 & ~h1 & ~h3) | (~h0 & h1 & ~h4) | (h0 & ~h2 & ~h5) | (~h0 & h2 & ~h6); endmodule: lru_pseudo |

We covered 2 true LRU implementations: square matrix implementation and counter based implementation in previous post. We will discuss 2 more true LRU implementations in this post.

In the following discussion, we assume the number of ways in a cache set is N.

Linked list is probably the most straightforward LRU implementation. Nodes are linked from head to tail. Head points to the most recently used item, while tail points to the least recently used item.

Upon a cache access, the corresponding node “jumps” to the head, and the previous head becomes the next node of new head.

Least Recently Used, or LRU, is the optimal cache replacement policy, and it is used in various applications where arbitration is required. In this post, we will discuss different true LRU implementations, based on the paper “Highly Efficient LRU Implementations for High Associativity Cache Memory”.

In the following discussion, we assume the number of ways in a cache set is N.

If the sequence is not predefined, then we can no longer use traditional FSM based sequence detector. Assuming we know the sequence to detect is 5-bit, then we can use the following circuit to detect the sequence.

The circuit consists of a 5-stage shift register, and a 5-bit configuration register. The sequence to detect is programmed in the configuration register, and the input sequence is compared with the configuration register every cycle.

Assuming incoming bit stream is one bit per cycle, design a circuit that detects whether the integer number formed by the bit stream is a multiple of 5.

The idea is to have an FSM consisting of 5 states, S0, S1, S2, S3, S4. Each state represents divided by 5 remainder in previous cycle. If FSM stays at S0, then it means the number could be divided by 5 in previous cycle; otherwise, the number was not a multiple of 5.

Continue reading → Determine Whether An Infinite Sequence Is A Multiple of 5