We covered 2 true LRU implementations: square matrix implementation and counter based implementation in previous post. We will discuss 2 more true LRU implementations in this post.

In the following discussion, we assume the number of ways in a cache set is N.

Linked List Implementation

Linked list is probably the most straightforward LRU implementation. Nodes are linked from head to tail. Head points to the most recently used item, while tail points to the least recently used item.

Upon a cache access, the corresponding node “jumps” to the head, and the previous head becomes the next node of new head.

Since a cache access can happen to any cache way or any node in the linked list, a double linked list is required. Each node in the linked list keeps 2 pointers: previous index and next index.

The total storage in linked list implementation is 2 x N x log(N).

Systolic Array Implementation

J.P. Grossman proposed a systolic array LRU implementation in the paper “A Systolic Array for Implementing LRU Replacement”.

This implementation is a derivative to linked list. The array maintains a list of cache way indices sorted from LRU to MRU. LRU info is always available since it is the head of the array.

However, unlike linked list, systolic array does not update the list immediately upon each cache access. Instead, the MRU info is slowly propagated from list head to tail. Thus it may take multiple cycles to update MRU.

The storage for every 2 stages is 3log(n) + 1, thus total storage = n / 2 x (3 x log(n) + 1) ~= 1.5nlog(n), which is an improvement over linked list.

The following code shows the systolic array implementation. We highly recommend interviewees refer to the original paper for more details.

Conclusion

Although true LRU is the optimal cache way replacement policy, it is quite expensive in terms of area and power. In next post, we will discuss pseudo LRU implementation, which is relatively cheaper and easier to implement.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.