## How to implement multiplication by 5 for BCD code?

For interviewees who are not familiar with BCD code, we recommend reading this link first.

Assume the BCD value lies in the range 0 to 9. We can then use the table below to enumerate the inputs and outputs of the multiplier.
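The enumeration can be captured behaviorally (a Python model rather than RTL): for a BCD digit d, 5·d = 10·(d >> 1) + 5·(d & 1), so the tens digit of the product is simply d shifted right by one, and the ones digit is 5 when d is odd and 0 otherwise. A minimal sketch:

```python
def bcd_times_5(d):
    """Multiply one BCD digit (0-9) by 5, returning (tens, ones) BCD digits.

    Uses the identity 5*d = 10*(d >> 1) + 5*(d & 1): the tens digit is a
    1-bit right shift of d, and the ones digit is either 5 or 0 depending
    on d's LSB, so no actual multiplier hardware is needed.
    """
    assert 0 <= d <= 9
    tens = d >> 1
    ones = 5 if (d & 1) else 0
    return tens, ones
```

In hardware this reduces to pure wiring plus a 1-bit select, which is why the question is popular in interviews.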

## What is the benefit of using half-cycle-path?

So far, we have focused on the timing of one-cycle (full-cycle) paths. Sometimes a design contains half-cycle paths. One example is when the launch flop is negedge-triggered while the capture flop is posedge-triggered.

We discussed clock skew and how it affects STA in a previous post. Equivalently, a half-cycle path can be modeled as a one-cycle path with a clock skew of δ = +T/2 applied to the launch clock: the setup window shrinks from T to T/2, while the hold check gains T/2 of margin. It follows that hold closure is easier while setup closure is harder for a half-cycle path.
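The effect on the setup and hold checks can be sketched with simple slack arithmetic (a Python illustration with a standard simplified timing model; all parameter names are ours, not from any tool):

```python
def setup_slack(t_period, t_cq, t_comb_max, t_setup, half_cycle=False):
    # The launch-to-capture window shrinks from T to T/2 for a
    # half-cycle path, so the same logic depth has less setup slack.
    window = t_period / 2 if half_cycle else t_period
    return window - (t_cq + t_comb_max + t_setup)

def hold_slack(t_cq, t_comb_min, t_hold, t_period, half_cycle=False):
    # For a half-cycle path, the capture edge checked for hold sits T/2
    # before the launch edge, which adds T/2 of margin to the hold check.
    margin = t_period / 2 if half_cycle else 0.0
    return (t_cq + t_comb_min + margin) - t_hold
```

With any fixed set of delays, the half-cycle setup slack is T/2 lower and the hold slack is T/2 higher than the full-cycle case, matching the δ = +T/2 skew model above.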

You may wonder what the benefit of using half-cycle paths in a design is.

## Design a simple ALU and draw its logical block diagram

Given the ALU pseudo code below, write the Verilog code and draw its logical block diagram using only one full adder, bitwise OR/AND gates, and as few MUXes as possible.

Verilog implementation should be straightforward:
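Since the pseudo code is not reproduced here, the behavioral sketch below assumes a hypothetical four-operation ALU (add, subtract, AND, OR) with a hypothetical opcode encoding; the key structural point is that subtraction reuses the single adder as a + ~b + 1, so only a MUX on the adder's B input plus a carry-in are needed:

```python
MASK = 0xFF  # assume an 8-bit datapath for this sketch

def alu(a, b, op):
    """Behavioral model of a small ALU built around one shared adder.

    op: 0 = a + b, 1 = a - b, 2 = a & b, 3 = a | b (hypothetical encoding).
    Subtraction is two's complement: a + ~b + 1, so a single MUX selects
    between b and ~b, and the subtract flag doubles as the carry-in.
    """
    sub = (op == 1)
    b_in = (~b & MASK) if sub else (b & MASK)        # MUX on adder's B input
    adder_out = (a + b_in + (1 if sub else 0)) & MASK  # the one full-adder chain
    if op in (0, 1):
        return adder_out
    return (a & b) & MASK if op == 2 else (a | b) & MASK  # final result MUX
```

The same structure maps directly to RTL: one adder, one inverting MUX, the AND/OR gates, and one output MUX.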

## Can we arbitrarily increase the CPU pipeline depth?

CPU pipelining is a common technique to increase throughput and instruction-level parallelism. Can we arbitrarily increase the CPU pipeline depth? The short answer is NO.

This is an open question. We recommend that interviewees answer it from the following aspects:
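Two of the limiting effects can be shown with a toy per-instruction timing model (illustrative numbers only, chosen by us): register/latch overhead puts a floor on the cycle time, and a branch misprediction flushes the whole pipeline, so its penalty grows with depth.

```python
def pipeline_time_per_instr(depth, logic_delay=20.0, latch_overhead=0.5,
                            branch_rate=0.1, mispredict_rate=0.1):
    """Toy model: average time per instruction vs pipeline depth.

    Cycle time = logic_delay/depth + latch_overhead, so deeper pipelines
    stop shrinking the cycle once latch overhead dominates; a mispredicted
    branch flushes `depth` stages, so its average penalty grows with depth.
    """
    cycle = logic_delay / depth + latch_overhead
    flush_penalty = branch_rate * mispredict_rate * depth * cycle
    return cycle + flush_penalty
```

Sweeping `depth` shows performance improving at first and then degrading, which is exactly why depth cannot be increased arbitrarily.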

## What are the differences between sync and async reset?

Synchronous reset has the following characteristics:

1. It will not work if the clock path has failures, since applying the reset requires clock toggling
2. It tends to be immune to glitches on the reset line, again because the reset is only sampled on a clock edge
3. It makes STA simpler and keeps the whole design synchronous, since a sync reset path is no different from a regular data path
4. A sync reset implementation adds a MUX at the input of each flop
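Characteristics 1 and 4 can be made concrete with a small behavioral model (Python rather than RTL): the sync-reset flop only observes reset through the MUX in front of its D pin on a clock edge, while the async-reset flop clears immediately without any clock.

```python
class SyncResetFlop:
    """Flop with synchronous reset: reset acts as a MUX on the D input."""
    def __init__(self):
        self.q = 1  # assume the flop powers up holding 1

    def clock_edge(self, d, reset):
        # Reset is sampled only here, on the clock edge.
        self.q = 0 if reset else d

class AsyncResetFlop:
    """Flop with asynchronous reset: reset clears Q with no clock needed."""
    def __init__(self):
        self.q = 1

    def clock_edge(self, d, reset):
        self.q = 0 if reset else d

    def assert_reset(self):
        # Takes effect immediately, even if the clock is dead.
        self.q = 0
```

If the clock never toggles, the sync-reset flop keeps its stale value no matter how long reset is held, whereas the async-reset flop clears at once.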

## How to design a memory controller with in-order read responses?

Interviewers often ask candidates to design a memory controller. We show one example below.

The memory controller takes incoming requests, each with an address and a request ID, as inputs. It is expected to produce read responses, each tagged with a response ID, as outputs. Internally, it can access memory to fetch the read data.
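One standard way to guarantee in-order responses when the memory may complete reads out of order is a small reorder buffer: record request IDs in arrival order in a FIFO, park completed data keyed by ID, and release a response only when its ID reaches the FIFO head. A behavioral sketch under these assumptions (class and method names are ours):

```python
from collections import deque

class InOrderReadController:
    """Sketch: returns read responses in request order even if the
    memory completes them out of order (a simple reorder buffer)."""

    def __init__(self):
        self.order = deque()  # request IDs in arrival order
        self.done = {}        # req_id -> data, completed but not yet released

    def accept_request(self, req_id, addr):
        self.order.append(req_id)
        # (the memory access for `addr` is issued here; omitted in this model)

    def memory_complete(self, req_id, data):
        self.done[req_id] = data

    def pop_responses(self):
        # Release responses only from the head of the order FIFO.
        out = []
        while self.order and self.order[0] in self.done:
            rid = self.order.popleft()
            out.append((rid, self.done.pop(rid)))
        return out
```

In RTL the `done` map becomes a data buffer indexed by a tag, and `order` becomes an ID FIFO; a response that completes early simply waits in the buffer until all older responses have drained.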

## Why might cache upsizing bring no performance improvement?

Usually, with cache upsizing, we expect to see system performance improvement. However, this is not always the case. There could be several reasons:

1. “Compulsory” misses, rather than “capacity” misses, dominate, so a bigger cache does not help. This means the temporal and spatial locality that a cache exploits are simply not present: the program keeps accessing new data with no reuse, which can happen in streaming applications; or, if context switches happen often, frequent cache flushes cause more compulsory misses

2. In a cache-coherent system, two caches may compete for one copy of data, i.e., “coherence” misses. This can happen when two CPUs try to acquire a lock or semaphore simultaneously. Increasing the cache size does not help performance in this case
3. If the upsizing is achieved by enlarging the cache line, the fill time of a line increases. This in turn increases the cache miss penalty and the average memory access time
4. If the upsizing is achieved by increasing associativity, the hit latency, and hence the average memory access time, may increase, because the physical implementation of a highly associative cache can be hard
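Points 3 and 4 follow directly from the standard average memory access time (AMAT) formula, AMAT = hit_time + miss_rate × miss_penalty. A quick numerical illustration (the numbers are hypothetical, chosen only to show the effect):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Average memory access time: hit time plus the miss contribution.
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: doubling the line size lowers the miss rate
# slightly, but the longer line fill raises the miss penalty more.
small_line = amat(hit_time=1.0, miss_rate=0.050, miss_penalty=40.0)
big_line   = amat(hit_time=1.0, miss_rate=0.045, miss_penalty=70.0)
```

Here the larger line is a net loss: the modest miss-rate reduction is swamped by the higher fill penalty, so AMAT goes up despite the bigger cache.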