If a sync FIFO has too many entries, a flop-based implementation may take too much area and have trouble closing timing, making it impractical in a real design. The most straightforward fix is to replace the flop memory array with a dual-port SRAM.
The dual-port SRAM used here has one read port and one write port, and both ports run on the same clock. As with flops, it takes one cycle from write enable assertion until the SRAM array is updated. The difference is on the read side: it also takes one cycle from read enable assertion until the read data is available, whereas a flop array can be read in the same cycle. This means we need to “prefetch” the read data from the SRAM to mimic the flop behavior.
Another tricky part is that the SRAM write-to-read latency for the same address is 2 cycles. For example, suppose we write some data to SRAM address A: address A is updated one cycle later. In that cycle we can issue a read from address A, and the read data is available one cycle after that. If a read and a write to the same address happen in the same cycle, there is a direct path from the SRAM data input to the data output, which hurts timing. To hide this latency, we need 2 extra flops for data buffering.
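The timing described above can be checked with a small cycle-level model. This is only a sketch in Python, not RTL: the class name `TwoPortSram` and the `tick` interface are made up for illustration, and the model assumes that a read colliding with a write to the same address returns the old array contents (real SRAM macros differ on this point).

```python
class TwoPortSram:
    """Cycle-level model of an SRAM with one write port and one read port on a shared clock."""

    def __init__(self, depth):
        self.mem = [None] * depth  # storage array
        self.dout = None           # registered read data output

    def tick(self, we=False, waddr=0, din=None, re=False, raddr=0):
        """Apply one rising clock edge with the given port inputs; return dout after the edge."""
        # Assumption: the read samples the array as it was before this edge,
        # so a same-cycle write to the same address is not visible yet.
        if re:
            self.dout = self.mem[raddr]
        if we:
            self.mem[waddr] = din
        return self.dout


ram = TwoPortSram(depth=4)
after_write = ram.tick(we=True, waddr=1, din=0xAB)  # edge 1: array[1] is updated
after_read = ram.tick(re=True, raddr=1)             # edge 2: dout captures array[1]
```

The written value only shows up on `dout` after the second edge, which is the 2-cycle write-to-read latency the FIFO has to hide.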
The diagram below shows one possible implementation. “din_q” and “ram_din” are always updated with “din”, and “dout” is primarily used as the read data “prefetch” buffer. Together, “din_q” and “dout” are the data buffers that hide the 2-cycle write-to-read latency when the FIFO holds 2 or fewer entries. If the FIFO holds more than 2 entries, the SRAM has had the 2 cycles it needs, so the read data is already available on the SRAM output “ram_dout”, and “dout” is updated from “ram_dout”.
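To make the buffering rule concrete, here is a behavioral sketch in Python of one way such a FIFO could work. It is not the RTL behind the diagram. The names `din_q`, `dout` and `ram_dout` follow the text; the class `SramSyncFifo`, the `tick` interface and the pointer names are hypothetical. The sketch makes several assumptions: every accepted push is written straight to the SRAM (there is no separate “ram_din” stage), “din_q” is updated only on an accepted push, the SRAM is read every cycle at the address after the next read pointer, a colliding read returns old data, and a push is rejected while the FIFO is full.

```python
class SramSyncFifo:
    """Behavioral model of a sync FIFO backed by a 1-cycle-read-latency SRAM."""

    def __init__(self, depth):
        self.depth = depth
        self.mem = [None] * depth  # SRAM array
        self.ram_dout = None       # SRAM registered read data
        self.din_q = None          # most recently pushed word
        self.dout = None           # head of the FIFO (prefetch buffer)
        self.count = 0             # occupied entries
        self.wptr = 0
        self.rptr = 0

    def tick(self, push=False, din=None, pop=False):
        """Apply one clock edge; return the popped word, or None if no pop was accepted."""
        push = bool(push) and self.count < self.depth
        pop = bool(pop) and self.count > 0
        popped = self.dout if pop else None

        # Choose the next head using only values from before this edge.
        if pop:
            if self.count == 1:
                next_dout = din if push else None  # bypass, or FIFO goes empty
            elif self.count == 2:
                next_dout = self.din_q             # second entry is too new for the SRAM
            else:
                next_dout = self.ram_dout          # second entry was prefetched
        elif push and self.count == 0:
            next_dout = din                        # first word goes straight to dout
        else:
            next_dout = self.dout                  # hold

        # SRAM read (old data) of the entry behind the next head, then SRAM write.
        rptr_next = (self.rptr + 1) % self.depth if pop else self.rptr
        self.ram_dout = self.mem[(rptr_next + 1) % self.depth]
        if push:
            self.mem[self.wptr] = din
            self.wptr = (self.wptr + 1) % self.depth
            self.din_q = din

        self.rptr = rptr_next
        self.count += int(push) - int(pop)
        self.dout = next_dout
        return popped


fifo = SramSyncFifo(depth=8)
for word in [10, 20, 30, 40]:
    fifo.tick(push=True, din=word)
drained = [fifo.tick(pop=True) for _ in range(4)]
```

With 2 or fewer entries the head is refilled from `din` or `din_q`, and only from the third entry onward does `ram_dout` supply it, which mirrors the occupancy threshold described above.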