In a previous post, we discussed the UMA and NUMA architectures. But caching shared data across multiple processors introduces a new problem: each processor sees memory through its own private cache. Cache coherency plays an important role in preventing different processors from seeing different values for the same data.
What is Cache Coherency?
Cache coherency means that any read of a data item returns the most recently written value of that item. For example, in a two-processor system, processor A writes data D to memory location X; some time later, processor B reads memory location X, and the system should return data D. If the system does not return data D to processor B, then the system is not cache-coherent.
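The violation above can be made concrete with a toy model: two processors with private caches and no coherence mechanism at all. The class and names here are purely illustrative, not real hardware behavior.

```python
# Toy model of two processors with private caches and NO coherence
# protocol, showing how a stale read violates cache coherency.

memory = {"X": 0}

class Processor:
    def __init__(self, name):
        self.name = name
        self.cache = {}                  # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:       # miss: fetch from main memory
            self.cache[addr] = memory[addr]
        return self.cache[addr]          # hit: value may be stale!

    def write(self, addr, value):
        self.cache[addr] = value         # update own cache...
        memory[addr] = value             # ...and memory, but not B's cache

a, b = Processor("A"), Processor("B")
b.read("X")          # B caches X = 0
a.write("X", 42)     # A writes D = 42 to location X
print(b.read("X"))   # prints 0, not 42 -> this system is not coherent
```

Processor B keeps returning its cached 0 even though memory (and A) already hold 42; a coherence protocol exists precisely to rule out this outcome.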
In a nutshell, cache coherency defines the behavior of reads and writes to the same memory location.
How to Enforce Cache Coherency?
There are two basic schemes for enforcing cache coherency. The first is directory based: a directory keeps the sharing status of each memory block. In UMA, this is easy to implement, since the directory can be associated with the centralized memory or the system bus. In NUMA, a single centralized directory does not make sense, since it would introduce exactly the kind of system-wide bottleneck that NUMA tries to avoid in the first place; instead, the directory is distributed along with the memory.
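A minimal sketch of a directory entry for a single memory block, using the three directory states (Uncached, Shared, Modified) from Hennessy and Patterson; the method names and message-passing details are illustrative assumptions, with the actual invalidate and write-back traffic omitted.

```python
# Sketch of one directory entry tracking the sharing status of a block.

class DirectoryEntry:
    def __init__(self):
        self.state = "Uncached"   # Uncached | Shared | Modified
        self.sharers = set()      # processors currently holding a copy

    def handle_read(self, proc):
        if self.state == "Modified":
            owner = next(iter(self.sharers))
            # fetch the dirty block from its owner and write it back
            # (messages omitted); owner and reader both become sharers
            self.sharers = {owner, proc}
        else:
            self.sharers.add(proc)
        self.state = "Shared"

    def handle_write(self, proc):
        for p in self.sharers - {proc}:
            pass                  # send an invalidate message to p (omitted)
        self.sharers = {proc}     # writer becomes the exclusive owner
        self.state = "Modified"

entry = DirectoryEntry()
entry.handle_read("P0")
entry.handle_read("P1")           # P0 and P1 share the block
entry.handle_write("P0")          # P1's copy is invalidated
print(entry.state, entry.sharers) # Modified {'P0'}
```

The key property is that the directory always knows exactly which caches hold the block, so it never needs to broadcast: it sends invalidations only to the recorded sharers.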
The other scheme is snoop based. Unlike the directory-based scheme, the sharing status of a block is distributed across the individual caches. In UMA, cache blocks are usually reachable over a shared system bus, and each cache controller "snoops" the bus to decide whether to supply data for requests from other processors and how to change the status of its own cache blocks. In NUMA, snooping is usually combined with a directory: a cache controller sends its request to the corresponding home memory, and the directory associated with that memory decides whether to supply the data directly or to fetch it from another cache.
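The bus-snooping part of this can be sketched as follows: on a read miss, every controller observes the request, and whichever cache holds the block supplies it; otherwise memory responds. The classes and names are illustrative assumptions, not a real protocol.

```python
# Sketch of bus snooping on a read miss: a cache-to-cache transfer
# happens when another controller holds the requested block.

memory = {"X": 7}

class Bus:
    def __init__(self):
        self.caches = []

    def read_request(self, addr, requester):
        for c in self.caches:                 # every controller snoops
            if c is not requester and addr in c.lines:
                return c.lines[addr]          # a cache supplies the data
        return memory[addr]                   # otherwise memory responds

class CacheController:
    def __init__(self, bus):
        self.lines = {}                       # addr -> value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: put request on the bus
            self.lines[addr] = self.bus.read_request(addr, self)
        return self.lines[addr]

bus = Bus()
c0, c1 = CacheController(bus), CacheController(bus)
c0.lines["X"] = 99        # c0 holds a newer copy of X than memory
print(c1.read("X"))       # c0 snoops c1's miss and supplies 99
```

Because every request is visible to every controller, no central record of sharers is needed; the price is that every transaction must reach every cache, which is why snooping fits a shared bus but scales poorly.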
Snooping Coherence Schemes
There are two basic approaches among the snoop-based schemes. The first is write invalidate: a write invalidates all other cached copies of the block, so only the writing cache keeps a valid copy. This is by far the most common scheme.
The alternative approach is write update: whenever a cache block is written, the updated value is broadcast to all other caches. This approach is rarely used because broadcasting every write requires significantly more bandwidth.
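The bandwidth difference can be seen with a back-of-the-envelope count, assuming one processor performs n consecutive writes to a block that another processor cached earlier. The message counts are a simplified illustration, not measurements from a real machine.

```python
# Rough bus-message counts for n consecutive writes to one shared block.

def invalidate_traffic(n_writes):
    # the first write broadcasts one invalidate; after that the block
    # is exclusive to the writer, so later writes stay local
    return 1

def update_traffic(n_writes):
    # every single write broadcasts the new value to the other caches
    return n_writes

n = 1000
print(invalidate_traffic(n), update_traffic(n))   # 1 vs 1000
```

Repeated writes to the same block are common in real programs, which is why write invalidate wins in practice.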
We have briefly discussed cache coherency and the basic schemes for enforcing it. It is still highly recommended to read Chapter 5 of J. Hennessy and D. Patterson's book, Computer Architecture, for more details. In the next post, we will cover the basics of cache coherence protocols.