In previous posts we discussed CPU pipelining and out-of-order scheduling, both of which are techniques for increasing instruction-level parallelism (ILP). The next level up is thread-level parallelism, or TLP. One way to exploit TLP is the multiprocessor: a computer system consisting of multiple processors, typically controlled by a single operating system and sharing the same memory address space.
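As a concrete illustration of thread-level parallelism, the sketch below splits a CPU-bound summation across several workers so that independent chunks run in parallel on different processors. This is an illustrative example, not from the original post; it uses Python's `concurrent.futures`, and it uses worker processes rather than threads because CPython's global interpreter lock prevents CPU-bound threads from running truly in parallel.

```python
# Illustrative sketch (assumption: a multi-core machine): split a
# CPU-bound summation into independent chunks and run them in parallel.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    # Each worker sums its own half-open range [lo, hi).
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Carve [0, n) into `workers` contiguous chunks; the last chunk
    # absorbs any remainder.
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() dispatches one chunk per worker; the partial results
        # are combined on the coordinating processor.
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(1_000_000))  # equals sum(range(1_000_000))
```

The speedup of such a decomposition is bounded by how evenly the work divides and by the cost of combining results, which is exactly where the shared-memory architectures discussed below come into play.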
Generally speaking, there are two multiprocessor architectures: uniform memory access (UMA) and non-uniform memory access (NUMA).
The UMA architecture, as its name suggests, is a symmetric, centralized shared-memory architecture: all processors observe a uniform memory access latency. This architecture is usually used in systems with no more than eight processors. A typical UMA architecture is shown in the diagram below.
The UMA architecture is intuitive; however, the shared last-level cache and the bus connecting all processors become the system bottlenecks as the processor count grows.
The NUMA architecture implies a distributed shared-memory architecture, in which memory access latency depends on where the data resides: each processor reaches its local memory faster than memory attached to another processor. A typical NUMA architecture is shown in the diagram below.
Distributing the memory makes the system easier to scale; the key disadvantage, however, is that communicating data among processors becomes more complex, since remote accesses must traverse the interconnect.
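On Linux, the kernel exposes the NUMA topology it detects under sysfs, which makes the local-versus-remote distinction visible to software. The sketch below, an assumption-laden illustration rather than anything from the post, enumerates the NUMA node IDs by reading that directory; the `/sys/devices/system/node` path is Linux-specific, and on a machine without NUMA support (or on another OS) the function simply returns an empty list.

```python
# Hypothetical sketch: list NUMA node IDs on Linux via sysfs.
# Assumption: the Linux-specific path /sys/devices/system/node exists;
# on UMA or non-Linux systems the glob matches nothing.
import glob
import re

def numa_nodes():
    """Return the sorted NUMA node IDs exposed under sysfs, if any."""
    nodes = []
    for path in glob.glob("/sys/devices/system/node/node[0-9]*"):
        match = re.search(r"node(\d+)$", path)
        if match:
            nodes.append(int(match.group(1)))
    return sorted(nodes)

if __name__ == "__main__":
    ids = numa_nodes()
    if ids:
        print(f"NUMA nodes: {ids}")
    else:
        print("No NUMA topology exposed (UMA or non-Linux system)")
```

NUMA-aware software typically uses this kind of topology information to place a thread's data on the node where the thread runs, avoiding the slower remote accesses described above.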
Understanding the NUMA and UMA architectures establishes the baseline for cache coherency. For more details, I highly recommend Chapter 5 of J. Hennessy and D. Patterson's book, Computer Architecture: A Quantitative Approach. In the next post, we will cover what cache coherency is and how to enforce it.