We discussed CPU pipelining and out-of-order scheduling in previous posts; both techniques increase instruction-level parallelism (ILP). The next level of parallelism is thread-level parallelism (TLP). One way to exploit TLP is a multiprocessor, i.e., a computer system consisting of multiple processors, typically controlled by one operating system and sharing the same memory address space.

Generally speaking, there are two multiprocessor architectures:

Uniform memory access (UMA)
Non-uniform memory access (NUMA)

UMA Architecture

UMA, as its name suggests, gives all processors the same memory access latency. It is a symmetric, centralized shared-memory architecture. Usually, this architecture is used in systems of no more than eight processors. A typical UMA architecture is shown in the diagram below.


The UMA architecture is intuitive; however, the shared last-level cache and the bus connecting all processors become the system bottleneck as more processors are added.
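To make the bus bottleneck concrete, here is a toy Python model. It is only a sketch under stated assumptions: a hypothetical shared bus with a fixed total bandwidth that is split evenly among the processors. The constants `BUS_BANDWIDTH_GBPS` and `DEMAND_PER_CPU_GBPS` are made-up illustrative numbers, not measurements of any real system.

```python
# Toy model of a shared bus in a UMA system.
# All numbers are assumptions chosen for illustration, not measurements.
BUS_BANDWIDTH_GBPS = 40.0    # assumed total bandwidth of the shared bus
DEMAND_PER_CPU_GBPS = 10.0   # assumed bandwidth each processor would like to use


def per_cpu_bandwidth(num_cpus: int) -> float:
    """Bandwidth each processor gets when the bus is shared evenly."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    fair_share = BUS_BANDWIDTH_GBPS / num_cpus
    # A processor never uses more than it asks for, and never gets
    # more than its even share of the bus.
    return min(DEMAND_PER_CPU_GBPS, fair_share)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} CPUs -> {per_cpu_bandwidth(n):5.2f} GB/s per CPU")
```

With these assumed numbers, every processor gets its full demand up to four processors; beyond that, the per-processor bandwidth shrinks as processors are added, which is one reason UMA systems tend to stay small.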

NUMA Architecture

A NUMA architecture is a distributed shared-memory architecture: memory is physically distributed among the processors, so the memory access latency depends on the location of the data in memory. A typical NUMA architecture is shown in the diagram below.


Distributing the memory makes the system easier to scale; however, the key disadvantage is that communicating data among processors becomes more complex.
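The effect of data location on latency can be sketched with a simple weighted average. The Python snippet below is a toy model under assumed numbers: `LOCAL_LATENCY_NS` and `REMOTE_LATENCY_NS` are hypothetical latencies for memory attached to the accessing processor and memory attached to another processor, and `local_fraction` is the share of accesses that hit local memory.

```python
# Toy model of average memory latency in a NUMA system.
# Both latencies are assumptions chosen for illustration, not measurements.
LOCAL_LATENCY_NS = 100.0    # assumed latency of memory attached to this processor
REMOTE_LATENCY_NS = 200.0   # assumed latency of memory attached to another processor


def average_latency_ns(local_fraction: float) -> float:
    """Average access latency when `local_fraction` of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * LOCAL_LATENCY_NS + remote_fraction * REMOTE_LATENCY_NS


if __name__ == "__main__":
    for fraction in (1.0, 0.75, 0.5, 0.0):
        print(f"{fraction:.0%} local -> {average_latency_ns(fraction):.0f} ns on average")
```

In this model, the more a thread's data lives in remote memory, the closer its average latency gets to the remote figure, which is why data placement matters on NUMA systems.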


Understanding the UMA and NUMA architectures establishes the baseline for cache coherency. For more details, Chapter 5 of J. Hennessy and D. Patterson’s book, Computer Architecture, is highly recommended. In the next post, we will cover what cache coherency is and how to enforce it.
