Multicore processors were first introduced in the early 2000s and were pervasive in common computing platforms by the 2010s. The industry started with dual-core chips, then quad-core, and now we are up to 48 cores! When the hardware industry brought multicore chips to fruition, the software community had to invent new ways to utilize those additional cores. I’d like to use this post to discuss a few of the common multicore software utilization schemes, or modes, and the benefits and drawbacks of each.
Terms and Definitions
Right up front, I want to define the terms I’ll be using throughout the post for clarity’s sake. A core is a single processing element, irrespective of the cache, RAM, peripherals, etc. that could be connected to it. A processor is a chip consisting of 1 to N cores. When I talk about an application on a core of a multicore processor, I mean any piece of code. That could be anything from a simple Hello World, to a complex bare-metal piece of software, to a full-up OS.
Asymmetric Multiprocessing (AMP) Operation
One of the most straightforward ways to utilize more cores on the same processor is to treat each core as an individual single-core processor of its own. This mode is called asymmetric multiprocessing, or AMP. It lets you run N applications on the N cores and treat them like N separate processors. One of the key advantages of this mode is that you can run a different application on each core, giving you greater flexibility in how you use the multicore chip. There are some caveats to this mode of operation, however. Most multicore processors do not include N copies of the supporting hardware units needed to make the chip operate as N truly independent processors. Take, for example, the NXP QorIQ P4080. It has 8 cores, but only 2 DRAM controllers. So even when running in AMP mode, the 8 cores must share the DRAM controllers. This couples the cores together so that they are not truly independent. This is a typical configuration for multicore systems: the N cores share memory controllers and data paths to peripherals, which can present some planning challenges when using an AMP mode of operation.
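To make the P4080-style sharing concrete, here is a small Python sketch of an AMP plan. It is only a model, not platform code: the application names and the core-to-controller wiring (cores 0 to 3 behind controller 0, cores 4 to 7 behind controller 1) are hypothetical assumptions, and on real hardware that mapping comes from the chip’s memory configuration. The helper `shared_controllers` (also hypothetical) groups applications by the DRAM controller they depend on and reports every controller that more than one of them shares.

```python
from collections import defaultdict

# Hypothetical AMP plan: one independent application per core.
CORE_TO_APP = {core: f"app{core}" for core in range(8)}

# Hypothetical wiring: the DRAM controller each core's memory sits behind.
# (8 cores, 2 controllers, split evenly; an assumption for illustration only.)
CORE_TO_CTRL = {core: core // 4 for core in range(8)}


def shared_controllers(core_to_app, core_to_ctrl):
    """Return {controller: [apps]} for every controller used by more than one app.

    Each group returned is a place where "independent" AMP applications
    actually compete for the same memory controller.
    """
    groups = defaultdict(list)
    for core, app in sorted(core_to_app.items()):
        groups[core_to_ctrl[core]].append(app)
    return {ctrl: apps for ctrl, apps in groups.items() if len(apps) > 1}


if __name__ == "__main__":
    for ctrl, apps in shared_controllers(CORE_TO_APP, CORE_TO_CTRL).items():
        print(f"DRAM controller {ctrl} is shared by: {', '.join(apps)}")
```

With the assumed wiring, each controller ends up shared by four applications, which is exactly the kind of coupling that has to be budgeted for when planning an AMP system.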
Symmetric Multiprocessing (SMP) Operation
This is the mode of operation most people think of when envisioning a multicore processor. It combines all the cores into a set of co-equal processing elements. The application is responsible for assigning work to each of the cores in the set, and all hardware belongs to, or is owned by, the application. This is how most OSes work on multicore processors. Take Linux, for example. Linux manages the workload spread across the N cores via the scheduler. The scheduler decides which of the ready-to-run threads execute and divvies them up across the available cores. The Linux kernel also handles which threads of execution can access which pieces of hardware, but that is a discussion for another day! The big benefit of SMP is that you have a “supervisor application” (usually the OS) managing all the processing resources, so the software developer doesn’t have to put much thought into which core their software will run on. However, this abstraction can also introduce some latency: when a thread of execution is moved from one core to another, the data it was working on may not be in the new core’s private caches and has to be fetched again.
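As a rough illustration of the SMP model, the Python sketch below hands CPU-bound chunks of work to a pool with one worker process per logical CPU and never names a core: placement is left entirely to the OS scheduler. The function names are made up for the example, and it assumes a POSIX system such as Linux because it asks for the `fork` start method explicitly (so worker processes don’t re-import the main script).

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(chunk):
    """CPU-bound work item. Which core runs it is decided by the kernel, not by us."""
    return sum(x * x for x in chunk)


def smp_sum_of_squares(data):
    # One worker per logical CPU the OS reports; no core is ever named.
    workers = os.cpu_count() or 1
    chunks = [data[i::workers] for i in range(workers)]
    # "fork" is POSIX-only; it avoids re-importing __main__ in each worker.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    print(smp_sum_of_squares(list(range(1000))))
```

The code only decides how much parallel work to offer; the scheduler is free to run, preempt, and migrate the workers across cores as it sees fit, which is the convenience (and the migration cost) described above.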
Simultaneous Multithreading (SMT) Operation
This mode of operation is generally less well understood than its precursor, SMP, although the concept is the same in operation. SMT applies to processors designed to keep more of the chip’s hardware units busy at once, and it requires a little bit of understanding of how a processor works. The short version is that any given processor has some common elements such as an ALU, an instruction fetch unit, an instruction decode unit, memory interface components, etc. When a processor is executing a given instruction, not all of those elements are active at the same time, even with a pipelined design. SMT lets a single core run more than one hardware thread, so software sees each core in a multicore system as itself having more than one processing element. In the parlance, such a chip is described as having “N cores, M threads.” From the software’s point of view, once SMT is enabled, the processor simply has M cores. Many Intel processors support SMT (Intel brands it Hyper-Threading), and plenty of others do as well, like the NXP QorIQ T2080.
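On Linux you can see the “N cores, M threads” split for yourself in sysfs, where each logical CPU exposes `physical_package_id` and `core_id` under `/sys/devices/system/cpu/cpuX/topology/`. The sketch below (helper names are my own) counts distinct (package, core) pairs as physical cores and logical CPUs as hardware threads. It assumes those two IDs together identify a physical core, which holds on typical Linux systems but is worth checking on your platform. The example layout is a hypothetical four-core, two-threads-per-core chip with sibling threads numbered next to each other.

```python
import glob
import os


def count_cores_and_threads(topology):
    """topology maps logical CPU id -> (package_id, core_id).

    Hardware threads on the same physical core share the same pair, so the
    number of distinct pairs is N (cores) and the number of keys is M (threads).
    """
    return len(set(topology.values())), len(topology)


def read_linux_topology():
    """Build the logical CPU -> (package_id, core_id) map from sysfs (Linux-only)."""
    topology = {}
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/topology"):
        cpu = int(os.path.basename(os.path.dirname(path))[3:])  # "cpu12" -> 12
        try:
            with open(os.path.join(path, "physical_package_id")) as f:
                package = int(f.read())
            with open(os.path.join(path, "core_id")) as f:
                core = int(f.read())
        except OSError:
            continue  # skip CPUs whose topology files aren't readable here
        topology[cpu] = (package, core)
    return topology


if __name__ == "__main__":
    # Hypothetical layout: 4 cores x 2 threads, siblings numbered adjacently.
    example = {cpu: (0, cpu // 2) for cpu in range(8)}
    print("example: %d cores, %d threads" % count_cores_and_threads(example))
    print("this machine: %d cores, %d threads" % count_cores_and_threads(read_linux_topology()))
```

On a machine with SMT enabled, the second number comes out larger than the first; with SMT disabled, the two match.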
Bound Multiprocessing (BMP) Operation
This concept is, again, closely related to SMP in theory, but with a bit of a twist. In a BMP system, a single application owns the whole set of cores, but the application can decide to bind certain threads of execution to certain cores rather than letting them float among the cores as needed. If a system supports SMP, it can easily be modified to support BMP as well (and most already have this baked in; see the processor affinity settings in your favorite OS). The major benefit of BMP over SMP is that threads of execution don’t float from one core to another, so you have finer-grained control over execution. You do, however, lose some throughput: you may end up waiting for a bound thread to get time on its assigned core before your software can do its work, even while other cores sit idle. So careful thought must be given to this mode of operation.
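Processor affinity is the usual way to get BMP behavior out of an SMP OS. As a sketch on Linux: Python’s `os.sched_setaffinity()` wraps the `sched_setaffinity` system call, and the underlying call treats PID 0 as the calling thread, so each worker below binds only itself to one core. The `run_bound_workers` helper is hypothetical, and real bound work would replace the placeholder comment.

```python
import os
import threading


def run_bound_workers(cpus):
    """Start one thread per CPU in `cpus`, bind each thread to its own CPU,
    and return {cpu: affinity the thread saw after binding}. Linux-only."""
    observed = {}
    lock = threading.Lock()

    def worker(cpu):
        # On Linux, PID 0 means "the calling thread", so only this thread is bound.
        os.sched_setaffinity(0, {cpu})
        with lock:
            observed[cpu] = os.sched_getaffinity(0)
        # ... bound, core-specific work would go here ...

    threads = [threading.Thread(target=worker, args=(cpu,)) for cpu in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))  # cores this process may use
    for cpu, mask in sorted(run_bound_workers(allowed).items()):
        print(f"worker for CPU {cpu} is bound to {sorted(mask)}")
    # The main thread was never bound, so it can still float across all cores.
    print("main thread affinity:", sorted(os.sched_getaffinity(0)))
```

Because binding is per thread, the rest of the program keeps ordinary SMP behavior; only the threads you choose to pin give up the ability to migrate to an idle core.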
Mixed Multiprocessing Operation
This is not an officially recognized term, but it’s one that I think fits the bill. There are some systems that will allow you to use both AMP and SMP in the same processor. For example, when using a hypervised environment like Wind River’s Helix Virtualization Platform, you may sometimes want a configuration like the following:
| Core | Software |
|---|---|
| 0 | VxWorks 7 (AMP) |
| 1 | Bare-metal application (AMP) |
When a system supports both AMP and SMP in the same chip complex, I consider this a mixed multiprocessing environment. For more information on virtualization and how it can achieve these mixed modes of operation, check out my series on Virtualization in Embedded Systems.
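One way to reason about a mixed configuration is to describe each partition by the cores it owns: a guest given a single core behaves like an AMP node, while a guest spread over several cores schedules SMP-style within its partition. The Python sketch below (names hypothetical) classifies a partition plan that way and rejects plans that hand one core to two guests. The Linux guest on cores 2 and 3 is a hypothetical addition to the example configuration, included so the plan contains an SMP partition.

```python
def classify_partitions(partitions):
    """partitions maps guest name -> list of cores it owns.

    Returns ({guest: "AMP" | "SMP"}, overall), where overall is "AMP", "SMP",
    "mixed", or None for an empty plan.
    """
    owner = {}
    for guest, cores in partitions.items():
        if not cores:
            raise ValueError(f"{guest} owns no cores")
        for core in cores:
            if core in owner:
                raise ValueError(f"core {core} assigned to both {owner[core]} and {guest}")
            owner[core] = guest

    # One core -> AMP-style partition; several cores -> SMP within the partition.
    modes = {guest: ("SMP" if len(cores) > 1 else "AMP") for guest, cores in partitions.items()}
    kinds = set(modes.values())
    if len(kinds) > 1:
        overall = "mixed"
    elif kinds:
        overall = kinds.pop()
    else:
        overall = None
    return modes, overall


if __name__ == "__main__":
    plan = {
        "VxWorks 7": [0],
        "Bare-metal application": [1],
        "Linux (hypothetical SMP guest)": [2, 3],
    }
    modes, overall = classify_partitions(plan)
    for guest, mode in modes.items():
        print(f"{guest}: {mode}")
    print("overall:", overall)
```

Rejecting overlapping cores up front mirrors what a partitioning hypervisor has to guarantee before it can offer AMP-style isolation to any guest.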
I hope this gave a quick intro to the different modes a multicore system can be in. If you want still more detail, see this excellent article from NXP on these modes. It leaves out SMT but covers AMP, SMP, and BMP pretty thoroughly.