KMP_AFFINITY is an environment variable that controls how OpenMP threads are bound to hardware threads and how they are placed relative to each other. It is often used together with KMP_HW_SUBSET for finer control over thread placement.
Table of contents:
- When to use KMP_AFFINITY?
- Basics of using KMP_AFFINITY
- KMP_AFFINITY in depth
When to use KMP_AFFINITY?
Use KMP_AFFINITY when:
- You want to control how threads are distributed across the available CPU topology
- You have set KMP_HW_SUBSET explicitly and want to decide how threads are placed within that subset
- You want compute-intensive applications to run efficiently
Basics of using KMP_AFFINITY
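In its basic form, KMP_AFFINITY is set to a single type value: KMP_AFFINITY=<type>
The main values for type are:
- compact: Places each thread as close as possible to the previous one.
- disabled: Turns the thread affinity interface off completely, so no threads are pinned.
- explicit: Pins threads to the processors listed in the proclist modifier.
- none: Does not pin threads, although the runtime may still query the machine topology.
- scatter: Distributes threads as evenly as possible across the whole system.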
export KMP_AFFINITY=compact
export KMP_AFFINITY=scatter
KMP_AFFINITY=compact is similar to OMP_PROC_BIND=close and KMP_AFFINITY=scatter is similar to OMP_PROC_BIND=spread.
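As a small sketch of that correspondence, the two mechanisms can be treated as alternatives for the same placement policy. The snippet assumes a POSIX shell and clears the first variable before setting the second, so that only one mechanism is active at a time; how a given runtime resolves both being set is not covered here.

```shell
# Option 1: runtime-specific variable, spreads threads across the machine
export KMP_AFFINITY=scatter

# Option 2: portable OpenMP variable with a similar effect.
# Unset the first so only one mechanism is in play.
unset KMP_AFFINITY
export OMP_PROC_BIND=spread

echo "OMP_PROC_BIND=$OMP_PROC_BIND"
```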
KMP_AFFINITY in depth
The full syntax of KMP_AFFINITY is: KMP_AFFINITY=[<modifier>,...]<type>[,<permute>][,<offset>]
The three parameters modifier, permute and offset are optional. Only the parameter type is compulsory.
The use of each parameter is as follows:
Parameters for KMP_AFFINITY
| Parameter | Required | Purpose | Values |
|---|---|---|---|
| modifier | Optional | Sets the binding granularity, supplies the processor list for type=explicit, and controls log messages | Any combination of: granularity, proclist, respect or norespect, verbose or noverbose, warnings or nowarnings, reset or noreset |
| type | Yes | Controls how threads are distributed | Exactly one of: balanced, compact, disabled, explicit, none, scatter, logical, physical (logical and physical are deprecated) |
| permute | Optional | Controls which levels of the topology are most significant when the machine topology map is sorted | Non-negative integer (default 0) |
| offset | Optional | Selects the starting position for thread assignment | Non-negative integer (default 0) |
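To illustrate how modifiers combine with a type, here is a minimal sketch that puts two modifiers in front of the type and leaves permute and offset at their defaults. The program name in the comment is hypothetical; substitute your own OpenMP binary.

```shell
# verbose: ask the runtime to print the topology it detected and the bindings it chose
# granularity=fine: bind each thread to a single logical processor
# compact: the type, which must come after the modifiers
export KMP_AFFINITY=verbose,granularity=fine,compact

# ./my_openmp_app   # hypothetical binary; the verbose report appears when it runs

echo "KMP_AFFINITY=$KMP_AFFINITY"
```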
The main points to keep in mind when setting KMP_AFFINITY are:
- Use type=compact if you want the threads to be placed close to each other, which helps when they share data through a common cache.
- Use type=scatter if you want the threads to be distributed evenly across cores. This reduces contention for cache and memory bandwidth, which often improves performance for memory-bound workloads.
- With type=explicit, you can tie threads to specific processors defined by proclist.
- Set granularity=core to let each thread move between the hardware threads of one physical core, or set granularity=fine to pin each thread to a single logical core.
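The explicit type can be sketched as follows. The processor IDs 0, 2, 4 and 6 are assumptions chosen for illustration; which physical cores they correspond to depends on how the operating system numbers processors on your machine.

```shell
# Quote the value so the shell does not try to interpret the square brackets
export KMP_AFFINITY="granularity=fine,proclist=[0,2,4,6],explicit"

# One OpenMP thread per listed processor (assumed IDs, adjust for your system)
export OMP_NUM_THREADS=4

echo "KMP_AFFINITY=$KMP_AFFINITY"
```

Adding the verbose modifier to the front of the value is a convenient way to check that the runtime bound the threads where you expected.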
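If the variable is set to granularity=fine,compact,1,0 then:
- granularity=fine pins each thread to a single logical core.
- compact places each thread as close as possible to the previous one.
- permute is 1, so the order of significance of the topology levels is changed: each thread is placed close to the previous one but on a different core, and the remaining hardware threads of a core are used only after every core has received one thread.
- offset is 0, so thread assignment starts from the first position.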
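A value consistent with the four points above is granularity=fine,compact,1,0. The sketch below applies it to a single command instead of exporting it for the whole session; the variable name setting is hypothetical, and sh -c merely stands in for a real OpenMP program.

```shell
# modifier granularity=fine, type compact, permute 1, offset 0
setting="granularity=fine,compact,1,0"

# env sets KMP_AFFINITY only for the child process; the current shell is unchanged
env KMP_AFFINITY="$setting" sh -c 'echo "child sees: $KMP_AFFINITY"'
```

Scoping the variable to one command makes it easier to compare several settings on the same program without leftover state in the shell.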
With this article at OpenGenus, you should now have a good idea of what KMP_AFFINITY does and how to set it to get good performance on your system.