numactl command in Linux

When we want to control how threads are assigned to processor cores, or choose where data is allocated, the numactl command is the right tool for the task. In this article we discuss how to perform such actions using numactl.

Table of contents.

  1. Introduction.
  2. Syntax.
  3. Commands.
  4. Summary.
  5. References.

Introduction.

Modern processors take a Non-Uniform Memory Access (NUMA) approach to hardware design.

Sometimes we want to control how threads are assigned to processor cores, for example to avoid hyper-threading and use separate physical cores instead, or to make sure a task does not migrate frequently between cores.

In Linux, numactl is used for such tasks. It gives us the ability to choose the cores on which tasks execute and the ability to choose where data is allocated, thanks to two policies: the NUMA scheduling policy and the NUMA memory placement policy respectively.

Syntax.

The syntax is as follows,

numactl [ --interleave nodes ] [ --preferred node ] [ --membind nodes ] [ --cpunodebind nodes ] [ --physcpubind cpus ] [ --localalloc ] command {arguments ...}

The various policy settings are,

--interleave=nodes, -i nodes sets a memory interleave policy whereby memory is allocated in a round-robin fashion on the given nodes; when memory cannot be allocated on the current interleave target, allocation falls back to other nodes.
We can specify 'all', which means all nodes in the current set.
To specify nodes we write n,n,n or n-n or n-n,n-n, e.g. 0-4 which specifies nodes 0 to 4.

To specify relative nodes we can write +n,n,n or +n-n or +n,n-n, where + indicates that the node numbers are relative to the process' set of allowed nodes in the current cpuset.

Conversely, we can write !n-n to indicate all nodes except nodes n-n.
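
For example, assuming a program named myApp (the name is only a placeholder), to run it with its memory interleaved across nodes 0 to 3 we could write,

numactl --interleave=0-3 myApp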

--preferred=node specifies that we prefer memory to be allocated on the given node if possible, falling back to other nodes otherwise. Relative notation can also be used here.
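
For example, to run a hypothetical program myApp while preferring memory allocations from node 1, we could write,

numactl --preferred=1 myApp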

--membind=nodes, -m nodes means that memory is only allocated from the given nodes. When there is not enough memory available on these nodes, allocation will fail.
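
For example, to restrict a hypothetical program myApp to memory on nodes 0 and 1, we could write,

numactl --membind=0,1 myApp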

--cpunodebind=nodes, -N nodes means the command is only executed on the CPUs of the specified nodes. Note that a node may consist of several CPUs.
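
For example, to execute a hypothetical program myApp only on the CPUs of node 0, we could write,

numactl --cpunodebind=0 myApp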

--physcpubind=cpus, -C cpus means processes are only executed on the specified CPUs. This accepts CPU numbers as shown in the processor fields of /proc/cpuinfo, or CPUs relative to the current cpuset.

To view active cpus listing we write,

cat /proc/cpuinfo

CPUs are specified in the same way as nodes are for --interleave=nodes, -i nodes described above.
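
For example, to pin a hypothetical program myApp to CPUs 0 and 2, we could write,

numactl --physcpubind=0,2 myApp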

--localalloc, -l is used when we want to always allocate memory on the current node.
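
For example, to run a hypothetical program myApp with memory always allocated on the node it is running on, we could write,

numactl --localalloc myApp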

numactl can also set a policy on a shared memory segment or file. The syntax is,

numactl [ --huge ] [ --offset offset ] [ --shmmode shmmode ] [ --length length ] [ --strict ]
[ --shmid id ] --shm shmkeyfile | --file tmpfsfile
[ --touch ] [ --dump ] [ --dump-nodes ]

--huge, for using huge pages when creating a SYSV shared memory segment.

--offset offset, to specify an offset into the shared memory segment. Units can be given as m for MB, g for GB or k for KB; when no unit is specified the value is in bytes. The default is 0.

--shmmode shmmode, only valid before --shmid or --shm; the shared memory segment is created with the numeric mode shmmode.

--length length, to specify the length of the new segment. Units can be given as m for MB, k for KB or g for GB; the default unit is bytes.

--strict, to produce an error when a page in the policied area of the shared memory segment was already faulted in with a conflicting policy. By default such conflicts are ignored silently.

--shmid id, used to create or use a shared memory segment with a specified numeric id.

--shm shmkeyfile, to create or use a shared memory segment whose id is generated using ftok from shmkeyfile.

--file tmpfsfile, to set a policy for a file in tmpfs or hugetlbfs.

--touch, to touch pages so as to enforce the policy early. Normally a policy is only applied when an application maps and accesses a page; by default pages are not touched.

--dump, to dump the policy in the specified range.

--dump-nodes, to dump all nodes in the specified range.
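
As an example, to interleave a 1 GB SYSV shared memory segment over all nodes, identified by a key file (here /tmp/shmkey is only a placeholder path), we could write,

numactl --length 1g --shm /tmp/shmkey --interleave=all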

To view the NUMA architecture of a system write,

numactl --hardware

To view the NUMA policy of the current process we write,

numactl --show

To view the NUMA memory hit statistics we write,

cat /sys/devices/system/node/node*/numastat

Commands.

To run a program testProg on node 0 with memory allocated on nodes 0 and 1, we write,

numactl --cpubind=0 --membind=0,1 testProg

To run an application testApp on cpus 0-4 and 8-12 of the current cpu set, we write,

numactl --physcpubind=+0-4,8-12 testApp arguments

To run a process bigProcess with its memory interleaved across all nodes, we write,

numactl --interleave=all bigProcess arguments

To set the preferred node to 1 and display the resulting policy state, we write,

numactl --preferred=1 numactl --show

To run a process on node 4 with memory allocated on nodes 4 and 5 we write,

numactl --cpubind=4 --membind=4,5 process

Summary.

numactl is a Linux command that runs processes with a specified NUMA scheduling or memory placement policy.

It binds processes to processors on Linux NUMA systems.
The typical goal of using numactl is to confine a process to a NUMA node (a set of CPUs and its local memory) rather than to specific CPU cores.

With numactl we can bind a process's memory locality so as to prevent its accesses from jumping across NUMA pools/memory nodes.

References.

  1. man numactl.