hwloc (hardware locality) in Linux

hwloc (Hardware Locality) is a software suite made up of several command-line tools and a C API, used for discovering hardware resources in parallel computing architectures. It reports information such as their locality, attributes and interconnection.

Table of contents.

  1. Introduction.
  2. hwloc-bind.
  3. hwloc-calc.
  4. hwloc-ps.
  5. hwloc-distrib.
  6. Summary.
  7. References.

Prerequisites.

  1. lstopo

Introduction.

Computing platforms today come with multiple cores, shared caches and NUMA architectures. Knowledge of how a platform is laid out is essential for exploiting these features to improve the efficiency and performance of a system.

For example, we can place two cooperating tasks in a single cluster, on cores that share a cache.

Another example is to spread independent memory-intensive processes across several sockets, which improves memory throughput.
Such placement is only possible if we know the system's hardware topology.
hwloc (Hardware Locality) is used to discover and visualize the system's hardware so that such optimizations can be applied.

hwloc provides a portable abstraction of the hierarchical structure of modern architectures, including NUMA memory nodes, sockets, cores, shared caches and simultaneous multithreading.
It also reports information such as the locations of caches, memory and I/O devices.

hwloc objects include,

  • Machine, which is the whole system, a set of processors and memory.
  • Node (NUMA node), which is a set of processors around memory that those processors can access directly.
  • Socket, which is the physical package, we can also describe it as a grouping of one or more processors.
  • Core, which is a single processing unit that may contain multiple logical processors, e.g. hardware threads.
  • PU, which is the processing unit, the smallest processing element recognized by hwloc, such as a single hardware thread.

To obtain information regarding objects, features or topology of the system, write,

hwloc-info --topology

To report features supported by hwloc on the topology, write,

hwloc-info --support

To get information about a core whose physical index is 1, we write,

hwloc-info -p core:1

For other options use the --help option.

hwloc-bind.

hwloc-bind runs a command bound to the specified processors and/or memory.

Its syntax is as follows,

hwloc-bind [options] <location1> [<location2> [...] ] [--] <command> ... 

Commands
To execute the wget command on the first logical processor in the second socket we write,

hwloc-bind socket:1.pu:0 -- wget -b www.example.com

To bind the wget command to the first core of the second socket and the second core of the first socket we write,

hwloc-bind socket:1.core:0 socket:0.core:1 wget -b www.example.com 
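
A location string such as socket:1.core:0 can be read as a walk down the topology tree. The following is a minimal Python sketch of that reading, not hwloc's own parser. The machine layout (2 sockets, 2 cores per socket, 2 PUs per core, PUs numbered 0 to 7) and the helper names find and resolve are assumptions made for illustration, and only the simple type:index form is handled (no ranges, odd or even).

```python
# Hypothetical machine: 2 sockets x 2 cores x 2 PUs, PUs numbered 0..7.
# Each object is a (kind, children) tuple; a PU is ("pu", index).
MACHINE = ("machine", [
    ("socket", [("core", [("pu", 0), ("pu", 1)]),
                ("core", [("pu", 2), ("pu", 3)])]),
    ("socket", [("core", [("pu", 4), ("pu", 5)]),
                ("core", [("pu", 6), ("pu", 7)])]),
])


def find(node, kind):
    """Return all objects of the given kind at or below node, in tree order."""
    node_kind, children = node
    if node_kind == kind:
        return [node]
    if node_kind == "pu":  # a PU is a leaf, nothing below it
        return []
    found = []
    for child in children:
        found.extend(find(child, kind))
    return found


def resolve(machine, location):
    """Resolve a dotted location such as 'socket:1.core:0' to a list of PU indexes."""
    current = [machine]
    for step in location.split("."):
        kind, index = step.split(":")
        # Candidates are the objects of that kind inside the current selection.
        candidates = [obj for parent in current for obj in find(parent, kind.lower())]
        current = [candidates[int(index)]]
    return sorted(pu[1] for obj in current for pu in find(obj, "pu"))


print(resolve(MACHINE, "socket:1.core:0"))  # [4, 5]
print(resolve(MACHINE, "socket:0.core:1"))  # [2, 3]
print(resolve(MACHINE, "socket:1.pu:0"))    # [4]
```

On this assumed layout, the two locations socket:1.core:0 and socket:0.core:1 together cover PUs 2 to 5.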

To bind the same command to the first three sockets on the second and third nodes, we write,

hwloc-bind node:1-2.socket:0:3 wget -b www.example.com

We can also write the socket range explicitly,

hwloc-bind node:1-2.socket:0-2 wget -b www.example.com

To execute on odd numbered cores within even numbered sockets, we write,

hwloc-bind socket:even.core:odd wget -b www.example.com

We can also bind memory as follows,

hwloc-bind --cpubind node:1 --membind node:0 wget -b www.example.com

The above command runs wget on the processors of the second node (node:1) and binds its memory to the first memory node (node:0).

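To view the current binding, we use the --get option. In the following command the inner hwloc-bind --get is itself launched under a binding, so it reports the binding it was given,

hwloc-bind node:1.socket:2 hwloc-bind --get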
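
On Linux, the effect of binding a process and then querying the binding can also be observed from inside a program. The sketch below is an illustration only and does not use hwloc: it relies on Python's os.sched_setaffinity and os.sched_getaffinity, which are available on Linux, and the helper name run_bound is hypothetical. It covers CPU binding only, not memory binding.

```python
import os


def run_bound(cpus, func):
    """Bind the calling process to `cpus`, call func(), then restore the old binding."""
    original = os.sched_getaffinity(0)  # 0 means the calling process
    os.sched_setaffinity(0, cpus)
    try:
        return func()
    finally:
        os.sched_setaffinity(0, original)


# Pick the first CPU this process is currently allowed to run on.
allowed = sorted(os.sched_getaffinity(0))
first_cpu = {allowed[0]}

# Inside run_bound the process reports the restricted CPU set.
print(run_bound(first_cpu, lambda: os.sched_getaffinity(0)))
```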

For other options write, hwloc-bind --help.

hwloc-calc.

hwloc-calc generates and manipulates CPU mask strings or objects.
Its inputs may be objects, CPU lists or CPU mask strings.
If objects or CPU masks are passed as arguments, they are combined into a single output. If no arguments are passed, the program reads from standard input.

Commands
To display the physical CPU mask which corresponds to the second socket we write,

hwloc-calc socket:1

The output,

0x000000f0
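
A CPU mask is a bitmap in which bit n stands for the processor with index n. As a small sketch (the helper name mask_to_pus is hypothetical, and the mask is assumed to fit in a single hexadecimal word), the mask 0x000000f0 can be decoded into processor indexes in Python as follows.

```python
from typing import List


def mask_to_pus(mask: str) -> List[int]:
    """Return the indexes of the bits that are set in a hexadecimal CPU mask."""
    value = int(mask, 16)  # int() accepts the leading "0x" when the base is 16
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


print(mask_to_pus("0x000000f0"))  # [4, 5, 6, 7]
```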

To combine two physical CPU masks, we write,

hwloc-calc 0x0000ffff 0xff000000

The output,

0xff00ffff
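
Combining masks amounts to a bitwise OR of their values. The following minimal sketch reproduces the result 0xff00ffff in Python. The helper name combine_masks is hypothetical and the output is padded to eight hexadecimal digits to match the masks in this article.

```python
def combine_masks(*masks: str) -> str:
    """OR several hexadecimal CPU masks together and format the result."""
    combined = 0
    for mask in masks:
        combined |= int(mask, 16)
    return f"0x{combined:08x}"


print(combine_masks("0x0000ffff", "0xff000000"))  # 0xff00ffff
```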

To display NUMA nodes by physical indexes which intersect a given physical CPU mask, we write,

hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0

The output,

0,2
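
An object intersects a mask when the two share at least one processor, that is, when the bitwise AND of their masks is non-zero. The sketch below illustrates the idea in Python. The four NUMA node masks are hypothetical values chosen so that nodes 0 and 2 intersect 0xf0f0f0f0, and the helper name intersecting_nodes is also an assumption.

```python
# Hypothetical CPU masks of four NUMA nodes, keyed by physical index.
NUMA_NODES = {
    0: 0x00F000F0,
    1: 0x000F000F,
    2: 0xF000F000,
    3: 0x0F000F00,
}


def intersecting_nodes(mask, nodes):
    """Return the indexes of the nodes whose CPU mask shares a bit with `mask`."""
    value = int(mask, 16)
    return [index for index, cpus in sorted(nodes.items()) if cpus & value]


result = intersecting_nodes("0xf0f0f0f0", NUMA_NODES)
print(",".join(str(index) for index in result))  # 0,2
```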

To print the physical index of a processor given its logical index, we write,

hwloc-calc PU:2 --physical-output --intersect PU

To combine physical and logical indexes as input, we write,

hwloc-calc PU:2 --physical-input PU:3

The output,

0x0000000c

For other options we use the hwloc-calc --help command.

hwloc-ps.

hwloc-ps lists the currently running processes that are bound. This is its default behavior, although with the -t option it also displays processes that are not bound but contain at least one bound thread, together with their threads.
Its output consists of the process ID, the command line and the binding.

Commands
To show all processes we use the -a option,

hwloc-ps -a

Assuming a process is not bound although three of its four threads are bound, it will appear in the thread-aware output as follows,

hwloc-ps -t

The output,

8484  Machine:0  program
 8484  Machine:0
 8167  PU:0
 8169  PU:2
 8180  PU:1

hwloc-distrib.

hwloc-distrib generates CPU masks which correspond to a distribution of a given number of elements over the topology of the machine.
The distribution is recursive, from the top to the bottom of the hierarchy.
The elements are split at each level, except at the levels ignored with the --ignore option.

An example
If 4 processes have to be distributed across a machine, we can obtain their CPU masks by writing,

hwloc-distrib 4

The output,

0x0000000f
0x00000f00
0x000000f0
0x0000f000
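
The top-down splitting can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not hwloc's actual algorithm. The topology (2 nodes, 2 sockets per node, 4 PUs per socket, with the PU numbering shown) is hypothetical and was chosen so that the result matches the four masks above, and the helper names pus, distribute and to_mask are made up for this example.

```python
# Hypothetical topology as nested lists: machine -> nodes -> sockets -> PU indexes.
TOPOLOGY = [
    [[0, 1, 2, 3], [8, 9, 10, 11]],      # first node, two sockets
    [[4, 5, 6, 7], [12, 13, 14, 15]],    # second node, two sockets
]


def pus(node):
    """Return the set of PU indexes below a node (an int leaf or a list of children)."""
    if isinstance(node, int):
        return {node}
    result = set()
    for child in node:
        result |= pus(child)
    return result


def distribute(node, n):
    """Return n PU sets, splitting the n elements among the children at each level."""
    if n == 1 or isinstance(node, int):
        return [pus(node)] * n
    k = len(node)
    out = []
    if n >= k:
        # Each child gets n // k elements; the first n % k children get one more.
        for i, child in enumerate(node):
            share = n // k + (1 if i < n % k else 0)
            out.extend(distribute(child, share))
    else:
        # Fewer elements than children: each element covers a run of children.
        for i in range(n):
            out.append(pus(node[i * k // n:(i + 1) * k // n]))
    return out


def to_mask(cpus):
    """Format a set of PU indexes as a hexadecimal CPU mask."""
    return f"0x{sum(1 << cpu for cpu in cpus):08x}"


print([to_mask(cpus) for cpus in distribute(TOPOLOGY, 4)])
# ['0x0000000f', '0x00000f00', '0x000000f0', '0x0000f000']
```

With a different PU numbering the masks would come out in a different order, which is why the result depends on the machine where the command is run.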

To distribute within the second socket only, we restrict the topology using the --restrict option as follows,

hwloc-distrib --restrict $(hwloc-calc socket:1) 4

The output,

0x00000010
0x00000020
0x00000040
0x00000080

We can also convert each output line with hwloc-calc as follows,

hwloc-distrib 4 --single | hwloc-calc --taskset

The output,

0x1
0x100
0x10
0x1000
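
The --taskset option prints masks in the form accepted by the taskset command, a single hexadecimal number without leading zeros. The sketch below shows the conversion in Python, assuming hwloc's default mask syntax of comma-separated 32-bit hexadecimal words with the most significant word first. The helper name hwloc_to_taskset is hypothetical and special forms such as infinite masks are not handled.

```python
def hwloc_to_taskset(mask: str) -> str:
    """Convert a comma-separated 32-bit-word mask into one taskset-style hex number."""
    value = 0
    for word in mask.split(","):
        value = (value << 32) | int(word, 16)  # most significant word comes first
    return hex(value)


print(hwloc_to_taskset("0x00000100"))             # 0x100
print(hwloc_to_taskset("0x00000001,0x00000000"))  # 0x100000000
```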

To convert the output into a list of processors that can be passed to dplace -c inside an mpirun command line, we write,

hwloc-distrib 4 --single | xargs hwloc-calc --pulist

The output,

0,8,4,16

For other options, we use the --help option with hwloc-distrib.

hwloc-distrib --help

Other related hwloc tools include,
hwloc-distances, which displays the distance matrices attached to a topology. That is, the value in the i-th row and the j-th column represents the distance from object #i to object #j.

hwloc-diff, which computes the differences between two XML topologies and stores the result in an output file if one is given. Otherwise the output is sent to standard output.
Its syntax is as follows,

hwloc-diff [options] <old.xml> <new.xml> [<output.diff.xml>]

For example, to compute the difference between topo1.xml and topo2.xml we write,

hwloc-diff topo1.xml topo2.xml 

To store the result in a file named output.xml, we write,

hwloc-diff topo1.xml topo2.xml output.xml

Other options can be obtained by using --help option.

hwloc-compress-dir, which takes a directory containing XML exports as input and tries to compress it by computing the differences between them.
An example

hwloc-compress-dir inputDir outputDir

The command compresses inputDir and sends the result into outputDir.

hwloc-assembler combines input XML topologies, then exports the resulting global topology to a new XML file.
An example

hwloc-assembler output.xml --name host1 host1.xml --name host2 host2.xml

The above command assembles two nodes' topologies.
If the assembling was successful, the command returns 0.

hwloc-assembler-remote retrieves remote nodes' topologies so as to assemble them with the hwloc-assembler.
An example

hwloc-assembler-remote output.xml host1 host2 host3

The above command assembles three nodes' topologies.

hwloc-annotate will load a topology from an XML file, add annotations then export the resulting topology to another XML file.
Annotations can be string info attributes.
An example

hwloc-annotate input.xml output.xml Core:all info infoname infovalue 

The above command adds an info attribute to all core objects.

hwloc-dump-hwdata. hwloc can use locality and topology information from SMBIOS or ACPI tables, which are exposed through raw hardware files that only the root user can read. This tool dumps that content into human-readable and world-accessible files.

Finally, hwloc-gather-topology saves the relevant topology files in an archive, together with the lstopo output, for later use.
An example

hwloc-gather-topology /tmp/myhost

The command will store topology information in /tmp/myhost.tar.bz2 archive and lstopo output in /tmp/myhost.output file.

Summary.

The hwloc suite helps exploit code and/or data locality on modern computing platforms so as to improve efficiency and performance.
It also assists high-performance computing applications.
It provides a portable mechanism to examine and utilize server hardware technology.

References.

  1. Hardware locality documentation at University of Tennessee.
  2. Execute command --help or man command to view a command's manual page in Linux distributions.
