hwloc (hardware locality) in Linux
hwloc (Hardware Locality) is a software suite consisting of several command-line tools and a C API for discovering the hardware resources of parallel computing architectures. It reports information such as their locality, attributes and interconnection.
Table of contents.
- Introduction.
- hwloc-bind.
- hwloc-calc.
- hwloc-ps.
- hwloc-distrib.
- Summary.
- References.
Introduction.
Computing platforms today come with multiple cores, shared caches and NUMA architectures. Knowing how these are laid out is essential if we want to exploit them to improve the efficiency and performance of a system.
For example, we can place two cooperating tasks on cores that share a cache.
Another example is to place two independent memory-intensive processes on different sockets, which improves memory throughput.
Such placement is only possible if we know the system's hardware topology.
The hwloc (Hardware Locality) suite is used to discover and visualize that topology so that we can apply such optimizations.
hwloc provides a portable abstraction of the hierarchical structure of modern architectures, covering NUMA memory nodes, sockets, cores, shared caches and simultaneous multithreading.
It also reports the locations of caches, memory and I/O devices.
hwloc objects include,
- Machine, which is the set of all processors and memory in the system.
- Node (NUMA node), which is a set of processors around memory that those processors can access directly.
- Socket, which is the physical package; it can also be described as a grouping of one or more processors.
- Core, which is a single processing unit that may contain multiple logical processors, e.g. hardware threads.
- PU, which is the processing unit; it is the smallest execution unit recognized by hwloc.
To obtain information regarding objects, features or topology of the system, write,
hwloc-info --topology
To report features supported by hwloc on the topology, write,
hwloc-info --support
To get information about a core whose physical index is 1, we write,
hwloc-info -p core:1
For other options use the --help option.
hwloc-bind.
This command runs another command bound to the specified processors and/or memory.
Its syntax is as follows,
hwloc-bind [options] <location1> [<location2> [...] ] [--] <command> ...
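Conceptually, the tool sets a binding and then starts the command under it. As a rough analogue only (this is not how hwloc is implemented, it is Linux-specific, and it covers CPU binding but not memory binding), the sketch below uses Python's os.sched_setaffinity to start a child process restricted to a set of CPU numbers. The helper name run_bound is hypothetical.

```python
import os
import subprocess
import sys


def run_bound(cpus, argv):
    """Run argv in a child process restricted to the given CPU numbers."""
    # Drop CPUs this process may not use, so the affinity call is never
    # asked for processors that are unavailable on the machine.
    usable = set(cpus) & os.sched_getaffinity(0)
    if not usable:
        raise ValueError("no usable CPU in the requested set")
    return subprocess.run(
        argv,
        # Executed in the child just before the command starts;
        # pid 0 means "the calling process", i.e. the child itself.
        preexec_fn=lambda: os.sched_setaffinity(0, usable),
        capture_output=True,
        text=True,
        check=True,
    )


# Usage: start a child interpreter on one CPU and let it report its affinity.
first = min(os.sched_getaffinity(0))
child_code = "import os; print(sorted(os.sched_getaffinity(0)))"
result = run_bound({first}, [sys.executable, "-c", child_code])
print(result.stdout.strip())
```

The child prints a one-element list, which plays the same role as asking a bound process for its current binding.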
Commands
To execute the wget command on the first logical processor of the second socket, we write,
hwloc-bind socket:1.pu:0 -- wget -b www.example.com
To bind the wget command to the first core of the second socket and the second core of the first socket we write,
hwloc-bind socket:1.core:0 socket:0.core:1 wget -b www.example.com
To execute the same command on the first three sockets of the second and third nodes, we write,
hwloc-bind node:1-2.socket:0:3 wget -b www.example.com
We can also write,
hwloc-bind node:1-2.socket:0-2 wget -b www.example.com
To execute on odd numbered cores within even numbered sockets, we write,
hwloc-bind socket:even.core:odd wget -b www.example.com
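The location strings above share one indexing scheme: a single index, a first-last range, a first:count pair, or a keyword such as odd or even. The sketch below is a simplified model of that scheme, not hwloc's own parser. The helper expand_index is hypothetical, and rejecting selections that run past the last object is an assumption of this sketch; hwloc itself may behave differently.

```python
def expand_index(spec, total):
    """Expand an index spec into a list of logical indexes below total."""
    if spec == "all":
        picked = list(range(total))
    elif spec == "odd":
        picked = list(range(1, total, 2))
    elif spec == "even":
        picked = list(range(0, total, 2))
    elif ":" in spec:
        # first:count, e.g. "0:3" selects three objects starting at index 0
        first, count = (int(part) for part in spec.split(":"))
        picked = list(range(first, first + count))
    elif "-" in spec:
        # first-last, e.g. "0-2" selects indexes 0, 1 and 2
        first, last = (int(part) for part in spec.split("-"))
        picked = list(range(first, last + 1))
    else:
        picked = [int(spec)]
    if any(index >= total for index in picked):
        raise ValueError(f"{spec!r} selects objects beyond index {total - 1}")
    return picked


print(expand_index("0:3", 4))  # [0, 1, 2]
print(expand_index("0-2", 4))  # [0, 1, 2]
print(expand_index("odd", 4))  # [1, 3]
```

This is why socket:0:3 and socket:0-2 select the same sockets in the two commands above.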
We can also bind memory as follows,
hwloc-bind --cpubind node:1 --membind node:0 wget -b www.example.com
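The above command runs wget on the processors of the second NUMA node (node:1) and binds its memory to the first NUMA node (node:0).
To view the current binding, we use the --get option. Here it is run under another hwloc-bind so that it reports the binding it was given,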
hwloc-bind node:1.socket:2 hwloc-bind --get
For other options write, hwloc-bind --help.
hwloc-calc.
It is used to generate and manipulate CPU mask strings or objects.
Its inputs may be objects, CPU lists or CPU mask strings.
If objects or CPU masks are passed as arguments, they are combined and a single result is printed. If no arguments are passed, the program reads from standard input.
Commands
To display the physical CPU mask which corresponds to the second socket, we write,
hwloc-calc socket:1
The output,
0x000000f0
To combine two physical CPU masks, we write,
hwloc-calc 0x0000ffff 0xff000000
The output,
0xff00ffff
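The masks in these examples can be read as plain bit sets: bit n set means that the processor with index n is included, and combining masks is a bitwise OR. A minimal sketch under that reading follows. The helpers mask_to_indexes and combine are hypothetical, and the eight-digit formatting only mimics the outputs shown above.

```python
def mask_to_indexes(mask):
    """List the bit positions set in a CPU mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def combine(*masks):
    """OR several CPU masks together and format the result as 8 hex digits."""
    result = 0
    for mask in masks:
        result |= mask
    return f"0x{result:08x}"


print(mask_to_indexes(0x000000F0))      # [4, 5, 6, 7]
print(combine(0x0000FFFF, 0xFF000000))  # 0xff00ffff
```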
To display NUMA nodes by physical indexes which intersect a given physical CPU mask, we write,
hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
The output,
0,2
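Intersection here means that an object shares at least one processor with the given mask. The sketch below models that test on a hypothetical machine with four NUMA nodes of eight processors each. The node masks, the query mask and the helper name intersecting are made up for illustration and do not describe the machine behind the output above.

```python
# Hypothetical layout: NUMA node index -> mask of the processors it contains.
NODE_MASKS = {
    0: 0x000000FF,
    1: 0x0000FF00,
    2: 0x00FF0000,
    3: 0xFF000000,
}


def intersecting(object_masks, query):
    """Return the indexes of objects whose mask shares a bit with query."""
    return [index for index, mask in sorted(object_masks.items()) if mask & query]


print(intersecting(NODE_MASKS, 0x00F000F0))  # [0, 2]
```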
To print the physical index of a processor given by its logical index, we write,
hwloc-calc PU:2 --physical-output --intersect PU
To combine physical and logical indexes as input, we write,
hwloc-calc PU:2 --physical-input PU:3
The output,
0x0000000c
For other options we use the hwloc-calc --help command.
hwloc-ps.
By default, this tool lists the currently running processes that are bound. With the -t option it also shows threads, including processes that are not bound themselves but contain at least one bound thread.
Its output consists of the process ID, command line and binding.
Commands
To show all processes we use the -a option,
hwloc-ps -a
Assuming a process is not bound although three of its four threads are bound, it will appear in the thread-aware output as follows,
hwloc-ps -t
The output,
8484 Machine:0 program
 8484 Machine:0
 8167 PU:0
 8169 PU:2
 8180 PU:1
hwloc-distrib.
It generates CPU masks which correspond to a distribution of a given number of elements over the topology of the machine.
Distribution is recursive from the top to the bottom of the hierarchy.
Elements are split at each level of the hierarchy, except for object types ignored with the --ignore option.
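The recursive split can be sketched as follows. This is a simplified model, not hwloc's implementation: it treats all children of an object as equally heavy, and the 16-processor topology (two NUMA nodes with two sockets each) is hypothetical. The names distribute and MACHINE are illustrative.

```python
def distribute(node, n):
    """Return n CPU masks spread over the tree rooted at node.

    node is a (mask, children) pair; children is a list of such pairs.
    """
    mask, children = node
    if n <= 0:
        return []
    if n == 1 or not children:
        # One task takes the whole object; a leaf is shared by all its tasks.
        return [mask] * n
    if n < len(children):
        # Fewer tasks than children: each task gets a contiguous group.
        size, extra = divmod(len(children), n)
        masks, start = [], 0
        for i in range(n):
            end = start + size + (1 if i < extra else 0)
            group = 0
            for child_mask, _ in children[start:end]:
                group |= child_mask
            masks.append(group)
            start = end
        return masks
    # At least one task per child: split the tasks evenly and recurse.
    share, extra = divmod(n, len(children))
    masks = []
    for i, child in enumerate(children):
        masks.extend(distribute(child, share + (1 if i < extra else 0)))
    return masks


# Hypothetical machine: sockets are leaves holding four processors each.
MACHINE = (0xFFFF, [
    (0x0F0F, [(0x000F, []), (0x0F00, [])]),  # NUMA node 0
    (0xF0F0, [(0x00F0, []), (0xF000, [])]),  # NUMA node 1
])

for m in distribute(MACHINE, 4):
    print(f"0x{m:08x}")
```

With four tasks, each NUMA node receives two and each socket one, so consecutive tasks land on sockets of the same node before moving to the next node.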
An example
If 4 processes have to be distributed across a machine, we can obtain their CPU masks by writing,
hwloc-distrib 4
The output,
0x0000000f
0x00000f00
0x000000f0
0x0000f000
To distribute within the second socket only, we restrict the topology using the --restrict option as follows,
hwloc-distrib --restrict $(hwloc-calc socket:1) 4
The output,
0x00000010
0x00000020
0x00000040
0x00000080
We can also convert each output line with hwloc-calc as follows,
hwloc-distrib 4 --single | hwloc-calc --taskset
The output,
0x1
0x100
0x10
0x1000
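Judging by the outputs above, --single reduces each mask to one processor, here the lowest set bit, and --taskset prints the mask without zero padding. A sketch of both steps follows. The helper names single and taskset are hypothetical, and picking the lowest bit is an assumption read off the example rather than a documented guarantee.

```python
def single(mask):
    """Keep only the lowest set bit of a CPU mask."""
    return mask & -mask


def taskset(mask):
    """Format a mask as unpadded hexadecimal, as in the taskset-style output."""
    return hex(mask)


masks = [0x0000000F, 0x00000F00, 0x000000F0, 0x0000F000]
print(" ".join(taskset(single(m)) for m in masks))  # 0x1 0x100 0x10 0x1000
```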
To convert the output into a list of processors that can be passed to dplace -c inside an mpirun command line, we write,
hwloc-distrib 4 --single | xargs hwloc-calc --pulist
The output,
0,8,4,16
For other options, we use the --help option with hwloc-distrib.
hwloc-distrib --help
Other related hwloc tools include,
hwloc-distances, which displays the distance matrices attached to a topology. The value in the i-th row and j-th column is the distance from object #i to object #j.
hwloc-diff, which computes the differences between two XML topologies. The result is stored in a file if one is given; otherwise it is sent to standard output.
Its syntax is as follows,
hwloc-diff [options] <old.xml> <new.xml> [<output.diff.xml>]
For example, to compute the difference between topo1.xml and topo2.xml, we write,
hwloc-diff topo1.xml topo2.xml
To store the result in a file named output.xml, we write,
hwloc-diff topo1.xml topo2.xml output.xml
Other options can be listed with the --help option.
hwloc-compress-dir, which takes a directory of XML topology exports as input and tries to compress it by computing the differences between them.
An example
hwloc-compress-dir inputDir outputDir
The command compresses inputDir and sends the result into outputDir.
hwloc-assembler, which combines input XML topologies and exports the resulting global topology to a new XML file.
An example
hwloc-assembler output.xml --name host1 host1.xml --name host2 host2.xml
The above command assembles two nodes' topologies.
If the assembling was successful, the command returns 0.
hwloc-assembler-remote, which retrieves remote nodes' topologies and assembles them with hwloc-assembler.
An example
hwloc-assembler-remote output.xml host1 host2 host3
The above command assembles three nodes' topologies.
hwloc-annotate, which loads a topology from an XML file, adds annotations and exports the resulting topology to another XML file.
Annotations can be, for example, string info attributes.
An example
hwloc-annotate input.xml output.xml Core:all info infoname infovalue
The above command adds an info attribute named infoname with the value infovalue to all core objects.
hwloc-dump-hwdata. hwloc uses locality and topology information from SMBIOS or ACPI tables, which are found in raw hardware files that only the root user can read. This tool dumps that content into human-readable, world-accessible files.
Finally, hwloc-gather-topology saves the relevant topology files in an archive, together with the lstopo output, for later use.
An example
hwloc-gather-topology /tmp/myhost
The command stores the topology information in the archive /tmp/myhost.tar.bz2 and the lstopo output in the file /tmp/myhost.output.
Summary.
The hwloc suite helps applications exploit code and data locality on modern computing platforms in order to improve efficiency.
It is particularly useful for high-performance computing applications.
It provides a portable mechanism to examine and make use of a server's hardware topology.
References.
- Hardware locality documentation at University of Tennessee.
- Run a command with --help, or run man followed by the command name, to read its manual page on Linux distributions.