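Heisenbug is a term commonly used in computer programming for a bug (or program error) that disappears or changes its behaviour when an attempt is made to find its source.
It is named after the Heisenberg Uncertainty Principle in quantum mechanics, which states that the position and momentum of a particle cannot both be known with arbitrary precision: the more accurately one is measured, the less accurately the other can be known. By analogy, it means that the act of observing a program (for example, attaching a debugger or adding print statements) can change its behaviour and hide the bug.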
Common causes of Heisenbugs:
- Interference from other processes running on the system (like changing memory that the original process accesses)
- Hardware issues on the device (like a component detaching due to loose connections)
- A Heisenbug in a dependent software component, such as a compiler
Some techniques for solving a Heisenbug are covered at the end. The main problems with a Heisenbug are:
- One cannot be sure of the source of the problem
- One cannot be sure whether a particular change has fixed the bug
Appearance of the term
The term appears in the paper "Why do Computers stop and what can be done about it?" by Turing Award winner Jim Gray, published in June 1985.
A sub-title in his paper reads: "Software faults are soft - the Bohrbug/Heisenbug hypothesis". In this section, he explains the difference between Bohrbugs and Heisenbugs and describes some of the experiments he did to isolate Heisenbugs.
Nothing quantitative came out of his experiments, though. He explained how system developers can use Heisenbugs to develop fault-tolerant software.
The paper by Jim Gray is available on CiteSeer.
Note that the term "Heisenbug" existed before its use by Jim Gray and had appeared in some earlier ACM research papers as well. It is believed that the term was coined within IBM during the 1950s.
Bohrbug vs Heisenbug
Bohrbug and Heisenbug are related terms but are used in different senses. Jim Gray covered the difference as well. The basic difference is as follows:
- Heisenbugs are bugs that disappear when an attempt is made to figure out their source (inspired by the Heisenberg Uncertainty Principle in quantum mechanics)
- Bohrbugs are bugs that are easy to isolate and whose source can be found using standard techniques (inspired by Bohr's model of the atom in physics and chemistry)
How to solve a Heisenbug?
Solving a Heisenbug requires skill and concentration on the part of a software developer. There is no standard set of techniques, but a few common ones that have proven useful for many Heisenbugs are:
- Monitor system processes
The focus is to ensure that no other process is interfering with our process. A common technique is to use a process monitor such as "htop" on Linux.
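As a rough sketch of this step without an interactive monitor, the snapshot below lists the processes using the most CPU, so that it can be compared between runs where the bug appears and runs where it does not. It assumes the procps version of `ps` found on most Linux distributions; the variable name `snapshot` is only illustrative.

```shell
# Capture the header line plus the ten processes using the most CPU.
# Assumes procps `ps` (Linux); --sort is not available in every ps variant.
snapshot=$(ps -eo pid,ppid,comm,%cpu,%mem --sort=-%cpu | head -n 11)

# Print the snapshot; redirect it to a file to keep one per run.
echo "$snapshot"
```

Saving one snapshot per run makes it easier to spot a process that is present only when the bug shows up.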
- Run on single thread and single core
It has been seen several times that a program is correct but was not built to run on multiple threads. Running on multiple threads gives a significant performance boost, but ensuring that it has been implemented correctly takes skill.
With OpenMP, this is done by setting the OMP_NUM_THREADS environment variable to 1.
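A minimal sketch of such a setup, assuming a Linux system where `taskset` (from util-linux) is installed. The `grep` command is only a stand-in for the real program, and choosing CPU 0 is an arbitrary assumption.

```shell
# Ask the OpenMP runtime to use a single thread for programs started
# from this shell.
export OMP_NUM_THREADS=1

# Pin a command to CPU 0 with taskset (util-linux, Linux only).
# The grep below is only a stand-in for the real program: it prints the
# list of CPUs that the pinned process is allowed to run on.
taskset -c 0 grep Cpus_allowed_list /proc/self/status
```

If the bug stops appearing with one thread on one core, a race condition or another ordering problem between threads becomes a likely suspect.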
- Run on CPU instead of GPU
As with the previous point, a GPU gives a performance boost, but software may not support the GPU by default or may not use all available resources correctly. Turn off the GPU on your system and try running on the CPU.
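One way to do this without touching the hardware, assuming an NVIDIA GPU and a CUDA-based program, is to hide the devices through an environment variable. This is only a sketch: other GPU vendors and frameworks use different mechanisms.

```shell
# Hide all NVIDIA GPUs from CUDA-based programs started from this shell.
# -1 is not a valid device index, so the CUDA runtime sees no devices.
export CUDA_VISIBLE_DEVICES=-1
```

Programs launched afterwards from the same shell should fall back to the CPU if they support doing so.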
- Run on one NUMA node
This is to ensure that the Heisenbug is not coming from the system architecture, as it is often seen that the interconnections between different NUMA nodes can become unstable.
To do this, use the following command:
numactl --cpunodebind=0 --membind=0 <command>
- Change software dependencies
The focus is to ensure that the Heisenbug is not coming from any external software component. Try using vanilla versions or previous stable versions. For example, instead of using Intel-optimized TensorFlow with DNNL v1.1, one can try vanilla TensorFlow (built from source or installed using pip).
- Remove extra features
This is a common technique where one removes software features one by one to figure out which feature is the source of the Heisenbug. This can be challenging, as the Heisenbug may come from a specific set of features acting together.
The above techniques will solve most Heisenbugs, but there may be cases where applying them becomes too complex. In those cases, one should come up with custom techniques to solve the Heisenbug.