Fix OOM error from Bazel Build

In this article, we explore ways to fix the Out Of Memory (OOM) error from Bazel Build, including during the TensorFlow build process, and to limit the memory requirements of Bazel.

Table of contents:

  1. OOM Issue from Bazel Build/ TensorFlow
  2. Fixes: Reduce memory requirements of Bazel

OOM Issue from Bazel Build/ TensorFlow

If your system has little memory relative to its number of cores, each core has less memory available, and the Bazel Build process often exits with an Out Of Memory (OOM) error. This can also bring the server down.

One common case is building TensorFlow with Bazel, which can give OOM errors. In fact, most OOM errors from the TensorFlow build process are due to Bazel. If your server goes down while building TensorFlow, this is the likely cause.
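
To check whether a machine is in this situation, compare its total memory with its core count. The following is a minimal sketch, assuming a Linux system where /proc/meminfo and nproc are available; the variable names are illustrative only.

```shell
# Total physical memory in kB, as reported by the Linux kernel.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# Number of CPU cores available to this process.
cores=$(nproc)

# Integer division: approximate memory per core in MB.
mem_per_core_mb=$(( mem_kb / 1024 / cores ))

echo "cores=$cores total_mb=$(( mem_kb / 1024 )) per_core_mb=$mem_per_core_mb"
```

A low per-core figure suggests that running one job per core may exhaust memory.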

There are two approaches to fix the issue:

  • Increase the memory of the system (Hardware solution)
  • Reduce the memory requirements of Bazel

Fixes: Reduce memory requirements of Bazel

One issue with Bazel is that it assumes the entire memory is available to a process running on a particular CPU core. By default, Bazel considers only the global memory. The fix is to make Bazel consider the local memory, that is, the memory actually available to a process on a particular CPU core.

To make Bazel consider local memory, enable the following option in the Bazel Build command:

--experimental_local_memory_estimate=True

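By default, the above option is set to False. Because it is an experimental option, it may not be accepted by every Bazel release; check the output of bazel help build for your version.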

The second approach is to limit the number of jobs Bazel Build runs. By default, Bazel picks the number of jobs (parallel processes) automatically from the host's resources, which usually means about one per CPU core. Running many jobs in parallel speeds up the build but also increases the load, and the memory use, on the system.

To fix this, one can limit the number of parallel processes (jobs) using the following option in the Bazel Build command:

--jobs 16

Experiment with the number of jobs that suits your system.
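
As a starting point for that experiment, the job count can be derived from the available memory. The sketch below is only an illustration: suggest_jobs is a hypothetical helper, and the 2 GB budget per job is an assumption, not a figure taken from Bazel.

```shell
# suggest_jobs TOTAL_MB CORES PER_JOB_MB
# Prints the smaller of CORES and TOTAL_MB / PER_JOB_MB, but never less than 1.
suggest_jobs() {
  total_mb=$1
  cores=$2
  per_job_mb=$3
  by_mem=$(( total_mb / per_job_mb ))
  n=$cores
  if [ "$by_mem" -lt "$n" ]; then n=$by_mem; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Example with explicit numbers: 16 GB of RAM, 32 cores, 2 GB budgeted per job.
suggest_jobs 16384 32 2048   # prints 8
```

The printed value can then be passed to Bazel as --jobs and adjusted up or down depending on how the build behaves.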

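Additionally, set the environment variable OMP_NUM_THREADS to a low value. This does not change the number of Bazel jobs; it caps the number of threads that OpenMP-enabled programs may start, which limits the concurrency within each process. It can be set using the following command in the terminal: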

export OMP_NUM_THREADS=8
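
An exported variable stays set for the rest of the terminal session. If that is not wanted, the variable can be set for a single command instead. The sketch below shows the scoping with a throwaway sh -c command standing in for the real build command.

```shell
# Start from a clean state so the effect is visible.
unset OMP_NUM_THREADS

# Prefixing a command with VAR=value sets the variable for that command only.
scoped=$(OMP_NUM_THREADS=8 sh -c 'echo "$OMP_NUM_THREADS"')

echo "inside command: $scoped"
echo "in this shell: ${OMP_NUM_THREADS:-unset}"
```

The same prefix form can be placed in front of the bazel command.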

In Bazel, we can set the maximum heap size of the Bazel server JVM to limit memory use. This is a startup option, so it goes directly after bazel and before the build command, not among the build options:

--host_jvm_args=-Xmx2g

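The above option caps the heap at 2 GB. Set it to a size that suits your system best. A lower value would be, for example, -Xmx512m for 512 MB. The k suffix (kilobytes) is also valid JVM syntax, but a heap that small is too little for Bazel to start.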

Bazel Build avoids rebuilding unchanged files that were built previously. This relies on an analysis cache held in memory, which is an overhead. To discard this cache once analysis is done and reduce the memory consumed by Bazel during execution, use the following option in the Bazel Build command:

--discard_analysis_cache

This is a boolean option, so it is enabled by passing it on its own; a separate True after it would be read as a build target.

To instruct Bazel to clean up all the memory it used for the build at the end, so that the next build has the maximum memory available, use the following option:

--nokeep_state_after_build

This will not keep any data at the end, so less memory stays occupied, but the next re-build will slow down considerably as it will start from scratch.

Bazel maintains a dependency graph which speeds up incremental builds. We can save additional memory by not keeping the data needed for incremental builds, using the following option in the Bazel Build command:

--notrack_incremental_state

Hence, the following options help limit the memory required by Bazel Build and can potentially avoid OOM errors. Note that --host_jvm_args is a startup option and goes before the build command; the others follow it.

--experimental_local_memory_estimate=True
--jobs 16
--host_jvm_args=-Xmx2g
--discard_analysis_cache
--nokeep_state_after_build
--notrack_incremental_state

export OMP_NUM_THREADS=8
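
Put together, a full invocation could look like the sketch below. It only assembles and prints the command so that it can be reviewed first. The target label //my/package:target is a hypothetical placeholder, the values are the examples used in this article, and --experimental_local_memory_estimate is left out on the assumption that not every Bazel release accepts it.

```shell
# Hypothetical target label; replace it with the target you actually build.
TARGET="//my/package:target"
JOBS=16

# Caps threads in OpenMP-enabled programs; does not affect Bazel's job count.
export OMP_NUM_THREADS=8

# Startup options go before "build", build options go after it.
STARTUP_OPTS="--host_jvm_args=-Xmx2g"
BUILD_OPTS="--jobs=$JOBS --discard_analysis_cache --nokeep_state_after_build --notrack_incremental_state"

CMD="bazel $STARTUP_OPTS build $BUILD_OPTS $TARGET"

# Print the command for review; to run it, use: eval "$CMD"
echo "$CMD"
```

Keeping the startup and build options in separate variables makes it harder to place --host_jvm_args on the wrong side of the build command.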

With the options and commands in this article at OpenGenus, you can potentially avoid Out Of Memory (OOM) errors from Bazel Build.