The physical memory of a system is limited, while the demand for memory changes constantly. The more dynamic a program is, the more important memory management becomes. Choosing the right memory allocator can bring a noticeable performance improvement.
Let's start exploring jemalloc.
First of all, let's find out:
What is jemalloc?
jemalloc is a general purpose malloc implementation that emphasizes fragmentation avoidance and scalable concurrency support. jemalloc first came into use as the FreeBSD libc allocator in 2005, and since then it has found its way into numerous applications that rely on its predictable behavior.
#include <stdlib.h>
#include <jemalloc/jemalloc.h>
What does jemalloc do?
The JVM (like any other process) calls the C runtime's malloc function to allocate memory from the operating system, and then manages the heap for the Java application. jemalloc is an alternative malloc implementation, and it comes with a great tool called jeprof, which profiles memory allocation and lets us visually trace what is calling malloc.
Beauty of jemalloc - jemalloc supports threads natively with little memory fragmentation.
How is jemalloc different from malloc?
jemalloc vs standard malloc
When we use jemalloc, performance decreases, but memory "fragmentation" decreases as well. jemalloc also seems to use less memory at the peak, but the difference is only 5-6%.
What I mean by memory fragmentation is as follows.
- First allocate lots of key value pairs (5-7 GB of memory)
- Then look at the memory usage.
- Then deallocate all pairs and any other memory the executable uses. The order of deallocation is the same as the order of allocation.
- Finally check memory usage again.
With standard malloc, memory usage after deallocation stays almost at the peak level.
Conclusion - With jemalloc, memory usage after deallocation is minimal.
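The test described above can be sketched in C roughly as follows. This is a minimal sketch, not the original benchmark: the names `run_pairs_test` and `resident_pages`, the 32-byte keys and the 96-byte values are all assumptions made for illustration, and memory usage is read from `/proc/self/statm`, which only exists on Linux.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One key/value pair; key and value live in separate heap blocks. */
struct pair {
    char *key;
    char *value;
};

/* Resident set size in pages, read from /proc/self/statm (Linux only).
   Returns -1 where that file is not available. */
long resident_pages(void) {
    FILE *f = fopen("/proc/self/statm", "r");
    long total = 0, resident = -1;
    if (f == NULL) return -1;
    if (fscanf(f, "%ld %ld", &total, &resident) != 2) resident = -1;
    fclose(f);
    return resident;
}

/* Allocate n pairs, record usage, free them in allocation order,
   then record usage again. Returns the number of pairs allocated. */
size_t run_pairs_test(size_t n, long *peak, long *after) {
    struct pair *pairs = malloc(n * sizeof *pairs);
    size_t made = 0;
    if (pairs == NULL) return 0;
    for (size_t i = 0; i < n; i++) {
        pairs[i].key = malloc(32);      /* illustrative key size */
        pairs[i].value = malloc(96);    /* illustrative value size */
        if (pairs[i].key == NULL || pairs[i].value == NULL) {
            free(pairs[i].key);
            free(pairs[i].value);
            break;
        }
        snprintf(pairs[i].key, 32, "key-%zu", i);
        memset(pairs[i].value, 'v', 96);  /* touch the block so its pages become resident */
        made++;
    }
    *peak = resident_pages();
    for (size_t i = 0; i < made; i++) {   /* same order as allocation */
        free(pairs[i].key);
        free(pairs[i].value);
    }
    free(pairs);
    *after = resident_pages();
    return made;
}
```

To compare allocators, the same binary would be run once with the default malloc and once with jemalloc in place (for example preloaded through `LD_PRELOAD`, assuming a jemalloc build without a symbol prefix), comparing `peak` and `after` in each run.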
- For allocations smaller than 32 KB, each thread also uses a thread-local cache that needs no locks.
- jemalloc uses the following size classes on 64-bit systems:
Small: [16, 32, 48, …, 128], [192, 256, 320, …, 512], [768, 1024, 1280, …, 3840]
Large: [4 KiB, 8 KiB, 12 KiB, …, 4072 KiB]
Huge: [4 MiB, 8 MiB, 12 MiB, …]
- Small/large objects need constant time to find metadata, and huge objects are searched in logarithmic time through global red-black trees.
- Virtual memory is logically divided into chunks (the default is 4 MB, that is, 1024 pages of 4 KB). Each application thread is assigned an arena at its first malloc, using a round-robin algorithm. The arenas are independent of each other, and each maintains its own chunks. A chunk is cut into pages that serve small and large objects. Freed memory is always returned to the arena it came from, regardless of which thread calls free().
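To make the size-class list above concrete, here is a small sketch that rounds a request up to the class it would fall into. It is not jemalloc's own code: the function names are made up for illustration, the spacing (16, 64 and 256 bytes, then 4 KiB and 4 MiB) is read directly off the list as printed, and requests below 16 bytes are simply assumed to land in the 16-byte class.

```c
#include <stddef.h>

/* Round n up to the next multiple of step (step > 0). */
static size_t round_up(size_t n, size_t step) {
    return (n + step - 1) / step * step;
}

/* Map a request of n bytes to the size class listed above.
   Assumption: anything up to 16 bytes uses the 16-byte class.
   Overflow for requests near SIZE_MAX is not handled in this sketch. */
size_t size_class(size_t n) {
    const size_t KiB = 1024, MiB = 1024 * 1024;
    if (n <= 16)         return 16;
    if (n <= 128)        return round_up(n, 16);      /* small: 16-byte spacing  */
    if (n <= 512)        return round_up(n, 64);      /* small: 64-byte spacing  */
    if (n <= 3840)       return round_up(n, 256);     /* small: 256-byte spacing */
    if (n <= 4072 * KiB) return round_up(n, 4 * KiB); /* large: whole 4 KiB pages */
    return round_up(n, 4 * MiB);                      /* huge: whole 4 MiB chunks */
}
```

For example, a 100-byte request lands in the 112-byte class, a 600-byte request in the 768-byte class, and a 5000-byte request becomes an 8 KiB large allocation. The gap between the requested and the returned size is the internal fragmentation each class accepts in exchange for fast lookups.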
What are the benefits?
- The main benefit is scalability in multi-processor and multi-threaded systems achieved, in part, by using multiple arenas (the chunks of raw memory from which allocations are made).
- jemalloc can be used to get to the bottom of a memory leak.
- jemalloc really helped Aerospike take advantage of modern multithreaded, multi-CPU, multi-core computer architectures. There are also some very important debugging capabilities built into jemalloc to manage arenas. The debugging allowed Psi to tell, for instance, what was a true memory leak versus what was the result of memory fragmentation.
Source - engineering.fb.com
Now, it's time for bonus information.
Bonus information - tcmalloc
TCMalloc is a memory management library open sourced by Google as a replacement for glibc malloc. It is currently used in Chrome, Safari, and other well-known software.
According to the official test report, ptmalloc (the glibc allocator) requires about 300 nanoseconds to execute a malloc and free pair on a 2.8 GHz P4 machine (for small objects). The same operation with TCMalloc takes only about 50 nanoseconds.
tcmalloc vs jemalloc - jemalloc is slightly more performant in general, and it also creates less heap fragmentation over time.
Now it's time to have fun by answering a simple question.
Which of these allocators comes out best in a performance comparison?
Useful Links and References
- To install jemalloc Click here
Then run the below commands.
./configure --with-jemalloc-prefix='je_' --with-malloc-conf='background_thread:true,metadata_thp:auto'
make
sudo make install
- This paper investigates the performance of different memory allocators.
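Because the configure command above sets `--with-jemalloc-prefix='je_'`, the installed library exports `je_malloc` and `je_free` rather than replacing `malloc` and `free` directly. The sketch below shows one way an application might call them. `USE_JEMALLOC`, `app_malloc`, `app_free` and `app_strdup` are hypothetical names invented for this example, and the fallback branch only exists so the file still compiles on a machine without jemalloc.

```c
#include <stdlib.h>
#include <string.h>

#ifdef USE_JEMALLOC                 /* hypothetical switch, e.g. -DUSE_JEMALLOC */
#include <jemalloc/jemalloc.h>
#define app_malloc je_malloc        /* prefixed names from the je_ build */
#define app_free   je_free
#else
#define app_malloc malloc           /* fallback: the C runtime allocator */
#define app_free   free
#endif

/* Return a heap copy of s made through the selected allocator.
   The caller releases it with app_free. Returns NULL on failure. */
char *app_strdup(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = app_malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}
```

With jemalloc installed, this would typically be built with something like `cc -DUSE_JEMALLOC app.c -ljemalloc`. The exact include and library paths depend on where `make install` placed the files.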