Any garbage collection algorithm has three basic steps:
Find all objects that has the possibility of being used in future (called Mark Phase)
Remove all unused objects (called Sweep or Copy phase)
Any later processing (involves Compact phase)
Our objective is to identify all live (reachable) objects.
Every Garbage Collection algorithm used in Java Virtual Machine starts by finding out all objects that are still alive.
First, Garbage Collection algorithm defines some specific objects as Garbage Collection Roots (GC Roots).
Examples of such Garbage Collection roots are:
- Local variable and input parameters of the currently executing methods
- Active threads
- Static field of the loaded classes
- JNI references
Garbage Collection traverses the whole object graph in memory, starting from those Garbage Collection Roots and following references from the roots to other objects. Every object the Garbage Collection visits is marked as alive.
When the marking phase finishes, every live object is marked.
All other objects are unreachable from the Garbage Collection roots, implying that your application cannot use the unreachable objects anymore. Such objects are considered garbage and Garbage Collection algorithm should get rid of them in coming phases.
Key points regarding the marking phase are:
When the application threads are temporarily stopped so that the JVM can indulge in housekeeping activities is called a safe point resulting in a Stop The World pause.
Safe points can be triggered for different reasons but garbage collection is by far the most common reason for a safe point to be introduced.
The duration of this pause does not depend on the total number of objects in heap nor or the size of the heap
The duration of this pause depends on the number of alive objects.
Increasing the size of the heap does not directly affect the duration of the marking phase.
When the mark phase is completed, the GC can proceed to the next step and start removing the unreachable objects.
Removing Unused Objects
The strategy of removal of unused objects is different for different Garbage Collection algorithms.
The strategies can be divided into three groups:
Mark and Sweep algorithms use conceptually the simplest approach to garbage by just ignoring such objects. What this means is that after the marking phase has completed all space occupied by unvisited objects is considered free and can thus be reused to allocate new objects.
The approach requires using the so called free-list recording of every free region and its size. The management of the free-lists adds overhead to object allocation. Built into this approach is another weakness – there may exist plenty of free regions but if no single region is large enough to accommodate the allocation, the allocation is still going to fail (with an OutOfMemoryError in Java).
Mark-Sweep-Compact algorithms solve the shortcomings of Mark and Sweep by moving all marked – and thus alive – objects to the beginning of the memory region. The downside of this approach is an increased GC pause duration as we need to copy all objects to a new place and to update all references to such objects. The benefits to Mark and Sweep are also visible – after such a compacting operation new object allocation is again extremely cheap via pointer bumping. Using such approach the location of the free space is always known and no fragmentation issues are triggered either.
Mark and Copy algorithms are very similar to the Mark and Compact as they too relocate all live objects. The important difference is that the target of relocation is a different memory region as a new home for survivors. Mark and Copy approach has some advantages as copying can occur simultaneously with marking during the same phase. The disadvantage is the need for one more memory region, which should be large enough to accommodate survived objects.