Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article, we have covered Memory allocation in Python in depth along with types of allocated memory, memory issues, garbage collection and others.
Table of contents
- What is memory allocation?
- How Python memory allocation works?
- Good python memory management practices.
- Python memory issues
What is memory allocation?
It is a process by which a block of memory in computer memory is allocated for a program. Understanding memory allocation is key to writing fast and efficient programs irrespective of the huge amounts of memory computers tend to have nowadays.
How Python memory allocation works?
Prior to the subsequent chapters, it is important to understand that everything in python is an object.
Its no suprise that this might be obscure to most of us as python developers.
Well, thats because, memory allocation (a subset of memory management) is automatically done for us. This means you wont see malloc and free functions (familiar to C programmers) scattered through a python application.
Memory management in python is done by the Python Memory Manager(Part of the interpreter)
To gracefully handle memory management, the python memory manager uses the reference count algorithm. Basically it keeps track of the count of the references to every block of memory allocated for the program.
References are basically variables we use in our programs. They are references to block(s) of memory.
# Remember 5 is object of class <int>
# So variables(references) x, y, z are all referring to the same object in memory
x = 5
y = x
z = x
# Assiging new values to the variables
x = None
y = None
z = None
# As we re-assign x to None, the reference count decrements by one and the same happens
# with the rest of the re-assignments
The python interpreter has a Garbage Collector that deallocates previously allocated memory if the reference count to that memory becomes zero.
Memory types
Heap memory
Python uses a private heap that stores all python objects and data structurers. We as developers have zero control over the private heap, however, there are ways to optimize the memory efficiency of our programs.
arr = [1, 2, 3] # Stored on the heap
def func():
a = 10 # The value (object) 10 is stored on the heap and the function stack frame will only hold a reference to that value.
pass
To optimize memory management, the heap is further subdivided:
Arenas
Python has a couple of memory allocators and each has been optimized for a specific situation i.e. allocation for small and large objects. One of them is pymalloc that is optimized for small objects (<= 512B). pymalloc returns an arena. An arena is a memory mapping with a fixed size of 256 KiB (KibiBytes). pymalloc uses the C malloc() function to allocate pools of memory which it then uses to handle subsequent memory requests.
In a nutshell an arena is used to service memory requests without having to reallocate new memory.
Pools
With in arenas, we have pools that take the size of the Operating System page size but by default, python assumes the page size to be 4KB.
Pools are fragmented into blocks and each pool is composed of blocks that corresspond to the same size class depending of how much memory has been requested.
Each pool has freeblock pointer (singly linked list) that points to the free blocks in a pool.
Pools can have 3 states.
used: The pool has available blocks of data.
full: All the pool's blocks have been allocated and contain data.
empty: The pool has no data and can be assigned any size class for blocks when requested.
Blocks
Perhaps we have hinted about blocks in the preceeding paragraphs, but to add on to that, blocks can have 3 states.
untouched: Has not been allocated
free: Block was allocated but freed and it now contains irelevant data
allocated: Has been allocated and contains relevant data
Stack memory
This memory space is allocated for only function calls. It holds references to the function's local variables (arguments are also inclusive).
In the preceeding statement I stressed the word references because the actual values are stored in the private heap.
The stack is Last In First Out (LIFO) data structure i.e. the last item to go in to the stack is the first item to get out. I hope you get some bit of how recursion works (A pile of stack frames).
When the function is invoked, a stack frame is allocated, and when the function returns or exits, the stack frame is destroyed.
Good python memory management practices
The essence of good memory management is utilize less but enough memory so that our programs can run alongside other programs. You can optimize your python program's memory usage by adhering to the following:
a. Avoid list slicing.
arr = [1, 2, 3, 4, 4, 6]
new_arr = arr[2:] # O(M) where M is the length of sliced array
# Instead use indicies to track the list portions you need
first_pos = 2
last_pos = 5
b. Avoid string concatenation.
s = "Hello"
new_s = s + " World!" # O(N + M)Highly costly for even large strings
# Where N and M are lengths of two strings being concatenated.
# Use join()
new_str = ''.join(list(s)) # join() is O(N) where N is the length of the string to be concatenated, In our case it is <s>.
Python memory issues
Consequently, under certain circumstances, the Python memory manager may or may not trigger appropriate actions, like garbage collection, memory compaction or other preventive procedures.
*From the Python 3 Memory Management Documentation
Due to the python memory manager failing to clear memory at certain times, the performance of a program is degraded as some unused references are not freed. This is known as a memory leak.
To fix memory leaks, we can use tracemalloc, an inbuilt module introduced in python 3.4.
The software domain has shifted to writing optimal code that works rather than just code that works. Writing software while taking into account its efficacy at solving the intented problem enables us to visualize the software's limits.