In this post, we explained the median of medians heuristic, its applications and usefulness as well as its limitations. Median of Medians Algorithm is a Divide and Conquer algorithm.
Table of contents:
- What is median of medians heuristic?
- Why you may need "Median of Medians"?
- Median of Medians Algorithm
- Time and Space Complexity of Median of Medians Algorithm
- Proofs: Why is this pivot good?
Let us get started with Median of Medians Algorithm.
What is median of medians heuristic?
It is an approximate median selection algorithm that helps in creating asymptotically optimal selection algorithms by producing good pivots that improve worst case time complexities for sorting and selection algorithms.
Why you may need "Median of Medians"?
Some applications of the median of medians heuristic include the following;
Quickselect selects the kth smallest element of an initially unsorted array, it worst case running time is quadratic, when median of medians heuristic is implemented it finds an approximate median which is used as pivot and the worst case time complexity becomes linear.
Quicksort relies on a good pivot element for its performance, the best known approach for finding a pivot is using a randomized pivot element, the running time on average is linear but it becomes quadratic in the worst case.
Using median of medians proves useful in making its worst case O(nlogn).
Unfortunately, implementing this heuristic in Quicksort will actually make it perform a-lot less efficient when compared to the normal randomized pivot selection for most cases.
Introsort on the other hand is a hybrid sorting algorithm that uses both quick sort and the median of medians heuristic to give a fast average performance and an optimal worst case performance, It uses randomized quick sort at the start of the algorithm then based on the pivots thus far selected, it chooses to use the median of medians heuristic to find a good pivot making it asymptotically optimal with O(nlogn) time in the worst case. Introsort is used as a sorting algorithm in c++ stl.
Similarly, introselect uses quickselect and median of medians to select a good pivot at each iteration until a kth element is found.
Median of Medians Algorithm
It is a divide and conquer algorithm in that, it returns a pivot that in the worst case will divide a list of unsorted elements into sub-problems of size and assuming we choose a sublist size of 5.
It guarantees a good pivot that in the worst case will give a pivot in the range between 30th and 70th percentile of the list of size n.
For a pivot to be considered good it is essential for it to be around the middle, 30-70% guarantees the pivot will be around the middle 40% of the list.
The algorithm is as follows,
- Divide the list into sublists if size n, assume 5.
- Initialize an empty array M to store medians we obtain from smaller sublists.
- Loop through the whole list in sizes of 5, assuming our list is divisible by 5.
- For sublists, use select brute-force subroutine to select a median m, which is in the 3rd rank out of 5 elements.
- Append medians obtained from the sublists to the array M.
- Use quickSelect subroutine to find the true median from array M, The median obtained is the viable pivot.
- Terminate the algorithm once the base case is hit, that is, when the sublist becomes small enough. Use Select brute-force subroutine to find the median.
Note: We used chunks of size 5 because selecting a median from a list whose size is an odd number is easier. Even numbers require additional computation. We could also select 7 or any other odd number as we shall see in the proofs below.
Here is the procedure pictorially;
Pseudocode of Median of Medians Algorithm
medianOfMedians(arr[1...n]) if(n < 5 return Select(arr[1...n], n/2)) Let M be an empty list For i from 0 to n/5 - 1: Let m = Select(arr[5i + 1...5i+5], 3) Add m to M Return QuickSelect(M[1...n/5], n/10) End medianOfMedians
It is recursive, it calls QuickSelect which in turn will call MedianOfMedians.
Time and Space Complexity of Median of Medians Algorithm
This algorithm runs in O(n) linear time complexity, we traverse the list once to find medians in sublists and another time to find the true median to be used as a pivot.
The space complexity is O(logn) , memory used will be proportional to the size of the lists.
Proofs: Why is this pivot good?
- Lemma: The median of medians will return a pivot element that is greater than and less than at least 30% of all elements in the whole list.
Array M consists of medians of sub lists of size 5, these elements in list M is greater than and less than at-least two elements in the original list.
QuickSelect will return a true median that represents the whole list which is greater than and less than elements of list M and since each one of the M elements is greater than and less than at least two other elements in their previous sublists, therefore the true median is greater than and less than at least , 30 percentile of elements of the whole list.
- Lemma: With that, we guarantee that quickselect finds a good pivot in linear time O(n) which in turn guarantees quickSort's worst case to be O(nlogn).
n = 1
The above proof worked because ,we split the original list in chunks of 5 assuming the original list is divisible by 5. We could also use another odd number provided the above equation results in a number below 1, then our theorem will perform its operations in O(n) linear time.
Therefore we get a big theta(n) time complexity for QuickSelect which proves using this heuristic for QuickSelect ad QuickSort improves worst case to O(n) and O(nlogn) for the respective algorithms.
In this post at OpenGenus, we explained introsort uses median of medians heuristic to improve the worst case running time for quicksort, Is there another way to improve quicksort worst case run time using another hybrid approach?