Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
Given that integers are being read from a data stream, we need to find the median of all the elements read so far starting from the first integer till the last integer. This is called the Median in Running stream of Integers.
Table of Contents:
- How do we define median?
- Approach using Insertion Sort
- Approach using Heap data structure
- Approach using Ordered Multiset Data Structure
- Conclusion
How do we define median?
Median can be defined as the element in the data set which separates the higher half of the data sample from the lower half. When the size of input data is odd, the median of input data is the middle element of sorted input data. When the size of the input data is even, the median is the average middle two elements in sorted input data.
In this article, we shall discuss the various approaches to find the median in stream of running integers:
- using insertion sort
- using a heap data structure
- using ordered multiset data structure
For example:
Input: 5 15 1 3 2 8
Output: 5 10 5 4 3 4
Explaination:
Iteration 1 :
Elements = {5}
Median = 5
Iteration 2 :
Elements = {5,15}
Median = (5+15)/2 = 10
Iteration 3 :
Elements = {1,5,15}
Median = 5
Iteration 4 :
Elements = {1,3,5,15}
Median = (3+5)/2 = 4
Iteration 5 :
Elements = {1,2,3,5,15}
Median = 3
Iteration 6 :
Elements = {1,2,3,5,8,15}
Median = (3+5)/2 = 4
Let us now explore the various approaches one by one.
Approach using Insertion Sort
The idea is to keep elements received from the stream in sorted order. When we receive a new element from the stream, we find it’s correct place in the sorted order and place the new element at the correct place using insertion sort and then find the median of the newly obtained sorted order. We do this for every element obtained from the running stream of integers.
Algorithm/ Step:
- When a new element comes in, insert new element in array using Insertion Sort.
- In the sorted array, find the middle element as median.
Implementation:
#include <iostream>
#include <vector>
using namespace std;
void insertionSort(vector <int>& sortedElements)
{
int last_element = sortedElements.size()-1;
while(last_element > 0 && sortedElements[last_element] < sortedElements[last_element-1])
{
swap(sortedElements[last_element-1],sortedElements[last_element]);
last_element--;
}
}
void printMedian(vector <int> stream)
{
vector <int> sortedElements;
for(int i=0;i<stream.size();i++)
{
sortedElements.push_back(stream[i]);
if(sortedElements.size() == 1)
cout<<sortedElements[0]<<" ";
else
{
insertionSort(sortedElements);
if(sortedElements.size()%2 == 1)
{
int mid = sortedElements.size()/2;
cout<<sortedElements[mid]<<" ";
}
else
{
int mid1 = (sortedElements.size()-1)/2;
int mid2 = sortedElements.size()/2;
cout<<(float)(sortedElements[mid1]+sortedElements[mid2])/2<<" ";
}
}
}
}
int main()
{
vector <int> stream = {5,15,1,3,2,8};
printMedian(stream);
return 0;
}
Complexity Analysis:
Time complexity: O(n^2)
Insertion sort takes O(n^2) time to sort n elements. Even if we use binary search to find the correct place of elements requiring O(log n) time but to move the data movement will again take O(n) time and for n elements this will require O(n^2) time.
Space Complexity: O(1)
We do not require any extra space other than a vector to store the input stream, therefore space complexity is constant.
Approach using Heap data structure
The idea is to use max heap and min-heap data structure to store the elements of the lower half and higher half respectively.
We use a max heap to represent elements that are less than effective median, and a min heap to represent elements that are greater than effective median.
After inserting an element in either of the heaps, the number of elements in heaps differ utmost by 1 element. When the heaps are not balanced, we select median from the root of heap containing more elements and when both heaps contain same number of elements, we calculate the average of heaps root data to be the median.
Let us now see the algorithm to find the median in running stream of integers using this heap data structure.
Algorithm/ Steps:
- We create two heaps max heap and min-heap data structure to store the elements of the lower half and higher half respectively at any point of time.
- We take initial value of median as 0.
- For every new element we read, we insert it into either max heap or min-heap and calculate the median.
- If the size of both the heaps is the same then, If the current element is greater than the median value,we insert it into min heap and return the root of minheap as the new median. Else if the current element is less than the median value, we insert it into max heap and return the root of maxheap as the new median.
- Else If the size of maxheap is greater than minheap then, if the current element is greater than the median, we insert the current element in minheap. Else if the current element is less than the median, we pop the root of maxheap and insert it into minheap. Now we insert the current element to maxheap and calculate the median as an average of the root of minheap and maxheap.
- Else If the size of maxheap is less than minheap then, if the current element is less than the median, we insert the current element into maxheap. Else if the current element is greater than the median, we pop the top of minheap and insert it into maxheap. Now we insert the current element to minheap and calculate the median as an average of the root of minheap and maxheap.
#include<iostream>
#include<vector>
#include<queue>
using namespace std;
void printMedians(double arr[], int n)
{
// max heap to store the smaller half elements
priority_queue<double> smaller;
// min heap to store the greater half elements
priority_queue<double,vector<double>,greater<double> > greater;
double median = arr[0];
smaller.push(arr[0]);
cout << median << " ";
for (int i=1; i < n; i++)
{
double current = arr[i];
// if max heap has more elements
if (smaller.size() > greater.size())
{
if (current < median)
{
greater.push(smaller.top());
smaller.pop();
smaller.push(current);
}
else
greater.push(current);
median = (smaller.top() + greater.top())/2.0;
}
// if both heaps are balanced
else if (smaller.size()==greater.size())
{
if (current < median)
{
smaller.push(current);
median = (double)smaller.top();
}
else
{
greater.push(current);
median = (double)greater.top();
}
}
// if min heap has more elements
else
{
if (current > median)
{
smaller.push(greater.top());
greater.pop();
greater.push(current);
}
else
smaller.push(current);
median = (smaller.top() + greater.top())/2.0;
}
cout << median << " ";
}
}
int main()
{
double arr[] = {5,15,1,3,2,8};
int n = sizeof(arr)/sizeof(arr[0]);
printMedians(arr, n);
return 0;
}
Complexity Analysis:
Time Complexity: O(n log n).
Time Complexity to insert an element in min heap is O(log n). So to insert n elements time complexity is O(n log n).
Space Complexity :O(n).
Here we require two auxiliary heaps to process the elements. The Space required to store the elements in Heap is O(n) so the space complexity is O(n).
Approach using Ordered Multiset Data Structure
In this approach we use a multiset data structures with two pointers left and right. when we insert an element into the multiset, we modify these pointers to point at the middle element of the sorted stream. When the size of the multiset is even, left and right pointers point to the same element that is the middle element and when the size of the mutiset is odd, left and right pointers point to consecutive elements in the middle of the array.
Let us now see the algorithm to find the median in running stream of integers using this ordered multiset data structure.
Algorithm/ Steps:
- We create a multiset data structure.
- We now create two iterators for multiset left and right.
- Now process current element of the stream and insert the element into the multiset sorted. We do this for every element in the stream.
- If the size of sorted is 1 (the first element is inserted), left and right point to the first element of sorted.
- Else, we move to step 6 or 7 depending on the condition.
- If the size of multiset sorted is even, left and right point to the same element in the middle. Now if the current element is greater than the element pointed by right, we advance right to next element, else if the current element is less than element pointed by right,we move back left to the previous element.
- If the size of mutiset sorted is odd, left and right point to consecutive elements in the middle of the array.Now if the current element is greater than the element pointed by right, we advance left to next element, else if the current element is less than element pointed by left, we move right to the previous element, else if the current element is greater than left and less than right, we move left forward and right to backward.
- Lastly we calculate median at every step where median = (left+right)/2.
Implementation:
#include <iostream>
#include <bits/stdc++.h>
using namespace std;
void printMedian(vector <int> stream)
{
multiset <int> sorted;
multiset<int>::iterator left,right;
float median = 0;
for(int i=0;i<stream.size();i++)
{
sorted.insert(stream[i]);
if(sorted.size() == 1)
{
left = sorted.begin();
right = sorted.begin();
median = *left;
}
else
{
if(sorted.size()%2 == 0)
{
if(stream[i] >= *right)
right++;
else
left--;
}
else
{
if(stream[i] >= *right)
left++;
else if(stream[i] <= *left)
right--;
else
{
left++;
right--;
}
}
}
median = (float)(*left+*right)/2;
cout<<median<<" ";
}
}
int main()
{
vector <int> stream = {5,15,1,3,2,8};
printMedian(stream);
return 0;
}
Complexity Analysis:
Time complexity : O(n logn)
Time Complexity to insert an element in a multiset is O(log n). So to insert n elements time complexity is O(n log n).
Space Complexity : O(n)
Here we require a multiset to process the elements. The Space required to store the elements in a multiset is O(n) so the space complexity is O(n).
Conclusion
So now, we have discussed all the three approaches to find the median in running stream of integers and also we have have seen that the minimum time complexity for this is O(n logn) and minimum space complexity is O(n), when we use heap or an ordered multiset data structure.
With this article at OpenGenus, you must have the complete idea of how to find the median in running stream of integers.