Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article at OpenGenus, we have solved the Mean of Array problem using Treap Data Structure.
Pre-requisites
- Maths
- Treaps
Problem Statement
Here you are given an array, the aim is to find the number of sub-arrays in which the mathematical mean is not less than given k
. Mean of an array is the average of that array. Mean is equal to the sum of all numbers in an array divided by the number of elements in the array.
In the input you will be given n
the number of elements in the array, k
the boundary of the mean and the array itself.
In the output print the number of sub-arrays with mean less than k
.
All the numbers in the input section are 32-bit
positive integers and N ≤ 200000
.
Example
Input:
5 4
5 2 4 5 1
Output:
5
Observation
The observations made from the question are:
- We need to find the number of sub-arrays in which the mathematical mean is not less than the given value
k
. - The mean is defined as the sum of the elements in an array divided by the number of elements.
- The input contains the number of elements in the array as
n
, the value ofk
, andn
integers representing the elements of the array. - The size of the array can be as large as
2,00,000
, which implies that the algorithm should be efficient. - All the numbers in the input section are
32-bit
positive integers. - We need to use some mathematical concepts to solve the problem efficiently.
Brute force approach
Here in the brute force approach to solve this problem would be to consider all possible sub-arrays of the given array and calculate the mean of each sub-array. Then count the sub-arrays in which mean is greater than or equal to k
.
To achieve this we would need two nested loops to generate all the possible sub-arrays of the given array and calculate the sum of the sub-array using another loop and divide the sum by the length of the sub-array to calculate the mean.
The time complexity of this expression would be O(n^3)
where n
is the size of the given array as we have three nested loops. The space complexity would be O(1)
as we are not using any additional data structures.
Approach
Here to get the optimized approach we will be using treap data structure.
Intuitive steps
The intuitive approach is as follows:
- First, a node structure is defined, which represents a node in the treap.
- We will have a
- function
sz
to get the size of the treap. - function
upd_sz
to update the size of the treap. - function
init
to initialize node in the treap. - function
split
to split the treapt
into two trapsl
andr
such that all elements inl
are less than or equal to key, and all the elements inr
are greater than key. - function merge is used to merge two treaps
l
andr
into a single treapt
. - function insert is used to insert a node
it
into the treap. - function
sma_and equa
funtion is used to get the number of elements in the treapt
that are less than or equal tox
.
- function
- Now in the main function the value of
k
andn
are taken as input - The root node is initialized to
NULL
. - A node with the value of
0
is inserted into the treaproot
. - Variable
pref
andres
are initialized to0
. - Loop is executed to read the values to the array.
- The value of the
val
variable is subtracted fromk
.
The value ofval
is added topref
.
The number of elements in the treaproot
that are less than or equal topref
is added tores
.
A new node with a value ofpref
is inserted into the treaproot
.
After the loop, the value ofres
is printed.
Explained
To implement this optimally we will use treaps data structure for this solution.
First, a treap is a binary search tree that is also a heap. It satisfies the binary search tree property in which the left subtree of a node contains only nodes with keys less than the node's key, and the right subtree contains only nodes with keys greater than the node's key. It satisfies the heap property where each node's key is smaller than the keys of its children. A treap is built by associating a random priority with each node and maintaining the heap property based on the priorities. The priority is generated randomly when the node is inserted into the treap and is not changed thereafter.
In the intuitive approach we have discussed about the duties of each function. So now lets look how everything works together
- First we will take in the values
n
andk
which are the size of the array and the desired minimum mean, respectively. Then it initializes a treap calledroot
toNULL
using theinit()
, which creates a new node with the given value, priority, size, left and right children and returns a pointer to that node. - After that we insert the new node into the root using the
insert()
, which inserts the node at the correct position in the treap by recursively calling itself on the left or right subtree depending on the value of the node'sval
compared to the current node'sval
. It also updates the size of each node as it is added to the treap. - Now we will enter the loop to read each element of the array one by one. For each element we subtract
k
from it and add the resultant to the variablepref
, which is the running prefix sum. - Now we will search for the number of sub-arrays with a mean greater than or equal to
k
using thesma_and_equa()
, which recursively splits the treap into two subtrees based on theval
of each node, and returns the size of the left subtree. It then merges the two subtrees back together and returns the size of the left subtree as the result. - Then it adds the result to a variable called
res
, which represents the running count of sub-arrays with a mean greater than or equal tok
. It then inserts a new node into the treap representing the current prefix sum, and repeats the process for the next element of the array. - Once all elements of the array have been processed, the program outputs the final value of
res
, which represents the total number of sub-arrays with a mean greater than or equal tok
.
Code
The following is the C++ implementation of the above approach:
#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
struct node {
ll val, prior, size;
node *l, *r;
};
ll sz(node* t) {
return t ? t->size : 0;
}
void upd_sz(node* t) {
if (t) t->size = sz(t->l) + 1 + sz(t->r);
}
node* init(ll val) {
node* ret = new node();
ret->val = val;
ret->size = 1;
ret->prior = rand();
ret->l = ret->r = NULL;
return ret;
}
void split(node* t, node* &l, node* &r, ll key) {
if (!t) l = r = NULL;
else if (t->val <= key) split(t->r, t->r, r, key), l = t;
else split(t->l, l, t->l, key), r = t;
upd_sz(t);
}
void merge(node* &t, node* l, node* r) {
if (!l || !r) t = l ? l : r;
else if (l->prior > r->prior) merge(l->r, l->r, r), t = l;
else merge(r->l, l, r->l), t = r;
upd_sz(t);
}
void insert(node* &t, node* it) {
if (!t) t = it;
else if (it->prior > t->prior) split(t, it->l, it->r, it->val), t = it;
else insert(t->val <= it->val ? t->r : t->l, it);
upd_sz(t);
}
ll sma_and_equa(node* &t, ll x) {
node *l, *r;
split(t, l, r, x);
ll nw = sz(l);
merge(t, l, r);
return nw;
}
int main() {
srand(time(NULL));
ll k, n, val;
scanf("%lld%lld", &n, &k);
node* root = NULL;
insert(root, init(0ll));
ll pref = 0, res = 0;
for (int i = 0; i < n; ++i) {
scanf("%lld", &val);
val -= k;
pref += val;
res += sma_and_equa(root, pref);
insert(root, init(pref));
}
printf("%lld\n", res);
return 0;
}
If in the input we give n = 5
and k = 4
and if the array is [5, 2, 4, 5, 1]
then the output will be:
Output
5
This is because there are 5 sub arrays which satisfies the condition. Those are
| Sub-array | Mean |
--------------------
| [5] | 5 |
| [5,2,4,5] | 4 |
| [4] | 4 |
| [5] | 5 |
| [4, 5] | 5 |
All the above means are equal to or greater than k
which is 4
. The rest of the sub arrays formed have mean less that 4
which is the value of k
The count of such sub-arrays whose mean is not less than k
is 5
, hence the output.
Complexities
The time complexity of the code is O(n log n)
, where n
is the number of elements in the input array. This is because we are using a treap to maintain a sorted sequence of prefix sums. The insertion and splitting operations in the treap take O(log n)
time in the worst case, and we perform them for each of the n
elements in the input array.
The space complexity is O(n)
as we are using a treap to store the prefix sums of the input array, and the size of the treap can be as large as the input array itself. We are also using a constant amount of extra space for variables used in the code, such as variables to store the input values and the treap nodes.
Overall, the time complexity is efficient and a reasonable space complexity which makes this a good solution.