Mean of Array problem solved with Treap

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

In this article at OpenGenus, we have solved the Mean of Array problem using Treap Data Structure.

Pre-requisites

Maths
Treaps

Problem Statement

Here you are given an array, the aim is to find the number of sub-arrays in which the mathematical mean is not less than given k. Mean of an array is the average of that array. Mean is equal to the sum of all numbers in an array divided by the number of elements in the array.

In the input you will be given n the number of elements in the array, k the boundary of the mean and the array itself.

In the output print the number of sub-arrays with mean less than k.

All the numbers in the input section are 32-bit positive integers and N ≤ 200000.

Example

Input:
5 4
5 2 4 5 1

Output:
5

Observation

The observations made from the question are:

We need to find the number of sub-arrays in which the mathematical mean is not less than the given value k.
The mean is defined as the sum of the elements in an array divided by the number of elements.
The input contains the number of elements in the array as n, the value of k, and n integers representing the elements of the array.
The size of the array can be as large as 2,00,000, which implies that the algorithm should be efficient.
All the numbers in the input section are 32-bit positive integers.
We need to use some mathematical concepts to solve the problem efficiently.

Brute force approach

Here in the brute force approach to solve this problem would be to consider all possible sub-arrays of the given array and calculate the mean of each sub-array. Then count the sub-arrays in which mean is greater than or equal to k.

To achieve this we would need two nested loops to generate all the possible sub-arrays of the given array and calculate the sum of the sub-array using another loop and divide the sum by the length of the sub-array to calculate the mean.

The time complexity of this expression would be O(n^3) where n is the size of the given array as we have three nested loops. The space complexity would be O(1) as we are not using any additional data structures.

Approach

Here to get the optimized approach we will be using treap data structure.
Intuitive steps

The intuitive approach is as follows:

First, a node structure is defined, which represents a node in the treap.
We will have a
- function sz to get the size of the treap.
- function upd_sz to update the size of the treap.
- function init to initialize node in the treap.
- function split to split the treap t into two traps l and r such that all elements in l are less than or equal to key, and all the elements in r are greater than key.
- function merge is used to merge two treaps l and r into a single treap t.
- function insert is used to insert a node it into the treap.
- function sma_and equa funtion is used to get the number of elements in the treap t that are less than or equal to x.
Now in the main function the value of k and n are taken as input
The root node is initialized to NULL.
A node with the value of 0 is inserted into the treap root.
Variable pref and res are initialized to 0.
Loop is executed to read the values to the array.
The value of the val variable is subtracted from k.
The value of val is added to pref.
The number of elements in the treap root that are less than or equal to pref is added to res.
A new node with a value of pref is inserted into the treap root.
After the loop, the value of res is printed.

Explained
To implement this optimally we will use treaps data structure for this solution.
First, a treap is a binary search tree that is also a heap. It satisfies the binary search tree property in which the left subtree of a node contains only nodes with keys less than the node's key, and the right subtree contains only nodes with keys greater than the node's key. It satisfies the heap property where each node's key is smaller than the keys of its children. A treap is built by associating a random priority with each node and maintaining the heap property based on the priorities. The priority is generated randomly when the node is inserted into the treap and is not changed thereafter.

In the intuitive approach we have discussed about the duties of each function. So now lets look how everything works together

First we will take in the values n and k which are the size of the array and the desired minimum mean, respectively. Then it initializes a treap called root to NULL using the init(), which creates a new node with the given value, priority, size, left and right children and returns a pointer to that node.
After that we insert the new node into the root using the insert(), which inserts the node at the correct position in the treap by recursively calling itself on the left or right subtree depending on the value of the node's val compared to the current node's val. It also updates the size of each node as it is added to the treap.
Now we will enter the loop to read each element of the array one by one. For each element we subtract k from it and add the resultant to the variable pref, which is the running prefix sum.
Now we will search for the number of sub-arrays with a mean greater than or equal to k using the sma_and_equa(), which recursively splits the treap into two subtrees based on the val of each node, and returns the size of the left subtree. It then merges the two subtrees back together and returns the size of the left subtree as the result.
Then it adds the result to a variable called res, which represents the running count of sub-arrays with a mean greater than or equal to k. It then inserts a new node into the treap representing the current prefix sum, and repeats the process for the next element of the array.
Once all elements of the array have been processed, the program outputs the final value of res, which represents the total number of sub-arrays with a mean greater than or equal to k.

Code
The following is the C++ implementation of the above approach:

#include <bits/stdc++.h>
using namespace std;

typedef long long ll;

struct node {
    ll val, prior, size;
    node *l, *r;
};

ll sz(node* t) {
    return t ? t->size : 0;
}

void upd_sz(node* t) {
    if (t) t->size = sz(t->l) + 1 + sz(t->r);
}

node* init(ll val) {
    node* ret = new node();
    ret->val = val;
    ret->size = 1;
    ret->prior = rand();
    ret->l = ret->r = NULL;
    return ret;
}

void split(node* t, node* &l, node* &r, ll key) {
    if (!t) l = r = NULL;
    else if (t->val <= key) split(t->r, t->r, r, key), l = t;
    else split(t->l, l, t->l, key), r = t;
    upd_sz(t);
}

void merge(node* &t, node* l, node* r) {
    if (!l || !r) t = l ? l : r;
    else if (l->prior > r->prior) merge(l->r, l->r, r), t = l;
    else merge(r->l, l, r->l), t = r;
    upd_sz(t);
}

void insert(node* &t, node* it) {
    if (!t) t = it;
    else if (it->prior > t->prior) split(t, it->l, it->r, it->val), t = it;
    else insert(t->val <= it->val ? t->r : t->l, it);
    upd_sz(t);
}

ll sma_and_equa(node* &t, ll x) {
    node *l, *r;
    split(t, l, r, x);
    ll nw = sz(l);
    merge(t, l, r);
    return nw;
}

int main() {
    srand(time(NULL));
    ll k, n, val;
    scanf("%lld%lld", &n, &k);
    node* root = NULL;
    insert(root, init(0ll));
    ll pref = 0, res = 0;
    for (int i = 0; i < n; ++i) {
        scanf("%lld", &val);
        val -= k;
        pref += val;
        res += sma_and_equa(root, pref);
        insert(root, init(pref));
    }
    printf("%lld\n", res);
    return 0;
}

If in the input we give n = 5 and k = 4 and if the array is [5, 2, 4, 5, 1] then the output will be:

Output

This is because there are 5 sub arrays which satisfies the condition. Those are

| Sub-array | Mean |
--------------------
|    [5]    |  5   |
| [5,2,4,5] |  4   |
|    [4]    |  4   |
|    [5]    |  5   |
|   [4, 5]  |  5   |

All the above means are equal to or greater than k which is 4. The rest of the sub arrays formed have mean less that 4 which is the value of k The count of such sub-arrays whose mean is not less than k is 5, hence the output.

Complexities

The time complexity of the code is O(n log n), where n is the number of elements in the input array. This is because we are using a treap to maintain a sorted sequence of prefix sums. The insertion and splitting operations in the treap take O(log n) time in the worst case, and we perform them for each of the n elements in the input array.

The space complexity is O(n) as we are using a treap to store the prefix sums of the input array, and the size of the treap can be as large as the input array itself. We are also using a constant amount of extra space for variables used in the code, such as variables to store the input values and the treap nodes.

Overall, the time complexity is efficient and a reasonable space complexity which makes this a good solution.

Mean of Array problem solved with Treap

Data Structures Algorithms tree data structure

Problem Statement

Observation

Brute force approach

Approach

Complexities

Implementing Merge Sort in Java

Ghost Town problem solved with Treap Data Structure