Majority Element using randomized algorithm

Do not miss this exclusive book on Binary Tree Problems. Get it now for free.

This article at OpenGenus introduces a randomized algorithm for finding the majority element. On average it has the same runtime, but when the size of the array increases exponentially, this algorithm is much much faster probabilistically.

The majority element of a vector is an element which occurs with a frequency more than floor(n/2). So you have more than a 50% chance of picking out the majority element in your first try if you pick any element randomly.

Because a given index is likely to have the majority element, we can just select a random index, check whether its value is the majority element, return if it is, and repeat if it is not. The algorithm is verifiably correct because we ensure that the randomly chosen value is the majority element before ever returning.

It is technically possible for this algorithm to run indefinitely (if we never manage to randomly select the majority element), so the worst possible runtime is unbounded. However, the expected runtime is far better - linear, in fact. For ease of analysis, convince yourself that because the majority element is guaranteed to occupy more than half of the array, the expected number of iterations will be less than it would be if the element we sought occupied exactly half of the array. Therefore, we can calculate the expected number of iterations for this modified version of the problem and assert that our version is easier.

Time complexity: O(infinity)
Expected Time complexity: O(n)
Space complexity: O(1)

Calculation of Expected Time complexity

E(iterations for our problem) <= E(iterations for the above modified problem) = 1*(1/2) + 2*(1/2)^2 + 3*(1/2)^3+... = 2

Because the series converges, the expected number of iterations for the modified problem is constant. Based on an expected-constant number of iterations in which we perform linear work, the expected runtime is linear for the modifed problem. Therefore, the expected runtime for our problem is also linear, as the runtime of the modifed problem serves as an upper bound for it.

Suppose you don't find the majority element in your first try. You can then pick a random element again and check if it is a majority element. You do this until you eventually find the majority element.

Intuitively, lets say we pick randomly 10 numbers. The possibility of NOT being majority number in those 10 numbers is (1/2 * 1/2 * 1/2 ......)= 1/(2^10) = 1/1024 = 0.000976562, which is very low.

Implementation

#include <iostream>
#include <vector>
using namespace std;

int n;
vector<int> v;

// Randomized algorithm with expected runtime: O(n), and Space: O(1)
int majorityElement(vector<int>& nums) {
    int pos = 0, n = nums.size(), freq;
    while(1) {
        freq = 0;
        for(int x : nums)if(x == nums[pos])++freq;
        if(freq > n / 2)
            return nums[pos];
        pos = rand() % n;
    }        
}

int main()
{
    cin >> n;
    for (int i = 0; i < n; i++){
        int tmp;
        cin >> tmp;
        v.push_back(tmp);
    }
    cout << "The majority element is: " << majorityElement(v);
    return 0;
}

With this article at OpenGenus, you must have a good idea of how to find Majority Element using randomized algorithm.