Morris Algorithm for Counting

Morris Algorithm is a Probabilistic Algorithm for counting N elements approximately with significantly less memory. The time complexity of Morris Algorithm is O(N) time and the space complexity is O(loglogN).

The brute force approach of counting N elements takes O(N) time and O(logN) space. Note: Morris Algorithm improves space complexity by a factor of log that is exponential.

Background / Problem statement

To store a number N, we need O(logN) bits.

This can arise in a case when there is a stream of numbers and we need to check the occurence of a paricular element. The occurence can be of order N and hence, we need logN bits. This can be a case when we need to figure out the frequency of each distinct element or the number of distinct elements in a set.

The problem is to do the counting till N but use less than logN bits.

Some points:

  • We can accept a small error in the counting at the cost of less space utilization.
  • The time complexity of counting that is O(N) should not degrade. Hence, it should be maintained.

Basic idea:

If we use logN - 1 bits instead of logN bits, then each element in the current count corresponds to 2 elements in the real count. For example:

i denotes 2i, 2i+1 in the real count.

This happens when for a specific element, we add it to the current count with a probability of 0.5. This means only 50% of the elements are added in the final count and hence, the maximum count we get is N/2 (1 bit saved).

Hence, if we use (logN)-1 bits, then there can be an error of +1 or -1 at the advantage of saving 1 bit. The problem will this approach is that on reducing the number of bits sequentially in this idea, the error becomes significant. For example, if we use (logN)/2 bits, the error in N is nearly 50%. Hence, this approach is not feasible.

Moreover, these approaches have the same space complexity of O(logN). We need an improvement.

The improvement was brought in by Morris Algorithm for Approximate Counting in 1977 by Robert Morris while working at IBM.

Morris Algorithm

The idea of Morris Algorithm is:

Instead of counting N, we will count logN.
As we are counting logN, we only need loglogN bits which is an improvement at the cost of slight accuracy loss of 10% at max.

When we get an element that needs to be counted, we increment the count based on a probability value P. Setting the probability P to a constant value, increases the error significantly. Hence, the probability is different for each element. It is set based on the current count.

P = 1/(2^current_count)

Note the count is always initialized to 1.

Based on this, if the current count is C, then the real count will be:

real count = 2^C - 2

Hence, if C = 10, then real count is 1022.

Following is the pseudocode to add one to the current count according to Morris Algorithm:

// Part of iq.OpenGenus.org
// Returns a random value 1 or 0
Bernoulli(p)
    return 1 with probability p
           or 0 with probability 1-p

// Adds 1 to the current count
AddOne(int count)
{
    float probability = 1 / (2^count);
    // random is 1 or 0
    int random = Bernoulli(probability);
    count = count + random;
    return count;
}

Following is the pseudocode for estimating the real count from the current count:

// Part of iq.OpenGenus.org
Estimate(int count)
{
    int real_count = 2^count - 2;
}

The above functions of Morris Algorithm are used as follows:

// Part of iq.OpenGenus.org
We get a stream of N elements
[a1, a2, ..., aN]
P = Element for which we need to count frequency
// Note count is initialize to 1
int count = 1;

for E in [a1, a2, ..., aN]
    // E is current element
    if E == P
        AddOne(count);
        
int real_count = Estimate(count);

Generalization of Morris Algorithm

As a generalization of Morris Algorithm, we introduce a control variable A which controls:

  • Range of values that can be stored in a 8 bit counter
  • Error in the final result

If the value of A is large, then error will be less but the range of values that can be stored in 8 bit counter will be reduced. In the above case, A is set to 1. In the original implementation by Morris, A is set to 30.

The pseudocode of Generalized Morris Algorithm for Counting is:

// Part of iq.OpenGenus.org
// Returns a random value 1 or 0
Bernoulli(p)
    return 1 with probability p
           or 0 with probability 1-p

// Adds 1 to the current count
AddOne(int count, int A)
{
    float probability = 1 / ((1+1/A)^count);
    // random is 1 or 0
    int random = Bernoulli(probability);
    count = count + random;
    return count;
}

Following is the pseudocode for estimating the real count from the current count based on the control variable A:

// Part of iq.OpenGenus.org
Estimate(int count, int A)
{
    int real_count = A * ((1 + 1/A)^count - 1)
}

Error in Morris Algorithm

As Morris Algorithm is a Probabilistic Algorithm, the answer generated in not the real answer but is very close to the real answer.

First, assume A = 1.

The standard deviation d on the expected value is Square_root(V/2) where V is the approximate count.
Hence, 95% probability is that the real value is -2d to +2d of the expected value.

Hence, if the current count is 200, then the standard deviation d is 10. The expected value is 2^V-2 that is 2^200-2. Therefore, the real value should be within 180 to 220 with 95% probability.

The standard deviation d1 on the actual count N = d1 = square_root(N * (N-1) / 2)
Hence, if the real value is 10,000, then the standard deviation is ~7070. The error is significant in this case.
This is with the case where the control factor A = 1.

Now, consider the case A > 1

When we set A to higher values, the error is reduced. For example, if A = 30, then the standard deviation will be nearly N/8 and hence, 95% of the time, the error will be less than 24%. With A = 3-. values upto 130,000 can be stored in a 8 bit counter.

This algorithm was later improved and other Probabilistic Algorithms for this problem was developed by Philippe Flajolet who has contributed greatly to this field of Probabilistic Counting.

With this article at OpenGenus, you must have the complete idea of Morris Algorithm for Counting.