Number of sub-strings with each character occurring even times

Internship at OpenGenus

Get FREE domain for 1st year and build your brand new site

This article discusses different algorithmic approach which we can use to find number of sub-strings with each character in it occurring even number times for a given string.

Table of Contents

  1. Pre-requisites
  2. Naive method
  3. Question
  4. Efficient approach
  5. Complexity

Pre-requisites

Arrays, Strings in C, Generate all sub-strings of a string, Bitwise operations, Dictionary or key-value representation abstract type


Naive method

Sub-string generation is known to us. Using the same method we can count repeating characters in the string. If any of the count is odd we ignore it, but if all the counts are even for the generated sub-string then another count even_count is incremented. Reset all the counts except even_count while generating next sub-string. Corresponding algorithm is described below.

Naive method algorithm

  1. Reset count vector or set count vector to zeros
  2. Generate sub-string for the input string
  3. Scroll through the generated sub-string and increment corresponding value in count vector
  4. Scroll the count vector and goto step 1 if any of the count is odd and do not increment even_count. If all the counts in the count vector are even then, increment even_count, goto step 1

Naive method C code

#include <stdio.h>
#include <string.h>

int main(int arg_count, char *arg_vec[])
{
    // if a CLA is passed, consider it
    // else consider predefined string
    char *s = (arg_count > 1) ? arg_vec[1] : "OpenGenus";
    unsigned int len = strlen(s), even_count = 0;

    for (unsigned int k = 0; k < len; ++k) {
        for (unsigned int i = 0; i < len - k; ++i) {
            // count_vec is 256 elements long
            // and it must be initialized with zeros
            // an extra index variable for count_vec
            // 'l' is also used
            // 256 => number of ASCII characters
            unsigned int count_vec[256] = {}, l;
            for (unsigned int j = 0; j < k + 1; ++j) {
                // increment count element corresponding
                // to the 'i+j'th char; along with 
                // displaying
                count_vec[s[i + j]]++;
                putchar(s[i + j]);
            }

            // the below block of code examines
            // for any odd count break the loop
            for (l = 0; l < 256; ++l) {
                if (count_vec[l] & 0x01)
                    break;
            }
            // if the count_vec's indexing variable
            // reached the end then all the elements are
            // even (else the loop would've broken)
            even_count += (l == 256);
            putchar('\n');
        }
    }

    printf("%d\n", even_count);
    return 0;
}



Question

What is the time complexity of above code

O(n)
O(n^3)
O(n^2)
O(nlogn)
That's right

Few optimizations in the naive code:

Optimization 1:

The above code shall work fine for any string which may contain a character repeated for INT_MAX times. We may change count_vec part of the code as below,

from

    count_vec[s[i + j]]++;

to

    count_vec[s[i + j]] ^= 1;

And,

    if (count_vec[l] & 0x01)

to

    if (count_vec[l])

Optimization 2:

If the string input contains only alphanumeric characters then, we can reduce the count_vec to 'z' ( or 122 ) elements. OR even eliminate the game of vector/array at all. Instead we may use a variable which can accomodate minimum 62 bits. Code corresponding to this optimization is here with comments.


#include <stdio.h>
#include <string.h>

int main(int arg_count, char *arg_vec[])
{
    char *s = (arg_count > 1) ? arg_vec[1] : "OpenGenus";
    unsigned int len, even_count, i, j, k;
    unsigned long long count_vec;

    len = strlen(s);
    even_count = 0;
    for (k = 0; k < len; ++k) {
        for (i = 0; i < len - k; ++i) {

            count_vec = 0;
            // count_vec is 64 bits long
            // first 26 bits represent cap alphas
            // i.e., 0-25 bits
            // next 10 bits represent digits
            // i.e., 26-35 bits
            // next 26 bits for lower alphas
            // i.e., 36-61 bits
            for (j = 0; j < k + 1; ++j) {
                // XOR gate toggles corresponding bits
                // of the mask where the bits are set
                // First time when a character occurs,
                // it sets a bit allocated for it
                // if the same character is encountered again,
                // the same bit is toggled or i.e., cleared
                // so, even repeatations will clear the bit
                // and odd repeatations will set the bit
                if (s[i + j] >= 'A' && s[i + j] <= 'Z')
                    count_vec ^= 1 << (s[i + j] - 'A');
                else if (s[i + j] >= '0' && s[i + j] <= '9')
                    count_vec ^= 1 << (s[i + j] - '0' + 26);
                else if (s[i + j] >= 'a' && s[i + j] <= 'z')
                    count_vec ^= 1 << (s[i + j] - 'z' + 36);

                putchar(s[i + j]);
            }
            // if count_vec contains zero set bits
            // then all the characters were repeated
            // even number times in this sub-string
            even_count += (count_vec == 0);
            putchar('\n');
        }
    }

    printf("%d\n", even_count);
    return 0;
}

</div>


Efficient approach


Efficient approach algorithm

  1. Initialize a dictionary in which both key, value are numbers to zeros. And set value of 0'th key to 1
  2. Also, initialize a variable which can hold bit masks and another variable which can hold count of even number of characters in sub-strings of a string to zeros
  3. Scroll through the input string character by character, for every character flip the bit-mask corresponding to the character get value for the new mask being key in the dictionary. Add it to count variable
  4. Increment value of the key in dictionary with key being latest mask. Goto step 3.

Efficient approach C code


// The code works fine for string with low-alphas only
#include <stdio.h>
#include <string.h>

// The function looks up s_key in the key-value pair
// it returns the corresponding value
// also increments the value
// both key and value are static integer arrays
// so that, data is not erased across function calls
unsigned int dict(unsigned int s_key) {
    static unsigned int key[1000], value[1000] = {[0] = 1};
    unsigned int i;

    for (i = 0; i < 1000; ++i) {
        if (key[i] == 0 && value[i] == 0) break;
        if (key[i] == s_key) break;
    }
    if (key[i] == 0 && value[i] == 0) key[i] = s_key;
    return value[i]++;
}

int main(int arg_count, char *arg_vec[]) {
    const char *s = (arg_count > 1) ? (arg_vec[1]) : ("hello");
    unsigned int n, i, pre = 0, count = 0;

    n = strlen(s);
    for (i = 0; i < n; i++) {
        pre ^= 1 << (s[i] - 97);
        count += dict(pre);
    }

    printf("%d\n", count);
    return 0;
}

Readers are encouraged to modify the code such that the executable supports alphanumeric character set.


Complexity

  • Time complexity of the algorithm is O(n) while the shown code has a time complexity of O(n2). It can be O(n) if a better dictonary is used. The code tries to mimic a dictonary.
  • Space complexity of the algorithm is O(n) while shown code consumes constant amount of space. If hashing and dictonaries are used then space complexity would be O(n)=

With this article at OpenGenus, you must have the complete idea to find the Number of sub-strings with each character occurring even times efficiently.