Generate all sub-strings of a string

Internship at OpenGenus

Get FREE domain for 1st year and build your brand new site

In this article, we have explored the approach of generating / printing all sub-strings of a given String and have provided the implementation to do so. The time complexity to generate all substring is O(N^3) time and there are O(N^2) sub-strings for a string of length N.

Table of content:

  1. What is a Sub-string?
  2. Generation of substrings
  3. Algorithm / steps
  4. Implementation in C
  5. Time and Space Complexity
  6. Uses

Pre-requisites: Arrays, Strings in C

What is a Sub-string?

A substring is a contiguous set of characters in a given string/set of characters. For example, 'string' is a substring of 'substring'. The set of substrings will also cover the complete input string; each-and-every character/letter in the input string too.

Example: Substrings of hello are = {h, e, l, l, o, he, el, ll, lo, hel, ell, llo, hell, ello, hello}

Hence number of substrings for a given string is same as sum of len natural numbers i.e., len (len + 1) / 2 where len is number of characters in the input string.

Generation of substrings

We start with the brute force approach where we print all substrings of length M and then, move on to generate substrings of length M+1.

So, the steps/ approach is:

  1. String S of length N
  2. For I from 1 to N
    3. Generate all sub-strings of length I
    4. For sub-strings of length I, consider P from 0 to N-I
    5. One sub-string is "S[P], S[P+1], ..., S[P+I-1]"

We will go through the steps and finetune the approach.

Easiest approach is to print each character from the string, as they're also substrings.

// len refers to length of string or number of characters in the string
// s is the character array or a C-string
for (unsigned int i = 0; i < len; ++i) {
    printf("%s\n", s[i]);
}

The next piece of code that can be scribbled is for substrings whose length is 2 characters long. Above code can be modified as below

// The boundary condition ensures
// no single character gets printed in the last iteration
for (unsigned int i = 0; i < len - 1; ++i) {
    printf("%c", s[i]);
    printf("%c\n", s[i + 1]);
    // OR
    // printf("%c%c\n", s[i], s[i + 1]);
}

We may continue writing for 3 character length substrings. But we can actually nest another loop inside the existing loop which does the job of many print statements.

for (unsigned int i = 0; i < len - 2; ++i) {
    for (unsigned int j = 0; j < 3; ++j)
        printf("%c", s[i + j]);
    puts("");
}
// OR
for (unsigned int i = 0, k = 2; i < len - k; ++i) {
    for (unsigned int j = 0; j < k + 1; ++j)
        printf("%c", s[i + j]);
    puts("");
}

In the second piece of above code, k acts as controller on printing number of character a sub-string would contain. We can re-write code piece which would print substrings with 1 character length.

for (unsigned int i = 0, k = 0; i < len - k; ++i) {
    for (unsigned int j = 0; j < k + 1; ++j)
        printf("%c", s[i + j]);
    puts("");
}

Of course, we can have another loop which would vary k. Lowest value of k will be 0 (as in above code). Note that, value of k is one less than number of characters that generating substrings would contain. We know that, string itself would be a substring or in other words, maximum value of k will be one less than the length of input string.

for (unsigned int k = 0; k < len; ++k) {
    for (unsigned int i = 0; i < len - k; ++i) {
        for (unsigned int j = 0; j < k + 1; ++j)
            printf("%c", s[i + j]);
        puts("");
    }
}

Combining all the pieces:

Algorithm / steps

The pseudocode steps to generate all sub-strings are:

  1. Input a string and cache it's length in a variable
  2. Vary the controller variable k from 0 till length
  3. Vary an index i from 0 till len - k
  4. Vary another index j from 0 till k + 1 and display the character indexed by the value i + j
  5. Display a newline to differentiate between generated substrings and continue with loop in step 3

Implementation in C

Implementation in C Programming Language:

#include <stdio.h>
#include <string.h>

int main()
{
    char *s = "Hello";
    unsigned int len = strlen(s);

    for (unsigned int k = 0; k < len; ++k) {
        for (unsigned int i = 0; i < len - k; ++i) {
            for (unsigned int j = 0; j < k + 1; ++j)
                printf("%c", s[i + j]);
            puts("");
        }
    }

    return 0;
}

Output:

H
e
l
l
o
He
el
ll
lo
Hel
ell
llo
Hell
ello
Hello

Following are the sub-strings of the word "OpenGenus":

  • O
  • p
  • e
  • n
  • G
  • e
  • n
  • u
  • s
  • Op
  • pe
  • en
  • nG
  • Ge
  • en
  • nu
  • us
  • Ope
  • pen
  • enG
  • nGe
  • Gen
  • enu
  • nus
  • Open
  • penG
  • enGe
  • nGen
  • Genu
  • enus
  • OpenG
  • penGe
  • enGen
  • nGenu
  • Genus
  • OpenGe
  • penGen
  • enGenu
  • nGenus
  • OpenGen
  • penGenu
  • enGenus
  • OpenGenu
  • penGenus
  • OpenGenus

Time and Space Complexity

Space: Constant space O(1) - loop indexing variables are the only cost

Time: O(N^3)

There are O(N^2) substrings of a string and printing a substring of length M takes O(M) time. Hence, the overall time will be O(N^3).

All the loops directly or indirectly depends on length of input string.

Uses

Substrings are used in pattern-matching.

Many Algorithmic Problems involve dealing with sub-strings. As a brute force approach, all sub-strings can be generated. Note: In most problems, generating sub-strings can be avoided in an efficient approach.

With this article at OpenGenus, you must have the complete idea of generating all sub-strings of a string S.