Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article, we have explored the approach of generating / printing all sub-strings of a given String and have provided the implementation to do so. The time complexity to generate all substring is O(N^3) time and there are O(N^2) sub-strings for a string of length N.
Table of content:
- What is a Sub-string?
- Generation of substrings
- Algorithm / steps
- Implementation in C
- Time and Space Complexity
- Uses
Pre-requisites: Arrays, Strings in C
What is a Sub-string?
A substring is a contiguous set of characters in a given string/set of characters. For example, 'string' is a substring of 'substring'. The set of substrings will also cover the complete input string; each-and-every character/letter in the input string too.
Example: Substrings of hello are = {h, e, l, l, o, he, el, ll, lo, hel, ell, llo, hell, ello, hello}
Hence number of substrings for a given string is same as sum of len natural numbers i.e., len (len + 1) / 2 where len is number of characters in the input string.
Generation of substrings
We start with the brute force approach where we print all substrings of length M and then, move on to generate substrings of length M+1.
So, the steps/ approach is:
- String S of length N
- For I from 1 to N
3. Generate all sub-strings of length I
4. For sub-strings of length I, consider P from 0 to N-I
5. One sub-string is "S[P], S[P+1], ..., S[P+I-1]"
We will go through the steps and finetune the approach.
Easiest approach is to print each character from the string, as they're also substrings.
// len refers to length of string or number of characters in the string
// s is the character array or a C-string
for (unsigned int i = 0; i < len; ++i) {
printf("%s\n", s[i]);
}
The next piece of code that can be scribbled is for substrings whose length is 2 characters long. Above code can be modified as below
// The boundary condition ensures
// no single character gets printed in the last iteration
for (unsigned int i = 0; i < len - 1; ++i) {
printf("%c", s[i]);
printf("%c\n", s[i + 1]);
// OR
// printf("%c%c\n", s[i], s[i + 1]);
}
We may continue writing for 3 character length substrings. But we can actually nest another loop inside the existing loop which does the job of many print statements.
for (unsigned int i = 0; i < len - 2; ++i) {
for (unsigned int j = 0; j < 3; ++j)
printf("%c", s[i + j]);
puts("");
}
// OR
for (unsigned int i = 0, k = 2; i < len - k; ++i) {
for (unsigned int j = 0; j < k + 1; ++j)
printf("%c", s[i + j]);
puts("");
}
In the second piece of above code, k
acts as controller on printing number of character a sub-string would contain. We can re-write code piece which would print substrings with 1 character length.
for (unsigned int i = 0, k = 0; i < len - k; ++i) {
for (unsigned int j = 0; j < k + 1; ++j)
printf("%c", s[i + j]);
puts("");
}
Of course, we can have another loop which would vary k
. Lowest value of k
will be 0
(as in above code). Note that, value of k
is one less than number of characters that generating substrings would contain. We know that, string itself would be a substring or in other words, maximum value of k
will be one less than the length of input string.
for (unsigned int k = 0; k < len; ++k) {
for (unsigned int i = 0; i < len - k; ++i) {
for (unsigned int j = 0; j < k + 1; ++j)
printf("%c", s[i + j]);
puts("");
}
}
Combining all the pieces:
Algorithm / steps
The pseudocode steps to generate all sub-strings are:
- Input a string and cache it's length in a variable
- Vary the controller variable
k
from 0 till length - Vary an index
i
from 0 tilllen - k
- Vary another index
j
from 0 tillk + 1
and display the character indexed by the valuei + j
- Display a newline to differentiate between generated substrings and continue with loop in step 3
Implementation in C
Implementation in C Programming Language:
#include <stdio.h>
#include <string.h>
int main()
{
char *s = "Hello";
unsigned int len = strlen(s);
for (unsigned int k = 0; k < len; ++k) {
for (unsigned int i = 0; i < len - k; ++i) {
for (unsigned int j = 0; j < k + 1; ++j)
printf("%c", s[i + j]);
puts("");
}
}
return 0;
}
Output:
H
e
l
l
o
He
el
ll
lo
Hel
ell
llo
Hell
ello
Hello
Following are the sub-strings of the word "OpenGenus":
- O
- p
- e
- n
- G
- e
- n
- u
- s
- Op
- pe
- en
- nG
- Ge
- en
- nu
- us
- Ope
- pen
- enG
- nGe
- Gen
- enu
- nus
- Open
- penG
- enGe
- nGen
- Genu
- enus
- OpenG
- penGe
- enGen
- nGenu
- Genus
- OpenGe
- penGen
- enGenu
- nGenus
- OpenGen
- penGenu
- enGenus
- OpenGenu
- penGenus
- OpenGenus
Time and Space Complexity
Space: Constant space O(1) - loop indexing variables are the only cost
Time: O(N^3)
There are O(N^2) substrings of a string and printing a substring of length M takes O(M) time. Hence, the overall time will be O(N^3).
All the loops directly or indirectly depends on length of input string.
Uses
Substrings are used in pattern-matching.
Many Algorithmic Problems involve dealing with sub-strings. As a brute force approach, all sub-strings can be generated. Note: In most problems, generating sub-strings can be avoided in an efficient approach.
With this article at OpenGenus, you must have the complete idea of generating all sub-strings of a string S.