Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
Table of contents:
- Introduction
- Step 1: Choosing a Word Dictionary
- Step 2: Loading the Dictionary Into Memory
- Step 3: Implementing the Spell Checker Application in C++
- Step 4: Testing our Implementation
- Conclusion
Reading time: 25 minutes
Introduction
In this article at OpenGenus, we will cover how to create a spell checker application using C++ programming language. A spell checker is a useful tool utilized to find and provide correction for any misspelled words in a provided text. The goal of our application is to suggest the correct spelling for any misspelled words found and to allow the user to pick the best suggestion.
Step 1: Choosing a Word Dictionary
To implement our spell checker, we need a dictionary text file. This is to provide a list of correctly spelled words that we can use as a reference point in our code. You can either create your own word dictionary, which would take quite a long time, or simply download one from the internet.
Here is an example of a word dictionary that could be used for this project:
github.com/OpenGenus/Spell-Checker-in-cpp/dictionary.txt
Step 2: Loading the Dictionary Into Memory
Now that we've chosen the dictionary we want to use for our spell checker, we must ensure that we can utilize it in our code. We can do this by loading the contents of our dictonary.txt file into memory.
The first thing to note is that checking every single word in a dictionary each time we want to check for the correct spelling of a word would usually take an extremely long time. For reference, the dictionary I am using is almost 50,000 words! An even more extensive dictionary could easily contain triple that amount. So what can we do?
To optimize the performance of our application, we can utilize a data structure known as a prefix tree or trie. Prefix trees are useful for quick and efficient search and retrieval. Their usage will greatly improve the running time of our application.
The following code displays how this can be done in C++:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
using namespace std;
// Structure of a Trie Node
struct TrieNode {
bool endOfWord;
vector<TrieNode*> children;
TrieNode() {
endOfWord = false;
children = vector<TrieNode*>(26, nullptr);
}
};
// Load Dictionary Into Memory
TrieNode *root = new TrieNode();
ifstream dictFile("dictionary.txt");
string word;
while (getline(dictFile, word)) {
insertWord(root, word);
}
dictFile.close();
In the above implementation, we include the packages necessary and define the structure of our trie node which will hold the contents of our dictionary. Once each word inside the dictionary text file is transferred, we close our dictionary file.
Step 3: Implementing the Spell Checker Application in C++
Once our code can correctly read from our dictionary text file, we can now proceed with our spell checker implementation.
First, let's start with a function for inserting a word into the prefix tree given a pointer to the root node of the tree and a constant reference to the word to be inserted as input parameters:
using namespace std;
void insertWord(TrieNode *root, const string &word) {
TrieNode *currentNode = root;
for (char c : word) { // For each character in the word,
int index = tolower(c) - 'a'; // Convert to lowercase to remove case sensitivity and Calculate where the character will be placed in the prefix tree
if (currentNode->children[index] == nullptr) { // If there's no child node available for the character to be inserted,
currentNode->children[index] = new TrieNode(); // Create a new node and Assign it to the child node at the calculated index
}
currentNode = currentNode->children[index]; // Update the current node pointer to move to next node
}
currentNode->endOfWord = true; // We have reached the end of the word, so set the current node as the end of the word and Set endOfWord to true
}
Next, we must implement a function for searching for a word in our prefix tree, which holds the dictionary we provided earlier. This will allow us to check if a given word exists in the dictionary, confirming its spelling correctness.
using namespace std;
bool searchWord(TrieNode *root, string &word) {
TrieNode *currentNode = root;
for (char c : word) { // For each character in the word,
int index = tolower(c) - 'a'; // Convert to lowercase to remove case sensitivity and Calculate where the character will be in the prefix tree
if (currentNode->children[index] == nullptr) { // If the child node for the character doesn't exist,
return false; // Return false. Word does not exist in the dictionary
}
currentNode = currentNode->children[index]; // Otherwise, keep checking. Move to the next node
}
return currentNode != nullptr && currentNode->endOfWord; // Return true if the current node exists and Marks the end of a word in the dictionary
}
The last function we must implement is the one responsible for providing word suggestions. This allows the user to see potential words that the user intended to type instead of the misspelled word.
There are a couple of methods that can be utilized when computing suggested words. One method involves replacing each letter in a word and checking if that modified word exists in the dictionary. An example where this can be useful is when a user accidentally types a wrong letter while typing a word.
vector<string> suggestions; // Initialize an empty vector of suggestions
// Generate suggestions by replacing each character individually with a letter a - z
for (int i = 0; i < word.length(); ++i) { // For the entire length of the word,
for (char c = 'a'; c <= 'z'; c++) { // For each character in the alphabet,
string modifiedWord = word; // Initilize modified word to the given word
modifiedWord[i] = c; // Replace a given letter at index i with a letter from the alphabet
if (searchWord(root, modifiedWord)) { // If the modified word exists in the dictionary,
suggestions.push_back(modifiedWord); // Add it to our list of suggestions
}
}
}
But, that method might not cover all user errors when typing. There's also the chance that the user missed a letter in a word. For that kind of scenario, we can use the following code:
// Generate suggestions by inserting a character
for (int i = 0; i <= word.length(); ++i) { // For the entire length of the word,
for (char c = 'a'; c <= 'z'; c++) { // For each character in the alphabet,
string modifiedWord = word; // Initilize modified word to the given word
modifiedWord.insert(i, 1, c); // Insert a given letter at index i with a letter from the alphabet
if (searchWord(root, modifiedWord)) { // If the modified word exists in the dictionary,
suggestions.push_back(modifiedWord); // Add it to our list of suggestions
}
}
}
Another useful method for generating suggestions involves deleting a character. This covers the case that a user accidentally types an extra letter in a word.
// Generate suggestions by deleting a character
for (int i = 0; i < word.length(); ++i) { // For the entire length of the word,
string modifiedWord = word; // Initilize modified word to the given word
modifiedWord.erase(i, 1); // Delete a character at a given index i
if (searchWord(root, modifiedWord)) { { // If the modified word exists in the dictionary,
suggestions.push_back(modifiedWord); // Add it to our list of suggestions
}
}
Lastly, there is also the scenario that adjacent letters in a word are accidentally swapped. Here is the method that resolves that issue:
// Generate suggestions by swapping adjacent characters
for (int i = 0; i < word.length() - 1; ++i) { // For the entire length of the word,
string modifiedWord = word; // Initilize modified word to the given word
swap(modifiedWord[i], modifiedWord[i + 1]); // Swap adjacent characters at a given index i and i + 1
if (searchWord(root, modifiedWord)) { // If the modified word exists in the dictionary,
suggestions.push_back(modifiedWord); // Add it to our list of suggestions
}
}
Step 4: Testing our Implementation
To test our spell checker application, we can take a sentence from user input and analyze the output of our code. This includes the words it detects as misspelled and the suggestion of words provided. Here is an example of how we can test our code within int main():
int main() {
// Load the dictionary into memory
TrieNode *root = new TrieNode();
ifstream dictFile("dictionary.txt");
string word;
while (getline(dictFile, word)) {
insertWord(root, word);
}
dictFile.close();
// Take a sentence from user input
string sentence;
cout << "Enter a sentence: ";
getline(cin, sentence);
for (char &c : sentence) {
c = tolower(c);
}
// Separate the sentence into words
stringstream ss(sentence);
bool firstWord = true; // Variable to track the first word in the sentence
vector<string> misspelledWords; // Store misspelled words
while (ss >> word) {
if (!searchWord(root, word)) { // If word is misspelled,
if (!firstWord) {
cout << " ";
}
printRed(word);
misspelledWords.push_back(word); // Add misspelled word to the vector
} else {
if (!firstWord) {
cout << " ";
}
cout << word;
}
firstWord = false;
}
cout << endl;
// Print suggestions for misspelled words
for (const string &misspelledWord : misspelledWords) {
vector<string> suggestions = getSuggestions(root, misspelledWord); // Get suggestions for misspelled word
if (!suggestions.empty()) {
cout << "Suggestions for " << misspelledWord << ": "; // Output the suggestions for the misspelled word
for (const string &suggestion : suggestions) {
cout << suggestion << " ";
}
cout << endl;
}
}
return 0;
}
In my test, I provided the string "you are lved" as user input and received the following:
Enter a sentence: you are lved
you are lved
Suggestions for lved: loved
The code correctly picked up the incorrectly spelled word and suggested the correct one, which was loved in this case.
Let's say that instead we provide the sentence "you are lovd", the suggestion algorithm will now come up with multiple suggestions for the incorrectly spelt word in this case, which is lovd.
Enter a sentence: you are lovd
you are lovd
Suggestions for lovd: love loved
Conclusion
Overall, utilizing a prefix tree holding our dictionary is one of the most time-efficient methods. But, there are many other ways to implement a spell checker algorithm:
- Hashing: A hash table is also a great choice when it comes to holding a dictionary due to its quick retrieval of words, resulting in a very efficient algorithm.
- Levenshtein Distance: Levenshtein distance is an algorithm used to calculate the smallest number of modifications to transform a desired word. An algorithm such as this one could be utilized to calculate potential corrections of misspelled words.
Here is the complete code for a spell checker in C++:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <algorithm>
using namespace std;
struct TrieNode {
bool endOfWord;
vector<TrieNode*> children;
TrieNode() {
endOfWord = false;
children = vector<TrieNode*>(26, nullptr);
}
};
void printRed(const string& word) {
// Print any misspelled words in red
cout << "\033[31m" << word << "\033[0m";
}
void insertWord(TrieNode *root, const string &word) {
TrieNode *currentNode = root;
for (char c : word) { // For each character in the word,
int index = tolower(c) - 'a'; // Convert to lowercase to remove case sensitivity and Calculate where the character will be placed in the prefix tree
if (currentNode->children[index] == nullptr) { // If there's no child node available for a character to be inserted,
currentNode->children[index] = new TrieNode(); // Create a new node and Assign it to the child node at the calculated index
}
currentNode = currentNode->children[index]; // Update the current node pointer to move to next node
}
currentNode->endOfWord = true; // We have reached the end of the word, so set the current node as the end of the word and Set endOfWord to true
}
bool searchWord(TrieNode *root, string &word) {
TrieNode *currentNode = root;
for (char c : word) { // For each character in the word,
int index = c - 'a'; // Convert to lowercase to remove case sensitivity and Calculate where the character will be in the prefix tree
if (currentNode->children[index] == nullptr) { // If the child node for the character doesn't exist,
return false; // Return false. Word does not exist in the dictionary
}
currentNode = currentNode->children[index]; // Otherwise, keep checking. Move to the next node
}
return currentNode != nullptr && currentNode->endOfWord; // Return true if the current node exists and Marks the end of a word in the dictionary
}
// Get suggestions for a misspelled word
vector<string> getSuggestions(TrieNode* root, const string &word) {
vector<string> suggestions;
// Generate suggestions by changing each character
for (int i = 0; i < word.length(); ++i) {
for (char c = 'a'; c <= 'z'; c++) {
string modified = word;
modified[i] = c;
if (searchWord(root, modified)) {
suggestions.push_back(modified);
}
}
}
// Generate suggestions by inserting each character
for (int i = 0; i <= word.length(); ++i) {
for (char c = 'a'; c <= 'z'; c++) {
string modified = word;
modified.insert(i, 1, c);
if (searchWord(root, modified)) {
suggestions.push_back(modified);
}
}
}
// Generate suggestions by deleting each character
for (int i = 0; i < word.length(); ++i) {
string modified = word;
modified.erase(i, 1);
if (searchWord(root, modified)) {
suggestions.push_back(modified);
}
}
// Generate suggestions by swapping adjacent characters
for (int i = 0; i < word.length() - 1; ++i) {
string modified = word;
swap(modified[i], modified[i + 1]);
if (searchWord(root, modified)) {
suggestions.push_back(modified);
}
}
return suggestions;
}
int main() {
// Load the dictionary into memory
TrieNode *root = new TrieNode();
ifstream dictFile("dictionary.txt");
string word;
while (getline(dictFile, word)) {
insertWord(root, word);
}
dictFile.close();
// Take a sentence from user input
string sentence;
cout << "Enter a sentence: ";
getline(cin, sentence);
for (char &c : sentence) {
c = tolower(c);
}
// Separate the sentence into words
stringstream ss(sentence);
bool firstWord = true; // Variable to track the first word in the sentence
vector<string> misspelledWords; // Store misspelled words
while (ss >> word) {
if (!searchWord(root, word)) { // If word is misspelled,
if (!firstWord) {
cout << " ";
}
printRed(word);
misspelledWords.push_back(word); // Add misspelled word to the vector
} else {
if (!firstWord) {
cout << " ";
}
cout << word;
}
firstWord = false;
}
cout << endl;
// Print suggestions for misspelled words
for (const string &misspelledWord : misspelledWords) {
vector<string> suggestions = getSuggestions(root, misspelledWord); // Get suggestions for misspelled word
if (!suggestions.empty()) {
cout << "Suggestions for " << misspelledWord << ": "; // Output the suggestions for the misspelled word
for (const string &suggestion : suggestions) {
cout << suggestion << " ";
}
cout << endl;
}
}
return 0;
}