×

Search anything:

Spell Checker Console Application in Java

Binary Tree book by OpenGenus

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

In this article, we will develop a spell checker application which checks for spelling mistakes in our input and also suggest possible corrections. We will implement this tool in Java Programming Language.

Introduction


  • Spell checking is an essential tool for anyone who writes frequently, whether it be for work or personal use.
  • A spell checker can quickly identify and correct spelling errors in your writing, saving you time and avoiding embarrassment.
  • In this article, we will be discussing a spell checker application in Java. The application takes a list of words as its dictionary and checks the spelling of words in an input string.
  • It also provides suggestions for misspelled words, making it a powerful tool for writers. Whether you're a student, professional, or just someone who likes to write, this spell checker will help improve the quality of your writing.

So, let's dive in and learn how to build a spell checker application in Java!

Features


The features that we plan to implement in our spell checker application are as follows:

  • As a spell checker program, It needs a standard dictionary to check spelling mistakes.
  • It should be able list out all misspelled words.
  • And also, it should suggest possible corrections.
  • It should all prompt that if there is no any mistakes.

Implementation


Approach # 1 : Using Set Data Structure


  • The code will be a Java program for a spell checker.
  • The SpellChecker class is used to check the spelling of words in an input string. The class has a constructor that takes in a set of words (dictionary) and sets it as an instance variable dictionary.
  • The method check takes in a string (input) and splits it into separate words using the split method.
  • It then checks if each word is present in the dictionary using the contains method of the Set class.
  • If a word is not present in the dictionary, it is added to the list of misspelled words. After all words have been checked, the misspelled words are displayed and for each misspelled word, a possible correction is suggested using the suggest method.
  • The method levenshteinDistance calculates the Levenshtein distance between two strings which represents the minimum number of edits (insertions, deletions, or substitutions) required to transform one string into the other.
  • This method is used to find the word in the dictionary that is closest to the misspelled word.
  • The method suggest takes in a misspelled word and calculates the Levenshtein distance between the misspelled word and each word in the dictionary.
  • It then selects the word with the smallest Levenshtein distance as the suggested
  • correction.
  • The filetoset method takes in a file and returns a set of words that have been read from the file.
  • This method is used to read the dictionary from a file and return it as a set.

Finally, the main method sets up the SpellChecker by first reading the dictionary from a file using the filetoset method and then creating an instance of the SpellChecker class. It then prompts the user for input and continues to check the spelling of the input words until the user enters q.

main() method

First, we will look into main() method. It is as follows:

import java.io.*;
import java.util.*;

// Main class that implements spell checking functionality
public class SpellChecker {
    
    // Set to store the words in the dictionary
    private Set<String> dictionary;
    
    // Constructor to initialize the dictionary set
    public SpellChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }
    
    // ANSI_RESET code resets the text to its default color.
    public static final String ANSI_RESET = "\u001B[0m";
    // ANSI escape codes to set the color of the text being printed to the console
    public static final String ANSI_RED = "\u001B[31m";
    public static final String ANSI_BLUE = "\u001B[34m";
    public static final String ANSI_YELLOW = "\u001B[33m";

    // Method to check the spelling of the given input
    public  void check(String input) {
        // code will be written
    }
    
    // displays the final checked sentence and the possible corrections for any misspelled words
    public static void PrintChecked(String[] words, List<String> misspelled) {
        // code will be written
        }
    }
    
    // Method to find the Levenshtein distance between two words
    private int levenshteinDistance(String word1, String word2) {
        // code will be wrtten
    }

    // Method to suggest a correction for a misspelled word
    private String suggest(String word) {
        // code will be written
    }

    public static Set<String> filetoset(File file) throws FileNotFoundException {
       // code will be written
    }
    /*
    The Fucntion of Main() method is as follows:
    1. It will load a txt file into set data structure.
    2. create a constructor for spellchecker class.
    3. accept input as string.
    4. prints output according to the input.
    */ 
    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("words.utf-8.txt");
        Set<String> SET = filetoset(file);
        SpellChecker spellChecker = new SpellChecker(SET);
        System.out.print("Enter the string to be checked(Characters Only): ");
        String line;
        Scanner s = new Scanner(System.in);
        while(( line = s.nextLine()) != null){
            if(line.equals("q")) break;
            spellchecker.check(line);
            System.out.println("Enter some more words: or ('q' for exit)");
        }
        s.close();
    }   
}
  • The main() method is the entry point of the program. It prompts the user to enter words to check for mistakes.
  • The used words.utf-8.txt file is available here

filetoset() method


public static Set<String> filetoset(File file) throws FileNotFoundException {
        // Create a set to store the words from the file
        Set<String> SET = new HashSet<>();
        // Create a scanner to read the file
        Scanner s = new Scanner(file);
        // Loop through each line of the file
        while (s.hasNextLine()) {
            // Read a line from the file
            String line = s.nextLine();
            // Add the line (word) to the set
            SET.add(line);
        }
        // Close the scanner
        s.close();
        // Print a message indicating that reading the dictionary is done
        System.out.println("Done reading dictionary");
        // Return the set of words
        return SET;
    }
  • The filetoset() method is accepts a File type data and add every line into a set data structure.
  • We can acheive that by using File and Scanner classes.
  • It is difficult to add all words in program by writing code. like this,
    private static final String[] dictionary = {"hello", "world", "java", "programming"};
    
  • It returns a set containing all the distinct words.

check() method


// Method to check the spelling of the given input
public  void check(String input) {
        // List to store misspelled words
        List<String> misspelled = new ArrayList<>();

        // Splitting the input string into words
        String[] words = input.split("\\s+");

        // Checking the spelling of each word
        for (String word : words) {
            // If word not found in dictionary, add to misspelled words list
            if (!dictionary.contains(word.toLowerCase())) {
                misspelled.add(word);
            }
        }
        
        System.out.print("The Sentence after checking: ");
        PrintChecked(words,misspelled);
 }

The check() method is used to check the spelling of words in an input string and to list out the misspelled words. The method takes a input string as an input, which is the string that needs to be checked for spelling.

  • It first initializes a list misspelled to store the misspelled words.
    The method then splits the input string into individual words using the split() method and a regular expression that matches one or more whitespace characters (\s+).
  • For each word in the list of words obtained from the input string, the method checks if the word is not present in the dictionary by converting the word to lower case and checking if it's not in the dictionary set.
  • If the word is not in the dictionary, it is added to the misspelled list.
    After checking all the words, the method checks if the misspelled list is empty.
    1. If the list is empty, the method prints a message indicating that there are no mistakes.
    2. If the list is not empty, the method prints the list of misspelled words and for each misspelled word, it calls the suggest() method to suggest a correction.

suggest() method


private String suggest(String word) {
        // initializes a variable "minDistance" to the maximum possible value of an integer
        int minDistance = Integer.MAX_VALUE;
        // initializes a string variable "suggestion" to an empty string.
        String suggestion = "";
        
        // loops through each word in the dictionary ("dictWord")
        for (String dictWord : dictionary) {
            /* calculates the Levenshtein distance between the misspelled word 
            and the current word*/ 
            int distance = levenshteinDistance(word, dictWord);
            
            /* If the calculated distance is less than the current minDistance, 
            the method updates the minDistance with the new distance and updates 
            the suggestion with the current dictWord.*/
            if (distance < minDistance) {
                minDistance = distance;
                suggestion = dictWord;
            }
        }
        // prints the suggestion as a correction for the misspelled word.
        return suggestion;
    }
}
  • The suggest() method is used to suggest a correction for a misspelled word.
  • It does this by finding the word in the dictionary with the closest Levenshtein distance to the misspelled word.
  • The method starts by initializing two variables, "minDistance" with a large value (Integer.MAX_VALUE), and "suggestion" as an empty string.
  • It then loops through each word in the dictionary and calculates the Levenshtein distance between the misspelled word and the current dictionary word.
  • If the distance is smaller than the current minimum distance, the current dictionary word becomes the new suggestion and the new minimum distance is updated.

Finally, the method prints the suggested correction for the misspelled word. The "levenshteinDistance" method is used to calculate the distance between the words.

levenshteinDistance() method


// Method to find the Levenshtein distance between two words
private int levenshteinDistance(String word1, String word2) {
        int m = word1.length();
        int n = word2.length();
        
        // 2D array to store the distance between substrings
        int[][] dp = new int[m + 1][n + 1];
        
        // Filling the 2D array to find the minimum distance
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                if (i == 0) {
                    dp[i][j] = j;
                } else if (j == 0) {
                    dp[i][j] = i;
                } else if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = 1 + Math.min(Math.min(dp[i - 1][j], dp[i][j - 1]), dp[i - 1][j - 1]);
                }
            }
        }
        // Returning the minimum distance
        return dp[m][n];
    }
  • The levenshteinDistance method calculates the Levenshtein distance between two words.

Levenshtein distance is a measure of the difference between two strings, defined as the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other.

  • The method takes in two strings, "word1" and "word2", and returns the Levenshtein distance between them.
  • It uses a dynamic programming approach to find the minimum number of operations required to transform one string into the other.
  • The method starts by initializing a two-dimensional array "dp" with dimensions (m+1) x (n+1), where m is the length of word1 and n is the length of word2.
  • It then loops through both arrays, using the elements of word1 and word2 to fill in the "dp" array. If the characters at the current positions in word1 and word2 match, the value in the "dp" array is equal to the value in the previous position.
  • If the characters do not match, the value in the "dp" array is the minimum of the three possible operations (insertion, deletion, substitution) plus one.

Finally, the method returns the value in the bottom-right corner of the "dp" array, which is the Levenshtein distance between the two words.

PrintChecked() method


public static void PrintChecked(String[] words, List<String> misspelled) {
       // print the sentence with misspelled words in red and correctly spelled words in blue
        for(String word : words) {
            if(misspelled.contains(word)) System.out.print(ANSI_RED+word+" "+ANSI_RESET);
            else System.out.print(ANSI_BLUE+word+" "+ANSI_RESET);
        }
        
        // print the header for the corrections section
        System.out.println();
        if(misspelled.isEmpty()) System.out.println("No mistakes, you're good.");
        else{
            System.out.println(ANSI_YELLOW+"**Possible Corrections for misspelled words**"+ANSI_RESET);

        // iterate over each misspelled word and print the correction in blue
        for (String missword : misspelled) {
            System.out.println(ANSI_RED+missword+ANSI_RESET+" --> "+ANSI_BLUE+suggest(missword)+ANSI_RESET);
        }
        }
    }
  • The PrintChecked method is used to display the final checked sentence and the possible corrections for any misspelled words in the sentence.
  • The method takes two parameters - words, an array of strings representing each word in the sentence, and misspelled, a list of strings representing the misspelled words in the sentence.
  • The method starts by printing the checked sentence by iterating over the words array and printing each word in blue color if it is spelled correctly and in red color if it is misspelled.
  • After printing the checked sentence, the method prints the possible corrections for each misspelled word.
  • The correction is obtained by calling the suggest method and passing the misspelled word as an argument.
  • The method then prints the correction for each misspelled word in blue color, preceded by the misspelled word in red color.

Look at the example output:

output4

Approach # 2 : Using Trie Data Structure


We can use a trie data structure for a dictionary to insert, search for words and we can get closest suggested words. To acheive that process is as follows:

  • we will create a Trie data structure to store the words of a dictionary and a SpellChecker class that takes a Trie object as input and checks the spelling of a given string.
  • If a word from the string is not present in the Trie, the word is considered misspelled and the closest matching word from the Trie is suggested as a correction.
  • The Trie is populated with the words from a dictionary file using the filetotrie() method.
  • In the main method, a user input string is taken and passed to the check() method of the SpellChecker class, which prints the words in the string with correct words in blue and misspelled words in red and also prints the suggested correction for each misspelled word.

This application can be implemented by writing three different classes. They are:
1. Main Class
2. Spell Checker Class
3. Trie and TrieNode Classes

Main Class


import java.io.*;
import java.util.*;

// This is the Main class of the SpellCheckerUsingTrie program
public class Main {

    // A helper method to read the words from a file and insert them into a Trie
    public static Trie filetotrie(File file) throws FileNotFoundException {
        // Create a new Trie instance
        Trie trie = new Trie();
        // Create a scanner to read the file
        Scanner s = new Scanner(file);
        // Read the file line by line
        while (s.hasNextLine()) {
            String line = s.nextLine();
            // Insert each line (word) into the Trie
            trie.insert(line);
        }
        // Close the scanner
        s.close();
        // Print a message indicating that the reading of the file is done
        System.out.println("Done reading dictionary");
        // Return the Trie
        return trie;
    }
    // The main method of the program
    public static void main(String[] args) throws FileNotFoundException {
        // Create a File instance pointing to the file "words.utf-8.txt"
        File file = new File("words.utf-8.txt");
        // Call the helper method to read the words from the file and insert them into a Trie
        Trie trie = filetotrie(file);
        // Create a SpellChecker instance using the Trie
        SpellChecker spellChecker = new SpellChecker(trie);
        // Print a message to prompt the user to enter a string to be checked
        System.out.print("Enter the string to be checked (Characters only): ");
        // Create a scanner to read the user's input
        Scanner s = new Scanner(System.in);
        // Read the user's input
        String line;
        // Keep looping until the user enters 'q'
        while(( line = s.nextLine()) != null){
            // If the user enters 'q', break the loop
            if(line.equals("q")) break;
            // Call the check method of the SpellChecker to check the entered string
            spellChecker.check(line);
            // Print the prompt message again
            System.out.print("Enter another string: or ('q' for exit): ");
        }
        // Close the scanner
        s.close();
    }
}

It is the executing class of the program. It has two methods:

  1. filetotrie: A helper method that reads the words from a text file and inserts them into a trie.
  2. main: The main method of the program that takes user input and checks if the entered string is a valid word or not.
  • The program starts by reading the words from a text file "words.utf-8.txt" and inserting them into a trie using the filetotrie method.
  • Then, it creates an instance of the SpellChecker class using the trie and prompts the user to enter a string to be checked.
  • The entered string is then passed to the check method of the SpellChecker class, which checks if the entered string is a valid word or not.
  • The program continues to take user input and check words until the user enters 'q'.

SpellChecker Class


//SpellChecker class implements the spell checking functionality. 
public class SpellChecker {
    //A static Trie object that holds the dictionary for spell checking
    private static Trie dictionary;
    
    //Constructor that takes Trie object as an argument and sets it as the dictionary for spell checking
    public SpellChecker(Trie trie) {
        SpellChecker.dictionary = trie;
    }
    
    //ANSI escape codes for setting colors while printing words
    public static final String ANSI_RESET = "\u001B[0m";
    public static final String ANSI_RED = "\u001B[31m";
    public static final String ANSI_BLUE = "\u001B[34m";
    public static final String ANSI_YELLOW = "\u001B[33m";

    //Method to check the spelling of words in the input string
    public  void check(String input) {
        //Splitting the input string into words
        String[] words = input.split("\\s+");
        //Printing the spell checked words
        printCheckedTrie(words);
    }

    //Method to print the spell checked words along with the closest correct word
    public static void printCheckedTrie(String[] words) {
        int m=0;
        //Iterating through each word in the input
        for (String word : words) {
            //Checking if the word exists in the dictionary
            if (!dictionary.search(word)) {
                //If the word doesn't exist, print the word in red
                System.out.print(ANSI_RED + word + " " + ANSI_RESET);
                m++;
            } else {
                //If the word exists, print the word in blue
                System.out.print(ANSI_BLUE + word + " " + ANSI_RESET);
            }
        }
        //Printing a new line
        System.out.println();
        if (m>0) System.out.println("No mistakes, you're good.");
        else{
            //Printing the header for the possible corrections for misspelled words
            System.out.println(ANSI_YELLOW + "**Possible Corrections for misspelled words**" + ANSI_RESET);
            //Iterating through each word in the input
            for (String word : words) {
                //Checking if the word exists in the dictionary
                if (!dictionary.search(word)) {
                    //If the word doesn't exist, print the misspelled word in red along with the closest correct word in blue
                    System.out.println(ANSI_RED + word + ANSI_RESET + " --> " + ANSI_BLUE + dictionary.getClosest(word) + ANSI_RESET);
                }
            }
        }
    }
}
  • The SpellChecker class has a private static Trie object, dictionary, that holds the dictionary for spell checking.
  • The class also has ANSI escape codes for setting colors when printing words.
  • The check method takes an input string and splits it into words, then calls the printCheckedTrie method to print the spell checked words along with the closest correct word.
  • The printCheckedTrie method iterates through each word in the input, checks if it exists in the dictionary, and prints it in blue if it does or in red if it doesn't.
  • If there are any misspelled words, the method also prints the misspelled words in red and the closest correct word in blue.

Trie Class


import java.util.HashMap;
import java.util.Map;

// Trie class implementation
public class Trie {
    // TrieNode class definition
    private class TrieNode {
    // Map to store the children nodes
    Map<Character, TrieNode> children;
    // Flag to indicate end of a word
    boolean endOfWord;
    
    
        
        // Constructor to initialize the TrieNode
    public TrieNode() {
        // Initialize the children map
        children = new HashMap<>();
        // Initialize endOfWord flag to false
        endOfWord = false;
    }
}

    // Root node of the Trie
    private final TrieNode root;

    // Constructor to initialize the Trie
    public Trie() {
        // Initialize the root node
        root = new TrieNode();
    }

    // Method to insert a word into the Trie
    public void insert(String word) {
        // Start from the root node
        TrieNode current = root;
        // Iterate through each character of the word
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            // Get the node for the current character
            TrieNode node = current.children.get(ch);
            // If the node doesn't exist, create a new one
            if (node == null) {
                node = new TrieNode();
                current.children.put(ch, node);
            }
            // Move to the next node
            current = node;
        }
        // Mark the end of the word
        current.endOfWord = true;
    }

    // Method to search a word in the Trie
    public boolean search(String word) {
        // Start from the root node
        TrieNode current = root;
        // Iterate through each character of the word
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            // Get the node for the current character
            TrieNode node = current.children.get(ch);
            // If the node doesn't exist, return false
            if (node == null) {
                return false;
            }
            // Move to the next node
            current = node;
        }
        // Return the endOfWord flag
        return current.endOfWord;
    }

    // Method to get the closest word in the Trie
    public String getClosest(String word) {
        // Start from the root node
        TrieNode current = root;
        // Initialize the closest word
        String closest = "";
        // Iterate through each character of the word
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            // Get the node for the current character
            TrieNode node = current.children.get(ch);
            // If the node doesn't exist, break the loop
            if (node == null) {
                break;
            }
            // Move to the next node
            current = node;
            // Add the current character to the closest word
            closest += ch;
        }
        // If the current node is the end of a word, return it
        if (current.endOfWord) {
                return closest;
            }
        // If not, search the Trie for a closer match
        return dfs(current, closest);
    }

    private String dfs(TrieNode node, String str) {
        // If the current node marks the end of a word, return the current match
        if (node.endOfWord) {
            return str;
        }
        // Initialize the closest match
        String closest = "";
        // Search the Trie for a closer match
        for (Map.Entry<Character, TrieNode> entry : node.children.entrySet()) {
            String word = dfs(entry.getValue(), str + entry.getKey());
            // Update the closest match if a closer match is found
            if (closest.length() == 0 || word.length() < closest.length()) {
                closest = word;
            }
        }
        // Return the closest match
        return closest;
    }
}
  • This is an implementation of a Trie data structure in Java.
  • A Trie, also known as a prefix tree, is a tree-like data structure used to store an associative array where the keys are sequences of elements, typically characters.
  • This implementation has methods to insert a word into the Trie, search a word in the Trie, and get the closest word in the Trie.
  • The closest word method returns the closest word stored in the Trie that starts with the given prefix.
  • The Trie class has a private inner class TrieNode, which represents a node in the Trie.
  • Each node has a map to store its children nodes, and a flag to indicate whether it is the end of a word. The root node of the Trie is initialized in the Trie constructor.

Look at the example output:
outputtrie

Further improvements


  1. Word normalization: The program currently only checks for lowercase words in the dictionary, it could be improved by normalizing the input words and dictionary words to a common case.

  2. User-defined dictionary: The program uses a fixed dictionary file, it could be improved by allowing the user to add or remove words from the dictionary.

  3. Performance optimization: The program can be slow for a large dictionary and long inputs, optimization could be done in the Levenshtein distance calculation by using dynamic programming and memoization.

  4. Context-based suggestions: The suggestions are based only on the Levenshtein distance between the misspelled word and dictionary words, it could be improved by considering the context of the word, such as its position in the sentence, to make more accurate suggestions.

  5. Additional correction methods: Currently, the program only suggests a correction based on the Levenshtein distance, it could be improved by incorporating additional correction methods such as phonetic matching and n-gram matching.

  6. User interface: The program has a simple command-line interface, it could be improved by adding a graphical user interface to make it more user-friendly.

Conclusion


  • In conclusion, the above code is an implementation of a spell checker program in Java using the Levenshtein distance algorithm.
  • The program takes a dictionary of words and a string input from the user. It then splits the input into words and checks if each word is in the dictionary.
  • If a word is not found in the dictionary, it is considered misspelled and added to a list of misspelled words.
  • The program then suggests possible corrections for each misspelled word by finding the word in the dictionary with the smallest Levenshtein distance.
  • The program is efficient and can be easily improved by implementing more advanced algorithms or adding additional features such as personalization of the dictionary based on the user's frequently used words.
  • This implementation can be a great starting point for developing a more robust and feature-rich spell checker program.
Spell Checker Console Application in Java
Share this