String handling and processing are one of the most important topics for programmers. Many real time applications are based on the string processing like:
1. Search Engine results optimization
2. Data Analytics
3. Sentimental Analysis
The data structure that is very important for string handling is the Trie data structure that is based on prefix of string
An introduction to Trie and basic implementation can be read on this link.
Here, we talk about the various application of Trie Data Structures.
Table of Contents
1. Auto Complete
2. Spell Checkers
3. Longest Prefix Matching
4. Browser History
Let's cover each of the above topics one by one.
1. Auto Complete
Autocomplete is the most important application of Trie Data structure. Its applications are immense and its usage trend is exponential in current age of smartphones and smart devices. It is an mechanism in which the device software predicts the rest of a word a user is typing based on the string prefix. In Graphical User Interfaces, users can select one of the various suggestions with tab spaces, arrow keys etc.
Auto complete feature speeds up interactions between a user and the application and improves user experience,especially when it accurately predicts the word a user intends to use after the first few characters have been typed. Auto complete feature is more accurate when there are limited number of possible words or when some words are more commonly used.
One of the most important field of research currently is to create optimum artificial intelligence algorithms that efficiently predicts the words based on the user experience and usage.
Need of Trie for Auto-complete
- Trie provides for alphabetical ordering of the data by keys.
- Trie data structure is fastest for auto-complete suggestions. Even in the worst case, it is
O(n)times faster than alternate imperfect hash table algorithm where n is the string length. Also, Trie data structure overcomes the problem of key collisions in imperfect hash tables that cause reduce in accuracy.
- Searching for suggestions using Trie enables us to trace pointers to get to a node that represent the string user has entered. On traversing a trie data structure using the Depth first search algorithm, one can enlist all suggestable strings that complete the user input.
Various software applications of auto complete
1. Web Browser: Autocomplete is implemented in both the search box of the search engine which uses browser history for suggestions as well as the address bar where typing the full address of website takes time and remembering so many website addresses accurately is tiresome.
2. Email: Email uses autocomplete feature primarily for filling out email address of recipients based on mail history.
Many email service providers like Gmail now also provide auto complete on content strings using artificial intelligence algorithms.
3. Search Engines: Autocomplete providers user with suggestions based on most relevant details and also based on the user history. They follow the auto suggest/incremental search algorithm.
More advanced algorithms that are used include language independent Levenshtein Algorithm.
4. Source Code Editors: Autocomplete feature for source code editors is called Code Completion. Autocomplete for source code is very accurate and efficient due to specific syntax of the programming languages.
The great accuracy comes in the fixed number of variables and keywords defined in a special file called namespace.
5. Database Query Tools: Auto complete feature in database query tools is used to complete query with the name of table, database, attribute etc. all mentioned in the data dictionary or other metadata definitions. The suggestions include all the table/value/database that the user has view access to.
Example: SQL Server Management Studio
6. Word processors: Word processors generally use auto complete feature to reduce the repetition of same words/phrases based on current documentation and dataset used for training the application.
Most advanced word processors like Vim and Microsoft Office support this feature.
Microsoft Excel also fills values in the cell based on upper cell values.
7. Command Line interpreters: Command line interpreters use the list of commands the user can use to auto fill the commands. This is generally done with the help of
Most popular CLI having autocomplete feature include bash for Linux and PowerShell for Windows.
2. Spell Checkers/Auto-correct
Spell checking or auto-correct is a three step process:
- Check for the word in the data dictionary
- Generate potential suggestions
- Sort the suggestions with higher priority on top.
Trie data structure is used to store the data dictionary and algorithms for searching the words from the dictionary and provide the list of valid words for suggestion can be constructed.
3. Longest Prefix Matching
Longest Prefix Matching algorithm, also called the Maximum Prefix length match, is used in networking by the routing devices in IP (Internet Protocol) networking. It is used to select an entry from the routing table based on the history of previous packets transferred on the network.
The earliest IP Lookup Technique to employ Trie data structure is the Radix Trie Implementation in BSD Kernel.
Optimization of the network routes required contiguous masking that bounded the complexity of the worst case for lookup time to O(n), where n is the length of the URL address in bits. To speed up the lookup process, Multiple Bit Trie Schemes were developed that performed the lookups of multiple bits faster.
4. Browser History
Web browsers keep track of history of websites visited by the user.
By storing this history, with number of visits to the specific websites as a key value, and organizing it on Trie data structure, the user is given suggestions of the website to visit when the prefix of the previously visited URL is written in the address bar.
References/ Further Reading
- Trie data structure by Vipul Gupta at OpenGenus
- Autocomplete feature using TRIE Data Structure by Harsh Bardhan Mishra at OpenGenus
- X-fast trie by Yash Aggarwal at OpenGenus
- Y Fast trie by Yash Aggarwal at OpenGenus