Trie Data Structure in Rust

Free Linux Book

Get FREE domain for 1st year and build your brand new site

Let's go back into Data structures for a while. Enough theory, let's code something ourselves :D

Table of contents

  1. What is a Trie?
  2. Initial ideas for implementation
  3. Actual Implementation

To revise the operations in Trie Data Structure in general, go through this article. Following it, you may continue with this article where we implement Trie Data Structure in Rust Programming Language.

What is a Trie?

A Trie, also called a digital tree or a prefix tree is a data structure, resembling the Binary Tree we've already covered in another article. What makes this one stand out is that it's designed to make it easier to locate specific keys withing a set. These sets are often strings, where each node in the tree is a character in these strings, linked together to form words, as you move through it depth-first.

Here's an example:

Trie-Example

A quick search on Crates.io reveals a whole bunch of implementations of a trie. But for Practice, we shall implement it ourselves.

Trie-in-Crates-IO

Initial ideas for implementation

At the bare minimum, our trie should support searching, inserting and deleting strings from it, and ideally be O(m) in complexity, where 'm' is the length of the string we're using.

So, let's try to figure this out. A trie is a tree. Nodes, and links between them. So we should start there. First we'll have a node structure, which will hold a character, a boolean to identify if it's the last character in a string, and a list of references to it's child nodes. Then our proper Trie Structure which will hold the root node, as it is the starting point of all our searching and insertions.

With this in mind, this is going to be our basic program structure:

pub struct TrieNode {
    value: char,
    is_final: bool,
    child_nodes: Vec<Box<TrieNode>>,
}

impl TrieNode {
    // Create new node
    pub fn create(c: char, is_final: bool) -> TrieNode {}
    // Check if a node has that value
    pub fn check_value(self, c: char) -> bool {
        return true;
    }
}

struct TrieStruct {
    root_node: TrieNode,
}

impl TrieStruct {
    // Insert a string
    pub fn insert(string_val: String) {}
    // Find a string
    pub fn find(string_val: String) -> bool {
        return true;
    }
}

fn main() {
    // Create Trie
    // Insert Stuff
    // Find Stuff
}

Let's use this basic structure and start implementing the functions one by one. First let's examine our basic structures. We have some improvements we could make.

use std::collections::HashMap;

pub struct TrieNode {
    value: Option<char>,
    is_final: bool,
    child_nodes: HashMap<char, TrieNode>,
}

We're replacing our Value for an option, to have the chance of having a null value (Which indicates our root node.). Then we're turning our child_nodes from a Vec, to a HashMap. Why? Well, it's faster to access a map with a key, than going character by character in a vector, which could become problematic.

Next, let us implement some basic functionality for our TrieNodes

I think they are self-explanatory so..

impl TrieNode {
    // Create new node
    pub fn new(c: char, is_final: bool) -> TrieNode {
        TrieNode {
            value: Option::Some(c),
            is_final: is_final,
            child_nodes: HashMap::new(),
        }
    }

    pub fn new_root() -> TrieNode {
        TrieNode {
            value: Option::None,
            is_final: false,
            child_nodes: HashMap::new(),
        }
    }

    // Check if a node has that value
    pub fn check_value(self, c: char) -> bool {
        self.value == Some(c)
    }

    pub fn insert_value(&mut self, c: char, is_final: bool) {
        self.child_nodes.insert(c, TrieNode::new(c, is_final));
    }
}

Then we have our Trie Struct itself, with it's implementation (It's not the best, it's just my approach, you can look for ways to improve it as "homework" :D)

#[derive(Debug)]
struct TrieStruct {
    root_node: TrieNode,
}

impl TrieStruct {
    // Create a TrieStruct
    pub fn create() -> TrieStruct {
        TrieStruct {
            root_node: TrieNode::new_root(),
        }
    }

    // Insert a string
    pub fn insert(&mut self, string_val: String) {
        let mut current_node = &mut self.root_node;
        let char_list: Vec<char> = string_val.chars().collect();
        let mut last_match = 0;

        for letter_counter in 0..char_list.len() {
            if current_node
                .child_nodes
                .contains_key(&char_list[letter_counter])
            {
                current_node = current_node
                    .child_nodes
                    .get_mut(&char_list[letter_counter])
                    .unwrap();
            } else {
                last_match = letter_counter;
                break;
            }
            last_match = letter_counter + 1;
        }

        if last_match == char_list.len() {
            current_node.is_final = true;
        } else {
            for new_counter in last_match..char_list.len() {
                println!(
                    "Inserting {} into {}",
                    char_list[new_counter],
                    current_node.value.unwrap_or_default()
                );
                current_node.insert_value(char_list[new_counter], false);
                current_node = current_node
                    .child_nodes
                    .get_mut(&char_list[new_counter])
                    .unwrap();
            }
            current_node.is_final = true;
        }
    }

    // Find a string
    pub fn find(&mut self, string_val: String) -> bool {
        let mut current_node = &mut self.root_node;
        let char_list: Vec<char> = string_val.chars().collect();

        for counter in 0..char_list.len() {
            if !current_node.child_nodes.contains_key(&char_list[counter]) {
                return false;
            } else {
                current_node = current_node
                    .child_nodes
                    .get_mut(&char_list[counter])
                    .unwrap();
            }
        }
        return true;
    }
}

The insertion works like this. We grab the string we want to insert and split it into a vector of characters. After that, we go through the tree, checking for existing characters that may be saved. Once we find a node that does not have the next character in our list, we save that index as last_match. After that it's just a matter of iterating from our last match, till the end of the list, and inserting our characters into the proper child nodes.

Last but not least, our main function, to test our structure

fn main() {
    // Create Trie
    let mut trie_test = TrieStruct::create();

    // Insert Stuff
    trie_test.insert("Test".to_string());
    trie_test.insert("Tea".to_string());
    trie_test.insert("Background".to_string());
    trie_test.insert("Back".to_string());
    trie_test.insert("Brown".to_string());

    // Find Stuff
    println!(
        "Is Testing in the trie? {}",
        trie_test.find("Testing".to_string())
    );
    println!(
        "Is Brown in the trie? {}",
        trie_test.find("Brown".to_string())
    );
}

The result of running this, is the following:
Trie-Result

The presentation is a little scuffed but it shows what's going on. Each letter is inserted into the corresponding node.

Disclaimer: You might be noticing we're missing a delete function. Indeed we are. The way Rust works makes it really really hard to build functions that iterate over certain types of data structures while mutating the nodes themselves. I encourage you to take this code and try to add a remove function. You will fight the ownership model and borrow checker for hours (Or 4 days in my case).

Certain niche data structures such as this shouldn't really be used, or re-made (Someone else has made them, use that. Remember what I said in another one of my articles? Don't reinvent the wheel!)

That's it for today folks. Sorry this article took so long to write. I got stubborn and asked for help way too late.