Y-fast trie

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Reading time: 35 minutes | Coding time: 15 minutes

Y-fast trie is a data structure used to store integers from a bounded domain. It is a bitwise trie, i.e. a binary tree where each subtree stores values having binary representations with common prefix.

It is considered as an upgrade to X-fast tries, allowing more efficient memory usage and faster insertion/deletions.Y-fast trie also treats integers as if they were words of w bits which allows storing integers as a trie of words.

It is expected to perform five operations:

find(x): Find x in the given trie.
predecessor(x): Return the largest element in domain less than or equal to x.
successor(x): Return the smallest element in domain less greater or equal to x.
insert(x, v): Insert an element x into trie pointing to value v.
delete(x): Delete an element x from trie.

Y-fast tries consists of two data structures:

X-fast tries
balanced binary search trees

Keys are divided into logU groups of consecutive elements, with U being the universe size. A balanced bst is created for each group, with each group being of atleast logU / 4 elements and atmost 2logU elements.

For every binary search tree, a representative r is chosen and all these representatives are stored in a X-fast trie. r has to be smaller than it's successor in X-fast trie and all elements in the bst represented by the successor, as well as greater than its predecessor and all elements in its bst. r does not necessarily have to be one of the element in the bst it represents.

Image above shows a Y-fast trie with universe size M.
Y-fast trie starts as a X-fast trie. The major change is that we operate on representative values r in X-fast tries, and the leaf nodes points to balanced binary search trees rather than values.

Algorithm

Y Fast trie supports the following operations:

find
successor and predecessor
insert
delete

Find operation

find(k): Find the element k in trie.

1. Find the predecessor and successor representatives of k in the X-fast
    trie, i.e. find largest representative smaller than k and find smallest
    representative larger than k in representative X-fast trie.
    k can be only in the bst's represented by them.
2. Binary search for k in the binary search trees.

Successor and Predecessor operations

successor(k) and predecessor(k): Successor is the node with smallest element greater than or equal to k, while predecessor is largest element smaller than or equal to k in trie. This node will have longest common prefix with k.

1. Find the successor and predecessor representatives of k in the X-fast trie.
2. Traverse both trees to get two potential successor (or predecessor) of k.
    Select the appropriate of the two options.

Insert operation

insert(k, v): To insert the key value pair (k, v)

First determine in which balanced bst should k be inserted
1. Find the successor of k. Determine the bst in which the successor is stored.
    Let the tree be T.
2. Insert k into tree T, with the node pointing to v.
3. If T has more than 2log₂U elements, remove its representative and
    split the tree into two balanced bst's. Pick a representative for each tree
    and insert them into X-fast trie.

Delete operation

delete(k):

1. Find the key k and delete it. Let the tree that contained it be T.
2. If the elements count of T goes lower than (log₂U)/4, merge tree
    T with its predecessor or successor while removing representatives of merged
    trees from X-fast trie.
3. Insert representative of new tree into X-fast trie.
4. If the newly formed tree has more than 2log₂U elements, remove its
    representative and split the tree into two balanced bst's. Pick a representative
    for each tree and insert them into X-fast trie.

Complexity

Time complexity:
1. Find: O(1)
2. Successor, Predecessor: O(log₂log₂U)
3. Insert: O(log₂U)
4. Delete: O(log₂U)
Space complexity: O(N log₂U)

Where N is count of values stored and U is universe size.

Implementations

C++ 11

/*
/*
 * c++ 11 code to demonstrate a Yfast trie. A proper would be
 * humongous, so this code should be taken just as an example.
 */

#include <iostream>
#include <unordered_map>
#include <vector>
#include <map>
#include <iterator>

struct Node{
    int level;

    int key;

    // also used by leaf nodes to simulate linked list
    Node *left, *right;

    Node(){
        level = -1;
        left = nullptr;
        right = nullptr;
    }
};

class XfastTrie{
    int w;
    std::vector<std::unordered_map<int, Node *>> hash_table;
    
    int getDigitCount(int x){
        int count = 0;
        for(; x > 0; x >>= 1, ++count);
        return count;
    }

    int leftChild(int x){
        return (x << 1);
    }

    int rightChild(int x){
        return ((x << 1) | 1);
    }

    Node * getLeftmostLeaf(Node *parent){
        while(parent->level != w){
            if(parent->left != nullptr)
                parent = parent->left;
            else
                parent = parent->right;
        }
        return parent;
    }

    Node * getRightmostLeaf(Node *parent){
        while(parent->level != w){
            if(parent->right != nullptr)
                parent = parent->right;
            else
                parent = parent->left;
        }
        return parent;
    }

public:
    XfastTrie(){}

    XfastTrie(int U){
        w = getDigitCount(U);
        hash_table.assign(w + 1, std::unordered_map<int, Node *>());
        Node *root = new Node();
        root->level = 0;
        hash_table[0][0] = root;

    }

    Node* find(int k){
        if(hash_table[w].find(k) == hash_table[w].end())
            return nullptr;
        return hash_table[w][k];
    }

    Node* successor(int k){
        int low = 0,
            high = w + 1,
            mid, prefix;

        Node *tmp = nullptr;
        while(high - low > 1){
            mid = (low + high) >> 1;
            prefix = k >> (w - mid);
            if(hash_table[mid].find(prefix) == hash_table[mid].end())
                high = mid;
            else{
                low = mid;
                tmp  = hash_table[mid][prefix];
            }
        }

        if(tmp == nullptr || tmp->level == 0)
            // occours on first insertion
            return nullptr;
     
        if(tmp->level == w)
            return tmp;

        // use descendant node
        if((k >> (w - tmp->level - 1)) & 1)
            tmp = tmp->right;
        else
            tmp = tmp->left;

        if(tmp->key < k){
            return tmp->right;
        }
        return tmp;
    }

    Node* predecessor(int k){
        // completely same as successor except lst section
        int low = 0,
            high = w + 1,
            mid, prefix;

        Node *tmp = nullptr;
        while(high - low > 1){
            mid = (low + high) >> 1;
            prefix = k >> (w - mid);
            if(hash_table[mid].find(prefix) == hash_table[mid].end())
                high = mid;
            else{
                low = mid;
                tmp  = hash_table[mid][prefix];
            }
        }

        if(tmp == nullptr || tmp->level == 0)
            // occours on first insertion
            return nullptr;
     
        if(tmp->level == w)
            return tmp;

        // use descendant node
        if((k >> (w - tmp->level - 1)) & 1)
            tmp = tmp->right;
        else
            tmp = tmp->left;
        
        if(tmp->key > k){
            return tmp->left;
        }
        return tmp;
    }

    void insert(int k){
        Node *node = new Node();
        node->key = k;
        node->level = w;
   
        // update linked list
        Node *pre = predecessor(k);
        Node *suc = successor(k);
        if(pre != nullptr){
            if(pre->level != w){
                std::cout << "Wierd level " << pre->level << '\n';
            }
            node->right = pre->right;
            pre->right = node;
            node->left = pre;
        }
        if(suc != nullptr){
            if(suc->level != w){
                std::cout << "Wierd level " << suc->level << '\n';
            }
            node->left = suc->left;
            suc->left = node;
            node->right = suc;
        }

        int lvl = 1, prefix;
        while(lvl != w){
            prefix = k >> (w - lvl);
            if(hash_table[lvl].find(prefix) == hash_table[lvl].end()){
                Node *inter = new Node();
                inter->level = lvl;
                hash_table[lvl][prefix] = inter;
                if(prefix & 1)
                    hash_table[lvl - 1][prefix >> 1]->right = inter;
                else
                    hash_table[lvl - 1][prefix >> 1]->left = inter;
            }
            ++lvl;
        }
        hash_table[w][k] = node;
        if(k & 1)
            hash_table[w - 1][k >> 1]->right = node;
        else
            hash_table[w - 1][k >> 1]->left = node;

        // update descendant pointers
        prefix = k;
        lvl = w - 1;
        while(lvl != 0){
            prefix = prefix >> 1;
            if(hash_table[lvl][prefix]->left == nullptr)
                hash_table[lvl][prefix]->left = getLeftmostLeaf(hash_table[lvl][prefix]->right);
            else if(hash_table[lvl][prefix]->right == nullptr)
                hash_table[lvl][prefix]->right = getRightmostLeaf(hash_table[lvl][prefix]->left);
            --lvl;
        }
        if(hash_table[0][0]->left == nullptr){
            hash_table[0][0]->left = getLeftmostLeaf(hash_table[0][0]->right);
        }
        if(hash_table[0][0]->right == nullptr){
            hash_table[0][0]->right = getRightmostLeaf(hash_table[0][0]->left);
        }
    }
};

class BinarySearchTree{
public:
    std::map<int, int> tree;

    void insert(int k, int val){
        tree[k] = val;
    }

    int successor(int k){
        std::map<int, int> :: iterator tmp = tree.lower_bound(k);
        if(tmp == tree.end())
            return -1;
        else
            return tmp->first;
    }

    int predecessor(int k){
        std::map<int, int> :: iterator tmp = tree.upper_bound(k);
        if(tmp == tree.begin())
            return -1;
        tmp = std::prev(tmp);
        return tmp->first;
    }
};

class YfastTrie{
    std::unordered_map<int, BinarySearchTree> bst;
    XfastTrie xtrie;
    int w;

    int getDigitCount(int x){
        int count = 0;
        for(; x > 0; x >>= 1, ++count);
        return count;
    }

public:
    YfastTrie(int u){
        w = getDigitCount(u);
        xtrie = XfastTrie(u);
    }

    int find(int k){
        Node *suc = xtrie.successor(k);
        Node *pre = xtrie.predecessor(k);

        if(bst[suc->key].tree.find(k) != bst[suc->key].tree.end())
            return bst[suc->key].tree[k];
        if(bst[pre->key].tree.find(k) != bst[pre->key].tree.end())
            return bst[pre->key].tree[k];

        return -1;
    }

    int successor(int k){
        Node *suc = xtrie.successor(k);
        Node *pre = xtrie.predecessor(k);
        // used as infinite here
        int x = 2 << 2, y = 2 << w;     

        if(suc != nullptr)
            x = bst[suc->key].successor(k);
        if(pre != nullptr)
            y = bst[pre->key].successor(k);

        return (x < y) ? x : y;
    }

    int predecessor(int k){
        Node *suc = xtrie.successor(k);
        Node *pre = xtrie.predecessor(k);
        int x = -1, y = -1;
        if(suc != nullptr)
            x = bst[suc->key].predecessor(k);
        if(pre != nullptr)
            y = bst[pre->key].predecessor(k);

        return (x > y) ? x : y;
    }

    void insert(int k, int val){
        Node *suc = xtrie.successor(k);
        if(suc == nullptr){
            xtrie.insert(k);
            // representative can be anything. using first element as
            // representative requires length of xfast trie to be u.
            bst[k] = BinarySearchTree();
            bst[k].tree[k] = val;
        }
        else{
            int succ = suc->key;
            std::cout << succ << '\n';
            bst[succ].tree[k] = val;
        }
    }
};

int main(){
    YfastTrie trie(1 << 5);
    std:: cout << "insert 1, 5, 11, 12\n";
    trie.insert(5, 2);
    trie.insert(11, 2);
    trie.insert(12, 2);
    trie.insert(1, 2);

    std::cout << "Successor of key 2:\n";
    int tmp = trie.successor(2);
    if(tmp != -1){
        std::cout << tmp << '\n'
                << "value stored = " << trie.find(tmp) << '\n';
    }

    std::cout << "Predecessor of key 13:\n";
    tmp = trie.predecessor(13);
    if(tmp != -1){
        std::cout << tmp << '\n'
                << "value stored = " << trie.find(tmp) << '\n';
    }

    return 0;
}

Applications

Y-fast tries are used extensively in database systems.
Y-fast tries are extremely fast associative array.
Y-fast tries have replaced X-fast tries in almost all applications, for example, IP routing.

References/ Further reading

Paper : Fast Local Searches and Updates in Bounded Universes.
The implementation presented above was inefficient and needs a balanced Binary Search Tree implemented. Try this as an exercise.

Y fast trie

Data Structures tree data structure trie y fast trie x fast trie