Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
Reading time: 35 minutes | Coding time: 15 minutes
Y-fast trie is a data structure used to store integers from a bounded domain. It is a bitwise trie, i.e. a binary tree where each subtree stores values having binary representations with common prefix.
It is considered as an upgrade to X-fast tries, allowing more efficient memory usage and faster insertion/deletions.Y-fast trie also treats integers as if they were words of w
bits which allows storing integers as a trie of words.
It is expected to perform five operations:
- find(x): Find x in the given trie.
- predecessor(x): Return the largest element in domain less than or equal to x.
- successor(x): Return the smallest element in domain less greater or equal to x.
- insert(x, v): Insert an element x into trie pointing to value v.
- delete(x): Delete an element x from trie.
Y-fast tries consists of two data structures:
- X-fast tries
- balanced binary search trees
Keys are divided into logU groups of consecutive elements, with U being the universe size. A balanced bst is created for each group, with each group being of atleast logU / 4 elements and atmost 2logU elements.
For every binary search tree, a representative r is chosen and all these representatives are stored in a X-fast trie. r has to be smaller than it's successor in X-fast trie and all elements in the bst represented by the successor, as well as greater than its predecessor and all elements in its bst. r does not necessarily have to be one of the element in the bst it represents.
Image above shows a Y-fast trie with universe size M
.
Y-fast trie starts as a X-fast trie. The major change is that we operate on representative values r
in X-fast tries, and the leaf nodes points to balanced binary search trees rather than values.
Algorithm
Y Fast trie supports the following operations:
- find
- successor and predecessor
- insert
- delete
Find operation
- find(k): Find the element k in trie.
1. Find the predecessor and successor representatives of k in the X-fast
trie, i.e. find largest representative smaller than k and find smallest
representative larger than k in representative X-fast trie.
k can be only in the bst's represented by them.
2. Binary search for k in the binary search trees.
Successor and Predecessor operations
- successor(k) and predecessor(k): Successor is the node with smallest element greater than or equal to k, while predecessor is largest element smaller than or equal to k in trie. This node will have longest common prefix with k.
1. Find the successor and predecessor representatives of k in the X-fast trie.
2. Traverse both trees to get two potential successor (or predecessor) of k.
Select the appropriate of the two options.
Insert operation
- insert(k, v): To insert the key value pair (k, v)
First determine in which balanced bst should k be inserted
1. Find the successor of k. Determine the bst in which the successor is stored.
Let the tree be T.
2. Insert k into tree T, with the node pointing to v.
3. If T has more than 2log2U elements, remove its representative and
split the tree into two balanced bst's. Pick a representative for each tree
and insert them into X-fast trie.
Delete operation
- delete(k):
1. Find the key k and delete it. Let the tree that contained it be T.
2. If the elements count of T goes lower than (log2U)/4, merge tree
T with its predecessor or successor while removing representatives of merged
trees from X-fast trie.
3. Insert representative of new tree into X-fast trie.
4. If the newly formed tree has more than 2log2U elements, remove its
representative and split the tree into two balanced bst's. Pick a representative
for each tree and insert them into X-fast trie.
Complexity
- Time complexity:
- Find: O(1)
- Successor, Predecessor: O(log2log2U)
- Insert: O(log2U)
- Delete: O(log2U)
- Space complexity: O(N log2U)
Where N is count of values stored and U is universe size.
Implementations
C++ 11
/*
/*
* c++ 11 code to demonstrate a Yfast trie. A proper would be
* humongous, so this code should be taken just as an example.
*/
#include <iostream>
#include <unordered_map>
#include <vector>
#include <map>
#include <iterator>
struct Node{
int level;
int key;
// also used by leaf nodes to simulate linked list
Node *left, *right;
Node(){
level = -1;
left = nullptr;
right = nullptr;
}
};
class XfastTrie{
int w;
std::vector<std::unordered_map<int, Node *>> hash_table;
int getDigitCount(int x){
int count = 0;
for(; x > 0; x >>= 1, ++count);
return count;
}
int leftChild(int x){
return (x << 1);
}
int rightChild(int x){
return ((x << 1) | 1);
}
Node * getLeftmostLeaf(Node *parent){
while(parent->level != w){
if(parent->left != nullptr)
parent = parent->left;
else
parent = parent->right;
}
return parent;
}
Node * getRightmostLeaf(Node *parent){
while(parent->level != w){
if(parent->right != nullptr)
parent = parent->right;
else
parent = parent->left;
}
return parent;
}
public:
XfastTrie(){}
XfastTrie(int U){
w = getDigitCount(U);
hash_table.assign(w + 1, std::unordered_map<int, Node *>());
Node *root = new Node();
root->level = 0;
hash_table[0][0] = root;
}
Node* find(int k){
if(hash_table[w].find(k) == hash_table[w].end())
return nullptr;
return hash_table[w][k];
}
Node* successor(int k){
int low = 0,
high = w + 1,
mid, prefix;
Node *tmp = nullptr;
while(high - low > 1){
mid = (low + high) >> 1;
prefix = k >> (w - mid);
if(hash_table[mid].find(prefix) == hash_table[mid].end())
high = mid;
else{
low = mid;
tmp = hash_table[mid][prefix];
}
}
if(tmp == nullptr || tmp->level == 0)
// occours on first insertion
return nullptr;
if(tmp->level == w)
return tmp;
// use descendant node
if((k >> (w - tmp->level - 1)) & 1)
tmp = tmp->right;
else
tmp = tmp->left;
if(tmp->key < k){
return tmp->right;
}
return tmp;
}
Node* predecessor(int k){
// completely same as successor except lst section
int low = 0,
high = w + 1,
mid, prefix;
Node *tmp = nullptr;
while(high - low > 1){
mid = (low + high) >> 1;
prefix = k >> (w - mid);
if(hash_table[mid].find(prefix) == hash_table[mid].end())
high = mid;
else{
low = mid;
tmp = hash_table[mid][prefix];
}
}
if(tmp == nullptr || tmp->level == 0)
// occours on first insertion
return nullptr;
if(tmp->level == w)
return tmp;
// use descendant node
if((k >> (w - tmp->level - 1)) & 1)
tmp = tmp->right;
else
tmp = tmp->left;
if(tmp->key > k){
return tmp->left;
}
return tmp;
}
void insert(int k){
Node *node = new Node();
node->key = k;
node->level = w;
// update linked list
Node *pre = predecessor(k);
Node *suc = successor(k);
if(pre != nullptr){
if(pre->level != w){
std::cout << "Wierd level " << pre->level << '\n';
}
node->right = pre->right;
pre->right = node;
node->left = pre;
}
if(suc != nullptr){
if(suc->level != w){
std::cout << "Wierd level " << suc->level << '\n';
}
node->left = suc->left;
suc->left = node;
node->right = suc;
}
int lvl = 1, prefix;
while(lvl != w){
prefix = k >> (w - lvl);
if(hash_table[lvl].find(prefix) == hash_table[lvl].end()){
Node *inter = new Node();
inter->level = lvl;
hash_table[lvl][prefix] = inter;
if(prefix & 1)
hash_table[lvl - 1][prefix >> 1]->right = inter;
else
hash_table[lvl - 1][prefix >> 1]->left = inter;
}
++lvl;
}
hash_table[w][k] = node;
if(k & 1)
hash_table[w - 1][k >> 1]->right = node;
else
hash_table[w - 1][k >> 1]->left = node;
// update descendant pointers
prefix = k;
lvl = w - 1;
while(lvl != 0){
prefix = prefix >> 1;
if(hash_table[lvl][prefix]->left == nullptr)
hash_table[lvl][prefix]->left = getLeftmostLeaf(hash_table[lvl][prefix]->right);
else if(hash_table[lvl][prefix]->right == nullptr)
hash_table[lvl][prefix]->right = getRightmostLeaf(hash_table[lvl][prefix]->left);
--lvl;
}
if(hash_table[0][0]->left == nullptr){
hash_table[0][0]->left = getLeftmostLeaf(hash_table[0][0]->right);
}
if(hash_table[0][0]->right == nullptr){
hash_table[0][0]->right = getRightmostLeaf(hash_table[0][0]->left);
}
}
};
class BinarySearchTree{
public:
std::map<int, int> tree;
void insert(int k, int val){
tree[k] = val;
}
int successor(int k){
std::map<int, int> :: iterator tmp = tree.lower_bound(k);
if(tmp == tree.end())
return -1;
else
return tmp->first;
}
int predecessor(int k){
std::map<int, int> :: iterator tmp = tree.upper_bound(k);
if(tmp == tree.begin())
return -1;
tmp = std::prev(tmp);
return tmp->first;
}
};
class YfastTrie{
std::unordered_map<int, BinarySearchTree> bst;
XfastTrie xtrie;
int w;
int getDigitCount(int x){
int count = 0;
for(; x > 0; x >>= 1, ++count);
return count;
}
public:
YfastTrie(int u){
w = getDigitCount(u);
xtrie = XfastTrie(u);
}
int find(int k){
Node *suc = xtrie.successor(k);
Node *pre = xtrie.predecessor(k);
if(bst[suc->key].tree.find(k) != bst[suc->key].tree.end())
return bst[suc->key].tree[k];
if(bst[pre->key].tree.find(k) != bst[pre->key].tree.end())
return bst[pre->key].tree[k];
return -1;
}
int successor(int k){
Node *suc = xtrie.successor(k);
Node *pre = xtrie.predecessor(k);
// used as infinite here
int x = 2 << 2, y = 2 << w;
if(suc != nullptr)
x = bst[suc->key].successor(k);
if(pre != nullptr)
y = bst[pre->key].successor(k);
return (x < y) ? x : y;
}
int predecessor(int k){
Node *suc = xtrie.successor(k);
Node *pre = xtrie.predecessor(k);
int x = -1, y = -1;
if(suc != nullptr)
x = bst[suc->key].predecessor(k);
if(pre != nullptr)
y = bst[pre->key].predecessor(k);
return (x > y) ? x : y;
}
void insert(int k, int val){
Node *suc = xtrie.successor(k);
if(suc == nullptr){
xtrie.insert(k);
// representative can be anything. using first element as
// representative requires length of xfast trie to be u.
bst[k] = BinarySearchTree();
bst[k].tree[k] = val;
}
else{
int succ = suc->key;
std::cout << succ << '\n';
bst[succ].tree[k] = val;
}
}
};
int main(){
YfastTrie trie(1 << 5);
std:: cout << "insert 1, 5, 11, 12\n";
trie.insert(5, 2);
trie.insert(11, 2);
trie.insert(12, 2);
trie.insert(1, 2);
std::cout << "Successor of key 2:\n";
int tmp = trie.successor(2);
if(tmp != -1){
std::cout << tmp << '\n'
<< "value stored = " << trie.find(tmp) << '\n';
}
std::cout << "Predecessor of key 13:\n";
tmp = trie.predecessor(13);
if(tmp != -1){
std::cout << tmp << '\n'
<< "value stored = " << trie.find(tmp) << '\n';
}
return 0;
}
Applications
- Y-fast tries are used extensively in database systems.
- Y-fast tries are extremely fast associative array.
- Y-fast tries have replaced X-fast tries in almost all applications, for example, IP routing.
References/ Further reading
-
Paper : Fast Local Searches and Updates in Bounded Universes.
-
The implementation presented above was inefficient and needs a balanced Binary Search Tree implemented. Try this as an exercise.