Soft heap
Do not miss this exclusive book on Binary Tree Problems. Get it now for free.
Soft heap is a variant of heap data structure (priority queue). It is a powerful data structure owing to amortized constant time complexity of some basic functions.
Following are the 5 operations that have an amortized constant time complexity
- create(S): Create a new soft heap
- insert(S,x): Insert an element in the soft heap
- meld(S1,S2): Merge the 2 soft heaps to get one
- delete(S,x): Delete an element from the soft heap
- findMin(S): Get the minimum element in the soft heap
It merits to beat the logarithmic bound on the complexity of heap operations. The keys in the soft heap are corrupted meaning that each node in the soft heap has an associated linked list. Each node has a common key which sets the upper bound on the values of the keys that can be inserted in that linked list. A key that is inserted in the soft heap might be inserted in the node's linked list and is said to be corrupted because after insertion in the linked list, it will be known only by the common key for that node. This characteristic gives it the name 'soft' heap. The data structure determines which keys are to be corrupted at runtime.
Corruption is a way to decrease the entropy of the data stored in the data structure, and thus facilitate it's treatment. The entropy is defined as the logarithm (base two) of the number of distinct key assignments. The entropy of the data structure is reduced by artificially raising the values of certain keys. The main idea behind the soft heap is to move items across the data structure not individually but in groups, equivalent to 'car pooling'. This improves speed: during heap updates, items travel together in packets, in order to save time.
The data structure is purely pointer-based. No arrays are used and no numeric assumptions are made on the keys.
The soft heap can be used to compute exact or approximate medians and percentiles optimally. It is also useful for near sorting and for computing minimum spanning trees of general graphs.
Given any mixed sequence of n operations, a soft heap with error rate E (for any 0 < E < 0.5) ensures that, at any time, at most E.n of its items have their keys raised. The amortized complexity of each operation is constant, except for insert, which takes O(log 1/E) time. The soft heap is optimal for any value of E.
Applications
-
The soft heap was designed to especially cater to the problem of minimum spanning trees. Soft heap plays an important part in the fastest deterministic algorithm used to compute minimum spanning tree of a graph. Given a connected graph with n vertices and m weighted edges, the algorithm finds a minimum spanning tree in time O(m a (m,n)), where a is the functional inverse of Ackermann’s function.
-
It can be used maintain averages of certain quantities at all times. As and when new entries are added, the average is automatically updated. All this is possible in amortized constant time.
-
K th largest element: We can use soft heaps to find the k th largest element in an array in linear time.To do this, we first insert n numbers into a soft heap with error rate E = 1/3. Next, we delete almost n/3 smallest numbers and delete them. This can be done by calling findMin and delete functions. The largest number deleted has rank between n/3 and 2n/3. After computing this rank, we remove at least n/3 numbers again. We recurse over the remaining elements in a similar fashion.
-
Approximate sorting: If we are given a set of numbers and a number k such that each element is k positions away from it's actual position after sorting. This can be done by inserting n numbers into a soft heap with error rate E and then deleting the smallest keys repeatedly. Ik is the number of keys deleted earlier whose original values are larger than x where x is the k th deleted number. But x must have been in a corrupted state during those particular Ik earlier deletion. Hence at the total number of keys in a corrupted state,counting over all deletions, is at most En^2, and so the number of inversions Ik is also bounded by En^2.
Pseudo Code
Structure of a node
struct CELL {
int key;
struct CELL *cellNext;
};
struct NODE {
int ckey;
int rank;
struct NODE *next, *child;
struct CELL *nodeHead, *nodeTail;
};
A soft queue Q is a binomial tree with subtrees possibly missing. The binomial tree from which Q is derived is called its master tree. The rank of a node of Q is the number of children of the corresponding node in the master tree. The number of children of the root should be no smaller than floor of rank(root)/2.
Remember, a node in a soft heap can have a linked list associated with it such that all members of the linked list are referenced by a common key. The linked list of a node here called an item list. Each member of the linked list is stored in a cell. The linked list along with the common key and some extra pointers is called the NODE.
struct CELL defines the structure of a cell. key stores the key value and cellNext points to the next cell. struct NODE defines structure of the Node. ckey stores the commom key for that node, rank stores the rank of the node, next and child are pointers to other nodes. These pointers form a double linked list like strcuture. nodeHead points to the head of the item list, nodeTail points to the tail of the item list.
A pointer suffix_min is used to point to the head of the soft heap which is often the minimum element.
Operations
-
Insert : To add a new node, we make an uncorrupted new node and meld it in the existing tree
insert (newkey) { NODE *q; CELL *l; l = (CELL *) malloc (sizeof (CELL)); l -> key = newkey; l -> next = NULL; q -> rank = 0; q -> ckey = newkey; q -> nodeHead = l; q -> nodeTail = l; meld (q); }
-
Meld: To meld 2 heaps H and H', we break apart the heap of lesser rank say H' by melding it's queues with H. To meld a queue of rank k into H, we look for the smallest index i such that rank (r(i) >= k). If rank r(i) > k, we insert the head right before that node. Otherwise, we meld the two queues into one of rank k + 1, by making the root with the larger key a new child of the other root. We repeat the process as long as necessary. Finally, we update the suffix_min pointers to the last head visited.
meld (q) { head *h, *prevhead, *tohead = header->next; NODE *top, *bottom; while (q->rank > tohead->rank) { tohead = tohead->next; } prevhead = tohead->prev; while(q->rank == tohead->rank) { if(tohead->queue->ckey > q->ckey) { top = q; bottom = tohead->queue; } else { top = tohead->queue; bottom = q; } q->ckey = top->ckey; q->rank = top->rank + 1; q->child = bottom; q->next = top; q->nodeHead = top->nodeHead; q->nodeTail = top->nodeTail; tohead = tohead->next; } if (prevhead == tohead->prev) h = new_head(); else h = prevhead->next; h->queue = q; h->rank = q->rank; h->prev = prevhead; h->next = tohead; prevhead->next = h; tohead->prev = h; fix_minlist (h); }
-
findMin: The suffix_min points to head of the soft heap which often the minimum ckey. But sometimes, item-list at that node might be empty. In that case, we must refill the item-list with items present lower down in the queue. To do that,we perform a classical function associated with heaps called sift(), which replaces the empty item-list by another one. If necessary, we iterate on this process until the new item-list at the root is not empty. The function sift attempts to move items up the tree towards the root.
findMin { NODE *sift(), *tmp; int min, childcount; head *h = header->next->suffix_min; while(h->queue->nodeHead == NULL) { tmp = h->queue; childcount = 0; while (tmp->next != NULL) { tmp = tmp->next; childcount ++; } if(childcount,h->rank/2) { h->prev->next = h->next; h->next->prev = h->prev; fix_minlist(h->prev); tmp = h->queue; while (tmp->next != NULL) { meld (tmp->child); tmp = tmp->next; } } else { h->queue = sift(h->queue); if(h->queue->ckey == INFTY) { h->prev->next = h->next; h->next->prev = h->prev; h = h->prev; } fix_minlist (h); } h = header->next->suffix_min; } min = h->queue->nodeHead->key; h->queue->nodeHead = h->queue->nodeHead->next; if (h->queue->nodeHead == NULL) h->queue->nodeTail = NULL; return min; }
-
Delete: In case we have a min heap, we replace the head by the minimum in it's list and adjust the tree again to ensure that the the structure of soft heap is maintained. We return the root node as the result of this operation.
delete() { NODE *tmp; int res,childcount; head *h = header->next->suffix_min; while(h->queue->nodeHead == NULL) { tmp = h->queue; childcount = 0; while (tmp->next != NULL) { tmp = tmp->next; childcount ++; } h = header->next->suffix_min; } res = h->queue->nodeHead->key; h->queue->nodeHead = h->queue->nodeHead->next; if (h->queue->nodeHead == NULL) h->queue->nodeTail = NULL; return res; }
Time complexity
If the soft heap contains n elements, then for an error parameter e, it can have at most n.e elements in the corrupted state that is n.e elements can have their keys raised. Owing to this, the time complexity of all the operations are amortized O(1/loge). Insert operation, particularly has a time complexity of O(1/loge).
Meld operation takes about O(k) time where k is the smaller of the two ranks of the trees which are to be melded. Delete operation is simply an adjustment of pointers which takes constant time followed by a meld operation on the 2 parts. Insert requires us to create a new node which takes constant time followed by a meld operation. findMin requires sift and fixminlist operations too.
Sign up for FREE 3 months of Amazon Music. YOU MUST NOT MISS.