Rope: the Data Structure used by text editors to handle large strings


Reading time: 40 minutes | Coding time: 20 minutes

A Rope data structure is a tree data structure which is used to store or manipulate large strings in a more efficient manner. It allows for operations like insertion, deletion, search and random access to be executed faster and much more efficiently in comparison to a traditional String. We will look into this data structure and its various operations and implementation.

This data structure is widely used by softwares such as text editors like Sublime, email systems like Gmail and text buffers to handle large strings efficiently.

Introduction

A rope is a type of binary tree i.e. each node can have maximum of 2 children at the maximum, where every leaf node holds a String or a Substring and the length of the same(Also known as the "weight" for the corresponding leaf node) and every following parent node holds the total sum of the weights of the leaf nodes in its left subtree, which in turn is described as the weight of the said node. In a more simpler form, the weight of a node is the sum of all the leaf nodes that sprout from its immediate left child node.

basic

In this article we will understand the Rope Data Structure along with its comparison to a standard string to determine when or when not to use it, along with its advantages & disadvantages. There is a wide variety of applications for the Rope, however the most common is found in text editors which handle large amounts of text.

Rope Data Structure

A rope is made up of nodes, arranged into a binary tree. As a tree, its starting node is marked as the Root and has a left & right child node which further have their child nodes and those with no children are known as leaf nodes. If we take the number of levels to be h, where h=0 for the root, the maximum possible nodes in a rope can be 2(h+1)-1.

Before diving into the implementation, let us compare this data structure with a string:

Comparison with String

Advantages

  • Ropes enable much faster insertion and deletion of text in comparison to string arrays, on which the operations have a time complexity of O(n).
  • Ropes do not require O(n) extra memory when being operated upon, unlike arrays which require it for copying operations.
  • Ropes do not need large contiguous memory areas like arrays.
  • If the operations performed are non-destructive(i.e the altered content is preserved) it behaves as a persistent data structure allows for multiple undo levels.
  • Stable performance regardless of size.

Disadvantages

  • Occupies greater overall storage space in comparison to a simple string, when not being operated upon, in order to store the parent nodes. However, this trade-off reduces as the string grows.
  • Increased complexity of structure creates a greater risk of bugs.
  • Significantly slower on small strings.

From this, we can derive that ropes are more preferable when the size of our data is large and it is to be modified on a regular basis. Strings perform better on smaller data & are better when there are less operations.

Operations

Search(Index I)

In order to find the character at ith position, we search recursively beginning at the root node.

Find

We follow the premise that if the weight of the current node is lower than the value of i, we subtract the weight from i & move right. If the weight is less than value of i we simply move left. We continue till the point we reach a leaf node.

   function index(RopeNode node, integer i)
        if node.weight < i and exists(node.right) then
            return index(node.right, i - node.weight)
        end
        if exists(node.left) then
            return index(node.left, i)
        end
        return node.[i]string
    end

Average case Time complexity: Θ(N)
Average case Time complexity: Θ(log n)

Concatenation(S1+S2)

A concatenation operation between 2 strings(S1 & S2) is performed by creation of a new root node which has a weight equal to the sum of weights of leaf nodes in S1. This takes time if the tree is already balanced. Since, most of the rope operations need a balanced tree, it might require re-balancing after the operation.

Concat

   function concatenation(ropenode node1, ropenode node2)
       newrope.left = node1
       newrope.right = node2
       newrope.val = length(node1)

Time complexity: Θ(1) to concatenate strings and Θ(log N) to calculate weight of root.

Split(I)

In order to split a string at any given point i, there are 2 major cases we might encounter:

  1. Split point being the last character of a leaf node.
  2. Split point being a middle character of a leaf node.

With the second case, we can reduce it to the much simpler first one by splitting the string at given point into 2 leaf nodes & creating a new parent for these component strings. Follow this with subtracting the weights of the cut off parts from the parent nodes and re-balancing the tree.

Split

Time complexity: Θ(log N)

Insertion(I,S')

In order to insert a string S' between our string, at position i, we simply need to split it at the index from which to insert, and the concatenate the new string followed by the split off part of the original string. The cost of this operation is the sum of the three operations performed.

   function insert(index i, string S')
       newrope = split(i)
       concatenation(S',newrope)
       concatenation(rope,S')

Time complexity: Θ(log N)

Deletion(I,J)

In order to delete a part of string from the middle, split the string at provided indices from ith to i+jth character and then concatenate the strings without the remaining part.

   function delete(index i, index j)
       splitrope = split(j)
       remove = split(i)
       concatenation(rope,splitrope)

Time complexity: Θ(log N)

Report(I,J)

In order to report the string between given points, find the node that contains the starting character and whose weight is greater than the ending index j. Then traverse the tree starting at this node and output all the characters by performing an in-order traversal operation.

Time complexity: Θ(j + log N)

Implementation

Implementation of a Rope from a provided string:

class Rope(object):
#Accepts a string or list of strings and forms it into a rope.
#This is the basic implementation which forms a rope with the input data.
#Create an object and pass the string or list as the argument to the constructor to form a rope.
#Ex - s= Rope('abcxyz') or s = Rope(['abc','def','ghi'])
#Simply use print on the instance of this class in order to print the entire string. Ex - print(s)
    def __init__(self, data=''):
	#check if the input data is string or not
        if isinstance(data, list):
            if len(data) == 0:
                self.__init__()
            elif len(data) == 1:
                self.__init__(data[0])
            else:
                # Round-up division (to match rope arithmetic associativity)
                idiv = len(data) // 2 + (len(data) % 2 > 0)
                self.left = Rope(data[:idiv])
                self.right = Rope(data[idiv:])
                self.data = ''
                self.length = self.left.length + self.right.length
        elif isinstance(data, str):
            self.left = None
            self.right = None
            self.data = data
            self.length = len(data)
        else:
            raise TypeError('Kindly use strings for the purpose')
        # Word iteration
        self.current = self
        #checks if the tree is balanced
        def __eq__(self, other):
            if (self.left and self.right) and (other.left and other.right):
                return self.left == other.left and self.right == other.right
            elif (self.left and self.right) or (other.left and other.right):
                return False
            else:
                return self.data == other.data
        #concatenation of strings
        #forms a new rope with other string and joins the 2 ropes 
        #into a new rope
        def __add__(self, other):
            #ensure that the string being concatenated is of type string
            if isinstance(other, str):
                other = Rope(other)
            r = Rope()
            r.left = self
            r.right = other
            r.length = self.length + other.length
            r.current = self
            return r
        #Fetch the length of string in the specified rope
        def __len__(self):
            if self.left and self.right:
                return len(self.left.data) + len(self.right.data)
            else:
                return(len(self.data))
        #fetch the word present at the specified index
        def __getitem__(self, index):
            #ensure the index specified is an integer
            if isinstance(index, int):
                if self.left and self.right:
                    if index < -self.right.length:
                        subindex = index + self.right.length
                    elif index >= self.left.length:
                        subindex = index - self.left.length
                    else:
                        subindex = index

                    if index < -self.right.length or 0 <= index < self.left.length:
                        return self.left[subindex]
                    else:
                        return self.right[subindex]
                else:
                    return Rope(self.data[index])

                elif isinstance(index, slice):
                    if self.left and self.right:
                        start = index.start
                        if index.start is None:
                            if index.step is None or index.step > 0:
                                head = self.left
                            else:
                                head = self.right
                        elif (index.start < -self.right.length or
                                0 <= index.start < self.left.length):
                            head = self.left
                            if index.start and index.start < -self.right.length:
                                start += self.right.length
                        else:
                            head = self.right
                            if index.start and index.start >= self.left.length:
                                start -= self.left.length
                        stop = index.stop
                        if index.step is None or index.step > 0:
                            if (index.stop is None or
                                -self.right.length <= index.stop < 0 or
                                index.stop > self.left.length):
                                tail = self.right
                                if index.stop and index.stop > self.left.length:
                                    stop -= self.left.length
                            else:
                                if head == self.right:
                                    tail = self.right
                                    stop = 0
                                else:
                                    tail = self.left
                                    if index.stop < -self.right.length:
                                        stop += self.right.length
                        else:
                            if (index.stop is None or
                                    index.stop < (-self.right.length - 1) or
                                    0 <= index.stop < self.left.length):
                                tail = self.left
                                if index.stop and index.stop < (-self.right.length - 1):
                                    stop += self.right.length
                            else:
                                if head == self.left:
                                    tail = self.left
                                    stop = -1   # Or self.left.length - 1 ?
                                else:
                                    tail = self.right
                                    if index.stop >= self.left.length:
                                        stop -= self.left.length

                        # Construct the rope as a binary tree
                        if head == tail:
                            return head[start:stop:index.step]
                        else:
                            if not index.step:
                                offset = None
                            elif index.step > 0:
                                if start is None:
                                    delta = -head.length
                                elif start >= 0:
                                    delta = start - head.length
                                else:
                                    delta = max(index.start, -self.length) + tail.length

                                offset = delta % index.step
                                if offset == 0:
                                    offset = None
                            else:
                                if start is None:
                                    offset = index.step + (head.length - 1) % (-index.step)
                                elif start >= 0:
                                    offset = index.step + min(start, head.length - 1) % (-index.step)
                                else:
                                    offset = index.step + (start + head.length) % (-index.step)

                            if not tail[offset:stop:index.step]:
                                return head[start::index.step]
                            else:
                                return head[start::index.step] + tail[offset:stop:index.step]
                    else:
                        return Rope(self.data[index])
        #Report the characters present inside the rope
        def __repr__(self):
            if self.left and self.right:
                return '{}{} + {}{}'.format('(' if self.left else '',
                                        self.left.__repr__(),
                                        self.right.__repr__(),
                                        ')' if self.right else '')
            else:
                return "Rope('{}')".format(self.data)
        #Check if the current node is a leaf node and print its value
        #otherwise iterate down a step on both sides
        def __str__(self):
            if self.left and self.right:
                return self.left.__str__() + self.right.__str__()
            else:
                return self.data
        #The iterator
        def __iter__(self):
            return self
        #Iteration down the tree
        def __next__(self):
            if self.current:
                if self.left and self.right:
                    try:
                        return next(self.left)
                    except StopIteration:
                        self.current = self.right
                    return next(self.right)
                else:
                    self.current = None
                    return self.data
            else:
                raise StopIteration

        def next(self):
            return self.__next__()

firstRope = Rope('hello_my_name_is')
secondRope = Rope('_rope_data_structure')

#concatenate these strings
fullRope = firstRope.__add__(secondRope)
print(fullRope)
#Gives result hello_my_name_is_rope_data_structure

#Show the character at specified index
print(fullRope.__getitem__(14))
#Displays character "i" which is at index 14

Complexity

  • Worst case time complexity: Θ(n)
  • Average case time complexity: Θ(log N)
  • Best case time complexity: Θ(log N)
  • Space complexity: Θ(n)

Applications

The areas where the Rope data structure is used are:

  • Text Editors which handle large amounts of strings.
  • E-mail messages which might require lot of editing.
  • Edit buffers for handling large text.
  • Cedar programming environment uses ropes since its inception.