×

Search anything:

# Data Structure for Spreadsheet / Excel

#### Algorithms Data Structures tree data structure Get this book -> Problems on Array: For Interviews and Competitive Programming

In this article, we have explored several Data Structures that are used in Spreadsheet / Excel Sheet. These involve ideas like Sparse Matrix, AVL Tree and much more.

1. Introduction to Spreadsheet / Problem Statement
2. simple bruteforce way
3. a lazy way
4. sparse matrix
5. openoffice's implementation
6. using AVL Tree

Pre-requisites:

## Introduction to Spreadsheet / Problem Statement

If you have used `Excel Sheet` from Microsoft or `Calc` from OpenOffice then you must have seen that the whole Spreadsheet looks like a table,
Where it has columns as `A, B, C, ...` and row as `1, 2, 3, ...`

Let's say you want to make your Spreadsheet Application and you simply took the whole table as a
`2D Array!!`.
The problem with this is that you would be occupying lots of memory that you don't even require because lots of your cells are just empty.

That's why you could use many other ways to store your cells and data without occupying lots of unnecessary memories. And we will be seeing one of those ways in this article, going to go through how to implement it, especially one method that is known to be used by `Excel Sheet` is using Binary Tree.

Notes:
`Access` word refers to accessing a cell in the spreadsheet using its row and column.

##### The Bruteforce Way

Using directly `2D Matrix` would be simple and straightforward to store the whole Spreadsheet because you don't have to every time create or delete a piece of memory.
But this will be a really bruteforced method to store the values since, normally no one will be using that whole reserved memory.
You might have another reason to not use this is that you will be defining a static `2D array`.
But let's say you're using the whole space, and you're not deleting or inserting any new values at random positions, then this would be a great choice ( simple and fast, but for only this case).

###### Inserting

Since, we're defining a static `2D Array` here, therefore, you cannot extend it, but let's say you did define a dynamic array then, you may perform two kinds of insertion, i.e.

1. Inserting a column

Doing this will require you to first decide at what place you want to insert a new column(by index), and then you need to shift the whole array of a column into a new array while creating a new empty row of the same size as the other rows.
Complexity: O(N)

2. Inserting a row

Now this could be done in two ways,

1. insert one row in one column, hence, losing the rule of the same size of the column and same size of the row.
2. insert one row in every column, hence, replacing every column array with a bigger one, So, it could fit a whole new row, while maintaining the Square Matrix.
Complexity: O(N2)

Inserting a cell would shift all other values which will deform the integrity of your data, therefore, inserting a cell in this data structure is not possible.

###### Deleting

Actually while deleting a cell in this `Matrix` will shift the other values unlike the other data structure we're going to see afterward, therefore a cell isn't possible.
But if you need you can delete a column or a row.

1. Deleting a column

When we delete a whole column ( so, deleting one array of rows ), we will require to shift the whole columns after that to fill the place.
Complexity: O(N)

2. Deleting a row

The whole sequence of a row is separated throughout the columns, therefore we would have to delete a row and shift the whole other rows to fill the space for every column.
Complexity: O(N2)

###### Searching

When searching a value in this data structure we would require to look at every (both empty and non-empty) cell.
Therefore, the cells we need to check will be `row * col`.
i.e. Searching Complexity will be: `O(N2)`

###### Complexities
• Space : `O(N2)`
• Insert : `Cannot be done.`
• Delete : `Cannot be done.`
• Search : `O(N2)`
• Access : `O(1)`

`N` is the number of rows/columns considering that the row and column have the same length and form Square Matrix.

##### The Lazy Way

Will discuss this one first because it's simple and easy, and it provides an idea to make it easy to continue to a sparse matrix.
But I'm not sure if it should be used anywhere.

In this approach, whenever their is value available for a cell then we store both the value and the position of the cell together, this could be done using `JSON` as mentioned in above link. Or you could try some of your other ways such as `HashMap` also known as `Dictionary`.

Few examples:
1. In JSON with `Cell` and `Value`. 2. A little variation

Here, we're using HashMap as above, except instead of storing it by Cell (position value), first, we have HashMap for columns, and then each column has another HashMap for a cell.
This solution is more like a variant of `Sparse Matrix` that we will see next. ###### Inserting

If you have used `HashMap` before then you may know that you require a value for your position/index ( to generate a hash value ) and then a value that you want to store at that index.
So, if we're using the above method of nested column and row, then,
Steps to insert a value will be:

1. Check if the column you need exists or not, if it doesn't then make an entry for the column with the position values such as `A, B, C, ...` or you can also use `1, 2, 3, ...`, and create a `HashMap` to store rows.
2. Then Check for the row position value and make Entry if not available.
Complexity: O(1)
###### Deleting

Check like for the Entry as above we did while inserting the value and delete the entry for the row and if there aren't any available rows in that column then also delete the entry for the column.

Complexity: O(1)

###### Searching

Searching will require us to look through all the entries, So:

1. we would have to go through every Entry for rows and check if the value is the one we're searching for
2. and step one would have to be performed on every column.
Complexity: O(N)
###### Complexities
• Space : `O(N)`
• Insert : `O(1)`
• Delete : `O(1)`
• Search : `O(N)`
• Access : `O(1)`

`N` is the number of entries made.

##### Sparse Matrix

The concept of Sparse Matrix is used when most of the element available in a matrix is zero, And when we're dealing with a Spreadsheet then this is the case for most of the time.

In Sparse Matrix instead of storing the values in `2D Array,` we take three different arrays to store the values of the column, row, and the values that reside there.

Three Arrays:

1. Values => Stores the user given values that you see in cells.
2. Row => Stores the row value of the cell.
3. Column => Here every index represents the column, and this array stores the index value at which the rows start, in the Row array.

Something like: Before we proceed, let's assume you have `col` and `row` as your position for `Column` and `Row` respectively and `value` that you need to add on the `Value` array.

###### Inserting

To insert a value in `Sparse Matrix` we will need to alter all three arrays in one order,

• Column
1. First we will check if is there any value available at `Column[col]` that is not `null` or any value that you decided to use to show that "this Column has no value",
2. If `Column[col]` is empty then find the next column that is available take its value, let's call it `row_i` and set `Column[col] = row_i` (skip 5th step).
3. And `Column[col]` have any value then increment values of the `Column[col]` and for every values after that.
• Row
1. Now, if you found not NULL value at `Column[col]`, then go to `Row[Column[col]]` ( let's take `Row[Column[col]]` as `row_i` ).
2. Now, increment `row_i` and find the value which is greater than your `row` or else your `row_i` becomes greater than `Column[next_column]`.
3. Shift the whole values starting from `row_i` to make a space in `Row` array and then do `Row[row_i] = row`.
• Values
1. Now only insertion is remaining, take `row_i` from there shift every value towards the end of the array by 1, and now put the `value` at `Value[row_i]`.

Pseudocode

``````def next_column(pre):
next_col = 0
for i in range(pre, len(Column)):
if Column[i] is not Null:
next_col = Column[i]
break;
return next_col

get col, row, value;
row_i = 0
if Column[col] is Null:
row_i = next_column(col)
Column[col] = row_i
else:
row_i = Column[col]
next_col = next_column(row_i)
for i in range(row_i, Column[next_col] + 1):
if Row[i] > row:
row_i = i
break;
elif i == Column[next_col]:
row_i = Column[next_col] + 1

for i in range(col, len(Column)):
Column[i] += 1 if Column[i] is not Null

Row.insert(row_i, row)
Value.insert(row_i, value)
``````
###### Deleting

Let's look at steps for deleting a value from the `Matrix` here again we would be using the `row` and `col`

• Column
1. Get the `row_i` from `Column[col]`.
2. Now, decrement the values in the `Column` array starting from `col` to the end of `Column`.
• Row
1. Find the row by incrementing `row_i` and checking the value in the Row array until you reach `Column[next_column]`.
2. Now shift all `row` starting from the `row_i` to the end of the `Row` array, towards left by 1.
• Value
1. And then shift all the `value` in the Value array towards the left by 1, from `row_i` to the end of the `Value` array.

Pseudocode

``````get row, col;
row_i = 0

if Column[col] is not Null:
# step 1
row_i = Column[col]

# step 2
for i in range(col, len(Column)):
Column[col] -= 1 if Column[col] is not Null

# step 3
next_col = Column[next_column(col)]
for i in range(row_i, next_col + 1):
if Row[i] == row:
row_i = i
elif i == next_col:
return;

Row.delete(row_i)
Value.delete(row_i)
``````
###### Searching

Searching in `Sparse Matrix` should be possible through the Linear Search.
This time we will be going from the array `Value` and then to the `Row` and then `Column`.

• Value

Here you can perform a Linear Search to find the index at which our `value` resides.

• Row

Now, after we know index for value, we know index for the row as well, therefore, `row = Row[index]`.

• Column

Also, here do Linear Search and find a value that is equal to `index` and return that as a `col`, and if you find a value that is greater than `index` then return the previous `index` which was not `Null`.

Pseudocode

``````get value;

index = 0
row = 0
col = 0

# searching the value
for i in range(len(Value)):
if value == Value[i]:
index = i
# got the row
row = Row[i]
break;

# searching for the col
pre_i = 0
for i in range(len(Column)):
if Column[i] is not Null:
if Column[i] > index:
col = pre_i
break;
elif Column[i] == index:
col = i
break;
pre_i = i

return (col, row)

``````
###### Complexities
• Space : `O(N)`
• Insert : `O(N)`
• Delete : `O(N)`
• Search : `O(N)`
• Access : `O(1)`
##### OpenOffice's Implementation

Mentioning this here only as an example to the Sparse Matrix.

We're not going to go through the whole thing but only the idea that they have used here to store the cells because a lot of things happen in a Spreadsheet than just storing the values like, you have to provide support for multiple kinds of data types and also a possibility to perform the operations regarding them.

If we look at the OpenOffice doc for data model of cell and sheet, we notice that they use 1024 columns and that's the max value for a column, and after this, the column stores only values for non-empty rows.

To imagine all this. You can notice that the column is going in one order, but that is not compulsory with the rows.

### AVL Tree

AVL Tree is a Binary Search Tree with some extra steps, which allows it to balance itself.
I recommend you to read about AVL and BST before you proceed on this topic of using AVL as a data structure for a Spreadsheet.

In this approach we will use Nodes of the AVL Tree store three values in tree `row`, `column` and `value`, and in that we will use the values of `row` and `col` to make comparisons in the AVL Tree.
We can have two variants regarding this:

###### 1. One-Tree

One Tree is not an official name or anything, but just a name popped up in my head while organizing this article.

In this variant, we will store all three `col`, `row` and `value`, in one Node and only one AVL Tree will be created ( therefore, every value will be stored by one AVL Tree).
The comparison between Nodes can be done using both `col` and `row` values while giving more priority to one than the other,

Giving more priority to:

• col

If we look at some values while giving priority to `col` for comparisons, then it looks something like,
A1 < A2 < A3 ... < B1 ... < C1 ...
So, you see no matter what `row` you select in `A` it will always be less than the `Bn`.

• row

Like `col`, if we give priority to `row` then all `A1`, `B1`, `C1`, ... will be less than the `A2`, `B2`, `C2`,...
And if said in short any `x1` will be less than any `x2` and so on.

Well, here I'm not sure what kind of difference there will be in performance, but we're going to choose to give more priority to the column because it would separate every row into different Sub-Tree of a column.

###### 2. Nested-Tree

Another way to do this is to have a different AVL Tree for column and row. Every Node in AVL Tree for column need to contain a pointer to the AVL Tree of rows and this Tree for row contains Cells of the Spreadsheet.
You might notice that structure of this is not different than the above One Tree while we're giving priority to the col.

###### Operations

Now let's discuss how to implement this thing, well, we're going to look at the Nested-Tree, because if you want to implement the One-Tree method then we would need to implement the whole AVL Tree instead of using an already built one or maybe you could try features like such as operator overloading or any feature that a programming language might provide.
Therefore, for now, let's just only focus on Nested one.

###### Inserting a value

For this, we would need `row`, `col`, and `value`
the first step would be, to see if `col` is available or not,

• if it is available then, use the pointer to the Tree of row and insert the Node for the `row` with the `value`
• And if it's not available then, create an AVL Tree with root Node of `row` which contains `value` and then insert the Tree pointer in the `Tree of column`.

pseudocode

``````get row col value

if Columns.has(col):
Rows = Columns.get(col)
Rows.insert(Node(row, Cell(value)))
else:
Rows = Tree()
Rows.insert(Node(row, Cell(value))
Columns.insert(Node(col, Row))
``````
###### Deleting a value

Deleting a value in this is simpler than Inserting a value,
We will only need `row` and `col` of the Cell we need to delete.
First, we need to search the Node for `col` in Columns's Tree and then check if the Rows Tree have only one `row` that we need to delete then, delete the Node for `col` in Columns Tree directly or else only delete the Node Rows's Tree.

Pseudocode

``````get row col
global Columns

if Columns.has(col):
Rows = Columns.get(col)
if Rows.len() == 1:
Columns.delete(col)
else:
Rows.delete(row)
``````
###### Searching a value

Since we're finding a value, therefore, we need the `value` for this operation.
After we have `value` we need to look at every Node in every Rows Tree and check if the Node has the value we need.
This operation would be easier to do, if your AVL Tree is made up of an array since, you have to only iterate through it, else if it's made up of Nodes pointing to their children then you will require more codes to do this.
We will look at both of these.

1. If Tree is Array
Pseudocode
``````get value
global Columns

for col in range(len(Columns)):
for row in range(len(Columns[col].rows)):
if Columns[col].rows[row].value == value:
return(col, row)

``````
1. If Tree is Nodes
Pseudocode
``````get value
global Columns

class Tree:
# This is generator which will run recursively
# provides the next node everytime it's called.
def next(self):
if self.left is not Null:
yield self.left
self.left.next()
if self.right is not Null:
yield self.right
self.right.next()

for _ in range(Columns.len()):
node = Columns.next()
(col, Rows) = node.col, node.rows
for _ in range(Rows.len()):
node = Rows.next()
if value == node.value:
return (col, node.row)
``````
###### Complexities
• Space : `O(N)`
• Insert : `O(logN))`
• Delete : `O(logN)`
• Search : `O(N)`
• Access : `O(logN)`
##### Extras

Well, we're done here, but I will consider the above concepts are hard to understand, and for that, I have written examples in python for all the concepts mentioned above in this repo. #### Shubhankar Maurya

Shubhankar Maurya is pursuing B.Sc in Computer Science from KD College, Mumbai University. He is an Intern at OpenGenus.

Data Structure for Spreadsheet / Excel