Column Sort Algorithm is a non-traditional sorting algorithm for Distributed Memory Clusters (DMC) (means multiple processors). It is a generalization of odd-even merge sort and is used in a parralel system where multiple CPUs are available for use.

The time complexity of Column Sort Algorithm is O(N logN / P) where N is the number of elements and P is the number of different processors. There are further restrictions which we have explained further in this article at OPENGENUS.

Column Sort was developed by **Frank Thomson Leighton**, CEO of Akamai Technologies and Professor at Massachusetts Institute of Technology (MIT).

# Motivation

Why another sorting algorithm, when we already know that the mergesort is provably most efficient sorting complexity we can achieve?

Column Sort makes use of multiprocessing/multithreading feature of computers. We can use multithreading/multiprocessing in normal sorting algorithms, but it proves to be slower, due to high(really high!) cache misses, and thus drastically increased RAM to Cache updates, or in worse, huge dataset, even Disk to RAM updates.

Real world implementation can be in places where dataset is tremendously huge, anywhere where sorting is required.

As per my experience, it can be used in sorting of (error,generation) vectorof Genetic Algorithms.

Other than this, it can be used as underlying sorting algorithm for databases.

Note that this algorithm is based on the idea of using less time by optimization of the usage of resources available.

# Steps of Column Sort Algorithm

The steps of Column Sort Algorithm are:

- Step 1 : Sort Each column individually
- Step 2 : Transpose
- Step 3 : Sort Each column individually
- Step 4 : Untranspose
- Step 5 : Sort Each Column individually
- Step 6 : Shift
- Step 7 : Sort Each column individually
- Step 8 : Unshift

Given below will be a stepwise explaination of the column-sort algorithm, and the proof of its correctedness will be provided under a seperate heading.

## Step 1 : Sort Each column individually

Here we can opt for the following optimizations :

- Use optimized sorting techniques, such as mergesort/quicksort, over easier selectionsort or bubblesort.
- Use multi-threading to sort each column in a seperate thread.

## Step 2 : Transpose

Use a different variant of transpose of matrix, in which we read the elements in a `Column Major Form`

and then write back to the matrix in a `Row Major Form`

## Step 3 : Sort Each column individually

Repeat the step 1 on the new matrix

## Step 4 : Untranspose

This is the reverse of the step 2. In this, we read the elements of the matrix in `Row Major Form`

and write back to the `Column Major Form`

.

## Step 5 : Sort Each Column individually

Again.

## Step 6 : Shift

We perform this step as a series of 3 sub-steps

Convert the 2-D matrix (r,c) into a 1-D vector, by reading elements

`Column Major Fashion`

.

Add (r/2) (lower integer limit) elements on one side of this 1-D matrix and (r/2) (upper integer limit) on one side. Then, we make the elements added to the front side (from 0 index 0,1,2....) as`-infinity`

and make the elements added to the other side as`+infinity`

Trasform the 1-D vector to a 2-D matrix again

Note that in doing so, our (r,c) matrix becomes (r,c+1) matrix

## Step 7 : Sort Each column individually

Last time.

## Step 8 : Unshift

Here, we reverse the steps performed in the step 6, by again making the 2-D Matrix to a 1-D vector, and remove the `infinities`

and retransform the 1-D vector to a 2-D matrix of dimensions (r,c) again.

# Code/ Implementation

Following is the implementation of Column Sort Algorithm in C++:

```
// Part of OpenGenus
// Author: Varul Srivastava
#include<vector>
#include<iostream>
using namespace std;
# define ll long long
# define POS_INFINITY 10000000000000L
# define NEG_INFINITY -10000000000000L
vector<vector<ll>> transpose(vector<vector<ll>> &matrix){
int r = matrix.size();
int c = matrix[0].size();
vector<vector<ll>> ret_mat(r,vector<ll> (c,0));
int c_row = 0,c_col = 0;
// check for row major and column major forms are correct or not??
for(int x=0;x<r;x++){
for(int y=0;y<c;y++){
ret_mat[c_row][c_col] = matrix[x][y];
c_col++;
if(c_col == c){
c_col = 0;
c_row++;
}
}
}
return ret_mat;
}
// Source: iq.opengenus.org
vector<vector<ll>> untranspose(vector<vector<ll>> &matrix){
int r = matrix.size();
int c = matrix[0].size();
vector<vector<ll>> ret_mat(r,vector<ll> (c,0));
int c_row = 0,c_col = 0;
// check for row major and column major forms are correct or not??
for(int y=0;y<c;y++){
for(int x=0;x<r;x++){
ret_mat[c_row][c_col] = matrix[x][y];
c_row++;
if(c_row == r){
c_row = 0;
c_col++;
}
}
}
return ret_mat;
}
vector<vector<ll>> shift(vector<vector<ll>> &matrix){
int r = matrix.size();
int c = matrix[0].size();
vector<ll> ret_mat(r*(c+1), 0);
int ul = r/2;
int ll = r/2 + (r%2);
for(int x=0;x<ul;x++){
ret_mat[x] = NEG_INFINITY;
}
for(int y=0;y<c;y++){
for(int x=0;x<r;x++){
ret_mat[ul+(y*r+x)] = matrix[x][y];
}
}
for(int x=0;x<ll;x++){
ret_mat[ul+r*c+x] = POS_INFINITY;
}
matrix = vector<vector<ll>> (r,vector<ll>(c,0));
for(int x=0;x<ret_mat.size();x++){
matrix[x%r][x/r] = ret_mat[x];
}
return matrix;
}
// make unshift operation
vector<vector<ll>> unshift(vector<vector<ll>> &matrix){
// do unshift operation code
}
void sort_matrix(vector<vector<ll>> &matrix){
for(auto &x : matrix){
sort(x.begin(), x.end()); // C++ inbuilt mergesort function
}
// passed by reference so no need to return
}
void perform(vector<vector<ll>> &matrix){
sort_matrix(matrix);
matrix = transpose(matrix);
sort_matrix(matrix);
matrix = untranspose(matrix);
sort_matrix(matrix);
matrix = shift(matrix);
sort_matrix(matrix);
matrix = unshift(matrix);
}
```

# Time and Space Complexity

The time complexity of Column Sort Algorithm is O(N logN / P) where N is the number of elements and P is the number of different processors.

The space complexity is O(N).

Column Sort performs best on P processors when N > 2 * P * (P-1)^2.

Alternatives of Column Sort include:

- Sample Sort
- Rotate Sort
- Subblock Column Sort
- Slabpose Sort

When N > 2 * P * (P-1)^2, Column Sort is not used in practice as the constant factor grows exponentially as N decreases.

To give you an idea, if we have 10 processors, then N should be greater than 1620 to get reasonably good performance. With this, you have a good idea of Column Sort.