Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
Reading time: 30 minutes | Coding time: 15 minutes
Tim Sort is a hybrid stable sorting algorithm that takes advantage of common patterns in data, and utilizes a combination of an improved Merge sort and Binary Insertion sort along with some internal logic to optimize the manipulation of large scale real-world data. Tim Sort was first implemented in 2002 by Tim Peters for use in Python.
Operation
Tim sort uses Binary insertion sort and improved merge sort by using galloping in a combination. Binary insertion sort is the best method to sort when data is already or partially sorted or the length of run is smaller than MIN_RUN and merge sort is best when the input is large.
Tim Sort is complex, even by algorithmic standards. The operation is best broken down into parts:
Run (dividing slot)
An input array is divided into different sub-arrays, count of elements inside a sub-array is defined as a RUN, the minimum value of such runs is a MIN_RUN which is a power of 2 not more than 32 (or 64). These sub-arrays are usually partially ordered (strictly ascending or stricly descending).When the array is decomposed into several runs, the runs of ascending order remain unchanged, and the runs of strictly descending order are reversed. Finally, several runs of ascending order are obtained.
Example:
Consider an array of goals:
1 | 5 | 9 | 8 | 6 | 4 | 5 | 6 | 7 |
---|
We can see that [1, 5, 9] conforms to ascending order, [8, 6, 4] conforms to strict descending order, [4, 5, 6, 7] conforms to ascending order.
Flip the strictly descending run and then add together a new array:
1 | 5 | 9 | 6 | 8 | 4 | 5 | 6 | 7 |
---|
Merge run
Let's assume that the new array is:
.... | 6 | 7 | 8 | 9 | 10 | 1 | 2 | 3 | 4 | .... |
---|
If we merge two runs using Mergesort alone
run1: [6, 7, 8, 9, 10]
run2: [1, 2, 3, 4, 5]
Mergesort compares the first element of the two runs
1 < 6, get 1
2 < 6, get 2
3 < 6, get 3
4 < 6, get 4
5 < 6, get 5
We found that traversing the entire run directly would eventually consume more number of merge operations if the run was longer. Because every run is in ascending order, there's no need to compare them one by one and the concept of Galloping becomes handy.
Galloping
For example, to merge the following two runs:
run1: [101, 102, 103, ... 200]
run2: [1, 2, 3, ..., 100]
Instead of comparing the elements one by one, we compare them by increasing powers of 2^n where n >= 0.
run1[0] > run2[0]
run1[0] > run2[1]
run1[0] > run2[3]
run1[0] > run2[7]
run1[0] > run2[15]
...
run1[0] > run2[2^n - 1]
run1[0] <= run2[2^(n+1) - 1]
We get a result run2 [2 ^ n - 1] < run1[0] <= run2 [2 ^ (n + 1) - 1], Since run2 is sorted, we use Binary Search to locate run1[0] in run2 very efficiently by defining left = run2[2^n - 1], right = run[2^(n+1) - 1]. Hence. the number of times we merge two runs is reduced from O(N) to O(logN).
Why not skip Galloping and just do Binary Search?
Let's give an example:
run1: [1, 3, 5, 7, 9 ... 2n+1]
run2: [0, 2, 4, 6, 8 ... 2n]
In this way, run1[0] is only larger than run2[0]. Each time we do Binary Search, we can only get a number of results, plus n times of Binary Search, so the time complexity changes from O(N) to O(NlogN).
Stack determines the order
When the original array becomes a set of various ascending runs, we need to merge two runs, but if we merge a long run with a shorter run, it will take a longer time to compare a comparitively shorter run. So Timsort maintains a stack with all the run lengths on the stack, while satisfying that the length of run on the back stack is longer than the sum of the length of run on the first two stacks, so that the length of run always decreases.
Stack : [... runA, runB, runC]
runA > runB + runC
runB > runC
This avoids run merges with too much difference in length.
For those who prefer bullets:
- Establish a minrun size that is a power of 2 (usually 32, never more than 64 or your Binary Insertion Sort will lose efficiency)
- Find a run in the first minrun of data.
- If the run is not at least minrun in length, use Insertion Sort to grab subsequent or prior items and insert them into the run until it is the correct minimum size.
- Repeat until the entire array is divided into sorted subsections.
- Use the latter half of Merge Sort to join the ordered arrays.
Complexity Analysis
By design TimSort is well suited for partially sorted data with the best case being totally sorted data. It falls into the adaptive sort family. Taking the number of runs Ļ as a (natural) parameter for a refined analysis we obtained:
TimSort runs in O(N + NlogĻ) time.
- Worst case time complexity:
Ī(NlogN)
- Average case time complexity:
Ī(NlogN)
- Best case time complexity:
Ī(N)
- Space complexity:
Ī(N)
Implementation
Implementation of Tim Sort algorithm:
- C
- C++
- Java
- Python
C
#include<stdio.h>
#define min(a,b) (a)<(b)?(a):(b)
const int RUN = 32;
// this function sorts array from left index to
// right index which is of size atmost RUN
void insertionSort(int arr[], int left, int right)
{
for (int i = left + 1; i <= right; i++)
{
int temp = arr[i];
int j = i - 1;
while (j >= left && arr[j] > temp)
{
arr[j+1] = arr[j];
j--;
}
arr[j+1] = temp;
}
}
// merge function merges the sorted runs
void merge(int arr[], int l, int m, int r)
{
// original array is broken in two parts
// left and right array
int len1 = m - l + 1, len2 = r - m;
int left[len1], right[len2];
for (int i = 0; i < len1; i++)
left[i] = arr[l + i];
for (int i = 0; i < len2; i++)
right[i] = arr[m + 1 + i];
int i = 0;
int j = 0;
int k = l;
// after comparing, we merge those two array
// in larger sub array
while (i < len1 && j < len2)
{
if (left[i] <= right[j])
{
arr[k] = left[i];
i++;
}
else
{
arr[k] = right[j];
j++;
}
k++;
}
// copy remaining elements of left, if any
while (i < len1)
{
arr[k] = left[i];
k++;
i++;
}
// copy remaining element of right, if any
while (j < len2)
{
arr[k] = right[j];
k++;
j++;
}
}
void timSort(int arr[], int n)
{
// Sort individual subarrays of size RUN
for (int i = 0; i < n; i+=RUN)
{
insertionSort(arr, i, min((i+31), (n-1)));
}
// start merging from size RUN (or 32). It will merge
// to form size 64, then 128, 256 and so on ....
for (int size = RUN; size < n; size = 2*size)
{
// pick starting point of left sub array. We
// are going to merge arr[left..left+size-1]
// and arr[left+size, left+2*size-1]
// After every merge, we increase left by 2*size
for (int left = 0; left < n; left += 2*size)
{
// find ending point of left sub array
// mid+1 is starting point of right sub array
int mid = left + size - 1;
int right = min((left + 2*size - 1), (n-1));
// merge sub array arr[left.....mid] &
// arr[mid+1....right]
merge(arr, left, mid, right);
}
}
}
int main()
{
int a[] = {12,1,20,2,3,123,54,332},i;
int n = sizeof(a)/sizeof(a[0]);
printf("Printing Array elements \n");
for (i = 0; i < n; i++)
printf("%d ", a[i]);
printf("\n");
timSort(a, n);
printf("Printing sorted array elements \n");
for (i = 0; i < n; i++)
printf("%d ", a[i]);
printf("\n");
return 0;
}
C++
#include<bits/stdc++.h>
using namespace std;
const int RUN = 32;
// this function sorts array from left index to
// right index which is of size atmost RUN
void insertionSort(int arr[], int left, int right)
{
for (int i = left + 1; i <= right; i++)
{
int temp = arr[i];
int j = i - 1;
while (j >= left && arr[j] > temp)
{
arr[j+1] = arr[j];
j--;
}
arr[j+1] = temp;
}
}
// merge function merges the sorted runs
void merge(int arr[], int l, int m, int r)
{
// original array is broken in two parts
// left and right array
int len1 = m - l + 1, len2 = r - m;
int left[len1], right[len2];
for (int i = 0; i < len1; i++)
left[i] = arr[l + i];
for (int i = 0; i < len2; i++)
right[i] = arr[m + 1 + i];
int i = 0;
int j = 0;
int k = l;
// after comparing, we merge those two array
// in larger sub array
while (i < len1 && j < len2)
{
if (left[i] <= right[j])
{
arr[k] = left[i];
i++;
}
else
{
arr[k] = right[j];
j++;
}
k++;
}
// copy remaining elements of left, if any
while (i < len1)
{
arr[k] = left[i];
k++;
i++;
}
// copy remaining element of right, if any
while (j < len2)
{
arr[k] = right[j];
k++;
j++;
}
}
void timSort(int arr[], int n)
{
// Sort individual subarrays of size RUN
for (int i = 0; i < n; i+=RUN)
{
insertionSort(arr, i, min((i+31), (n-1)));
}
// start merging from size RUN (or 32). It will merge
// to form size 64, then 128, 256 and so on ....
for (int size = RUN; size < n; size = 2*size)
{
// pick starting point of left sub array. We
// are going to merge arr[left..left+size-1]
// and arr[left+size, left+2*size-1]
// After every merge, we increase left by 2*size
for (int left = 0; left < n; left += 2*size)
{
// find ending point of left sub array
// mid+1 is starting point of right sub array
int mid = left + size - 1;
int right = min((left + 2*size - 1), (n-1));
// merge sub array arr[left.....mid] &
// arr[mid+1....right]
merge(arr, left, mid, right);
}
}
}
int main()
{
int arr[] = {5, 21, 7, 23, 19};
int n = sizeof(arr)/sizeof(arr[0]);
cout<<"Given Array is\n";
for (int i = 0; i < n; i++)
cout<<arr[i]<<" ";
cout<<"\n";
timSort(arr, n);
cout<<"After Sorting Array is\n";
for (int i = 0; i < n; i++)
cout<<arr[i]<<" ";
cout<<"\n";
return 0;
}
Java
class TimSort
{
static int RUN = 32;
// this function sorts array from left index to
// right index which is of size atmost RUN
public static void insertionSort(int[] arr, int left, int right)
{
for (int i = left + 1; i <= right; i++)
{
int temp = arr[i];
int j = i - 1;
while (j >= left && arr[j] > temp)
{
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = temp;
}
}
// merge function merges the sorted runs
public static void merge(int[] arr, int l, int m, int r)
{
// original array is broken in two parts
// left and right array
int len1 = m - l + 1, len2 = r - m;
int[] left = new int[len1];
int[] right = new int[len2];
for (int x = 0; x < len1; x++)
{
left[x] = arr[l + x];
}
for (int x = 0; x < len2; x++)
{
right[x] = arr[m + 1 + x];
}
int i = 0;
int j = 0;
int k = l;
// after comparing, we merge those two array
// in larger sub array
while (i < len1 && j < len2)
{
if (left[i] <= right[j])
{
arr[k] = left[i];
i++;
}
else
{
arr[k] = right[j];
j++;
}
k++;
}
// copy remaining elements of left, if any
while (i < len1)
{
arr[k] = left[i];
k++;
i++;
}
// copy remaining element of right, if any
while (j < len2)
{
arr[k] = right[j];
k++;
j++;
}
}
// iterative Timsort function to sort the
// array[0...n-1] (similar to merge sort)
public static void timSort(int[] arr, int n)
{
// Sort individual subarrays of size RUN
for (int i = 0; i < n; i += RUN)
{
insertionSort(arr, i, Math.min((i + 31), (n - 1)));
}
// start merging from size RUN (or 32). It will merge
// to form size 64, then 128, 256 and so on ....
for (int size = RUN; size < n; size = 2 * size)
{
// pick starting point of left sub array. We
// are going to merge arr[left..left+size-1]
// and arr[left+size, left+2*size-1]
// After every merge, we increase left by 2*size
for (int left = 0; left < n; left += 2 * size)
{
// find ending point of left sub array
// mid+1 is starting point of right sub array
int mid = left + size - 1;
int right = Math.min((left + 2 * size - 1), (n - 1));
// merge sub array arr[left.....mid] &
// arr[mid+1....right]
merge(arr, left, mid, right);
}
}
}
public static void main(String[] args)
{
int[] arr = {5, 21, 7, 23, 19};
int n = arr.length;
System.out.print("Given Array is\n");
for (int i = 0; i < n; i++)
{
System.out.print(arr[i] + " ");
}
System.out.print("\n");
timSort(arr, n);
System.out.print("After Sorting Array is\n");
for (int i = 0; i < n; i++)
{
System.out.print(arr[i] + " ");
}
System.out.print("\n");
}
}
Python3
RUN=32
# This function sorts array from left index to
# right index which is of size atmost RUN
def insertionSort(arr, left, right):
for i in range(left + 1, right+1):
temp = arr[i]
j = i - 1
while j <= left and arr[j] < temp :
arr[j+1] = arr[j]
j -= 1
arr[j+1] = temp
# merge function merges the sorted runs
def merge(arr, l, m, r):
# original array is broken in two parts
# left and right array
len1, len2 = m - l + 1, r - m
left, right = [], []
for i in range(0, len1):
left.append(arr[l + i])
for i in range(0, len2):
right.append(arr[m + 1 + i])
i, j, k = 0, 0, l
# after comparing, we merge those two array
# in larger sub array
while i < len1 and j < len2:
if left[i] <= right[j]:
arr[k] = left[i]
i += 1
else:
arr[k] = right[j]
j += 1
k += 1
# copy remaining elements of left, if any
while i < len1:
arr[k] = left[i]
k += 1
i += 1
# copy remaining element of right, if any
while j < len2:
arr[k] = right[j]
k += 1
j += 1
# iterative Timsort function to sort the
# array[0...n-1] (similar to merge sort)
def timSort(arr, n):
# Sort individual subarrays of size RUN
for i in range(0, n, RUN):
insertionSort(arr, i, min((i+31), (n-1)))
# start merging from size RUN (or 32). It will merge
# to form size 64, then 128, 256 and so on ....
size = RUN
while size < n:
# pick starting point of left sub array. We
# are going to merge arr[left..left+size-1]
# and arr[left+size, left+2*size-1]
# After every merge, we increase left by 2*size
for left in range(0, n, 2*size):
# find ending point of left sub array
# mid+1 is starting point of right sub array
mid = left + size - 1
right = min((left + 2*size - 1), (n-1))
# merge sub array arr[left.....mid] &
# arr[mid+1....right]
merge(arr, left, mid, right)
size = 2*size
if __name__ == "__main__":
arr = [5, 21, 7, 23, 19]
n = len(arr)
print("Given Array is")
for i in range(0, n):
print(arr[i], end = " ")
print()
timSort(arr, n)
print("After Sorting Array is")
for i in range(0, n):
print(arr[i], end = " ")
print()
Applications
Applications of Insertion Sort are as follows:
- Tim Sort is powerful. It is fast and stable, but perhaps most importantly it takes advantage of real world patterns and utilizes them to build a final product.
- Tim Sort is used as the default sorting algorithm in Javaās Arrays.sort() method, Pythonās sorted() and sort() methods, the Android Platform, and in GNU Octave.
- When the input is already sorted, Tim Sort runs in linear time, meaning that it is an adaptive sorting algorithm.
Question
What is the best case time complexity of Tim sort?
With this article at OpenGenus, you must have the complete idea of Tim Sort. Enjoy.