Tim Sort


Reading time: 30 minutes | Coding time: 15 minutes

Tim Sort is a hybrid stable sorting algorithm that takes advantage of common patterns in data, and utilizes a combination of an improved Merge sort and Binary Insertion sort along with some internal logic to optimize the manipulation of large scale real-world data. Tim Sort was first implemented in 2002 by Tim Peters for use in Python.

Operation

Tim sort uses Binary insertion sort and improved merge sort by using galloping in a combination. Binary insertion sort is the best method to sort when data is already or partially sorted or the length of run is smaller than MIN_RUN and merge sort is best when the input is large.

Tim Sort is complex, even by algorithmic standards. The operation is best broken down into parts:

Run (dividing slot)

An input array is divided into different sub-arrays, count of elements inside a sub-array is defined as a RUN, the minimum value of such runs is a MIN_RUN which is a power of 2 not more than 32 (or 64). These sub-arrays are usually partially ordered (strictly ascending or stricly descending).When the array is decomposed into several runs, the runs of ascending order remain unchanged, and the runs of strictly descending order are reversed. Finally, several runs of ascending order are obtained.

Example:

Consider an array of goals:

1 5 9 8 6 4 5 6 7

We can see that [1, 5, 9] conforms to ascending order, [8, 6, 4] conforms to strict descending order, [4, 5, 6, 7] conforms to ascending order.

Flip the strictly descending run and then add together a new array:

1 5 9 6 8 4 5 6 7

Merge run

Let's assume that the new array is:

.... 6 7 8 9 10 1 2 3 4 ....

If we merge two runs using Mergesort alone
run1: [6, 7, 8, 9, 10]
run2: [1, 2, 3, 4, 5]

Mergesort compares the first element of the two runs
1 < 6, get 1
2 < 6, get 2
3 < 6, get 3
4 < 6, get 4
5 < 6, get 5

We found that traversing the entire run directly would eventually consume more number of merge operations if the run was longer. Because every run is in ascending order, there's no need to compare them one by one and the concept of Galloping becomes handy.

Galloping

For example, to merge the following two runs:
run1: [101, 102, 103, ... 200]
run2: [1, 2, 3, ..., 100]
Instead of comparing the elements one by one, we compare them by increasing powers of 2^n where n >= 0.

run1[0] > run2[0]
run1[0] > run2[1]
run1[0] > run2[3]
run1[0] > run2[7]
run1[0] > run2[15]
...
run1[0] > run2[2^n - 1]
run1[0] <= run2[2^(n+1) - 1]

We get a result run2 [2 ^ n - 1] < run1[0] <= run2 [2 ^ (n + 1) - 1], Since run2 is sorted, we use Binary Search to locate run1[0] in run2 very efficiently by defining left = run2[2^n - 1], right = run[2^(n+1) - 1]. Hence. the number of times we merge two runs is reduced from O(N) to O(logN).

Why not skip Galloping and just do Binary Search?

Let's give an example:
run1: [1, 3, 5, 7, 9 ... 2n+1]
run2: [0, 2, 4, 6, 8 ... 2n]
In this way, run1[0] is only larger than run2[0]. Each time we do Binary Search, we can only get a number of results, plus n times of Binary Search, so the time complexity changes from O(N) to O(NlogN).

Stack determines the order

When the original array becomes a set of various ascending runs, we need to merge two runs, but if we merge a long run with a shorter run, it will take a longer time to compare a comparitively shorter run. So Timsort maintains a stack with all the run lengths on the stack, while satisfying that the length of run on the back stack is longer than the sum of the length of run on the first two stacks, so that the length of run always decreases.

Stack : [... runA, runB, runC]
runA > runB + runC
runB > runC
This avoids run merges with too much difference in length.

For those who prefer bullets:

  • Establish a minrun size that is a power of 2 (usually 32, never more than 64 or your Binary Insertion Sort will lose efficiency)
  • Find a run in the first minrun of data.
  • If the run is not at least minrun in length, use Insertion Sort to grab subsequent or prior items and insert them into the run until it is the correct minimum size.
  • Repeat until the entire array is divided into sorted subsections.
  • Use the latter half of Merge Sort to join the ordered arrays.

Complexity Analysis

By design TimSort is well suited for partially sorted data with the best case being totally sorted data. It falls into the adaptive sort family. Taking the number of runs ρ as a (natural) parameter for a refined analysis we obtained:

TimSort runs in O(N + Nlogρ) time.

  • Worst case time complexity: Θ(NlogN)
  • Average case time complexity: Θ(NlogN)
  • Best case time complexity: Θ(N)
  • Space complexity: Θ(N)

Implementation

Implementation of Tim Sort algorithm:

  • C
  • C++
  • Java
  • Python

C

 
    #include<stdio.h>
    #define min(a,b) (a)<(b)?(a):(b)
    const int RUN = 32; 
    // this function sorts array from left index to 
    // right index which is of size atmost RUN 
    void insertionSort(int arr[], int left, int right) 
    { 
        for (int i = left + 1; i <= right; i++) 
        { 
            int temp = arr[i]; 
            int j = i - 1; 
            while (j >= left && arr[j] > temp) 
            { 
                arr[j+1] = arr[j]; 
                j--; 
            } 
            arr[j+1] = temp; 
        } 
    } 
    // merge function merges the sorted runs 
    void merge(int arr[], int l, int m, int r) 
    { 
        // original array is broken in two parts 
        // left and right array 
        int len1 = m - l + 1, len2 = r - m; 
        int left[len1], right[len2]; 
        for (int i = 0; i < len1; i++) 
            left[i] = arr[l + i]; 
        for (int i = 0; i < len2; i++) 
            right[i] = arr[m + 1 + i]; 
        int i = 0; 
        int j = 0; 
        int k = l; 
        // after comparing, we merge those two array 
        // in larger sub array 
        while (i < len1 && j < len2) 
        { 
            if (left[i] <= right[j]) 
            { 
                arr[k] = left[i]; 
                i++; 
            } 
            else
            { 
                arr[k] = right[j]; 
                j++; 
            } 
            k++; 
        } 
        // copy remaining elements of left, if any 
        while (i < len1) 
        { 
            arr[k] = left[i]; 
            k++; 
            i++; 
        } 
        // copy remaining element of right, if any 
        while (j < len2) 
        { 
            arr[k] = right[j]; 
            k++; 
            j++; 
        } 
    }
    void timSort(int arr[], int n) 
    { 
        // Sort individual subarrays of size RUN 
        for (int i = 0; i < n; i+=RUN)
        {
            insertionSort(arr, i, min((i+31), (n-1))); 
        }
        // start merging from size RUN (or 32). It will merge 
        // to form size 64, then 128, 256 and so on .... 
        for (int size = RUN; size < n; size = 2*size) 
        { 
            // pick starting point of left sub array. We 
            // are going to merge arr[left..left+size-1] 
            // and arr[left+size, left+2*size-1] 
            // After every merge, we increase left by 2*size 
            for (int left = 0; left < n; left += 2*size) 
            { 
                // find ending point of left sub array 
                // mid+1 is starting point of right sub array 
                int mid = left + size - 1; 
                int right = min((left + 2*size - 1), (n-1)); 
                // merge sub array arr[left.....mid] & 
                // arr[mid+1....right] 
                merge(arr, left, mid, right); 
            } 
        } 
    } 
    int main()  
    {  
        int a[] = {12,1,20,2,3,123,54,332},i;  
        int n = sizeof(a)/sizeof(a[0]);  
        printf("Printing Array elements \n");  
        for (i = 0; i < n; i++)  
        printf("%d  ", a[i]);  
        printf("\n");  
        timSort(a, n); 
        printf("Printing sorted array elements \n");  
        for (i = 0; i < n; i++)  
            printf("%d  ", a[i]);  
        printf("\n");  
        return 0;  
    }  
   

C++

 
    #include<bits/stdc++.h> 
    using namespace std; 
    const int RUN = 32; 
    // this function sorts array from left index to 
    // right index which is of size atmost RUN 
    void insertionSort(int arr[], int left, int right) 
    { 
        for (int i = left + 1; i <= right; i++) 
        { 
            int temp = arr[i]; 
            int j = i - 1; 
            while (j >= left && arr[j] > temp) 
            { 
                arr[j+1] = arr[j]; 
                j--; 
            } 
            arr[j+1] = temp; 
        } 
    } 
    // merge function merges the sorted runs 
    void merge(int arr[], int l, int m, int r) 
    { 
        // original array is broken in two parts 
        // left and right array 
        int len1 = m - l + 1, len2 = r - m; 
        int left[len1], right[len2]; 
        for (int i = 0; i < len1; i++) 
            left[i] = arr[l + i]; 
        for (int i = 0; i < len2; i++) 
            right[i] = arr[m + 1 + i]; 
        int i = 0; 
        int j = 0; 
        int k = l; 
        // after comparing, we merge those two array 
        // in larger sub array 
        while (i < len1 && j < len2) 
        { 
            if (left[i] <= right[j]) 
            { 
                arr[k] = left[i]; 
                i++; 
            } 
            else
            { 
                arr[k] = right[j]; 
                j++; 
            } 
            k++; 
        } 
        // copy remaining elements of left, if any 
        while (i < len1) 
        { 
            arr[k] = left[i]; 
            k++; 
            i++; 
        } 
        // copy remaining element of right, if any 
        while (j < len2) 
        { 
            arr[k] = right[j]; 
            k++; 
            j++; 
        } 
    }
    void timSort(int arr[], int n) 
    { 
        // Sort individual subarrays of size RUN 
        for (int i = 0; i < n; i+=RUN)
        {
            insertionSort(arr, i, min((i+31), (n-1))); 
        }
        // start merging from size RUN (or 32). It will merge 
        // to form size 64, then 128, 256 and so on .... 
        for (int size = RUN; size < n; size = 2*size) 
        { 
            // pick starting point of left sub array. We 
            // are going to merge arr[left..left+size-1] 
            // and arr[left+size, left+2*size-1] 
            // After every merge, we increase left by 2*size 
            for (int left = 0; left < n; left += 2*size) 
            { 
                // find ending point of left sub array 
                // mid+1 is starting point of right sub array 
                int mid = left + size - 1; 
                int right = min((left + 2*size - 1), (n-1)); 
                // merge sub array arr[left.....mid] & 
                // arr[mid+1....right] 
                merge(arr, left, mid, right); 
            } 
        } 
    } 
    int main() 
    { 
        int arr[] = {5, 21, 7, 23, 19}; 
        int n = sizeof(arr)/sizeof(arr[0]); 
        cout<<"Given Array is\n"; 
        for (int i = 0; i < n; i++) 
            cout<<arr[i]<<"  ";
        cout<<"\n"; 
        timSort(arr, n); 
        cout<<"After Sorting Array is\n"; 
        for (int i = 0; i < n; i++) 
            cout<<arr[i]<<"  "; 
        cout<<"\n"; 
        return 0; 
    }
   

Java


    class TimSort  
    { 
        static int RUN = 32;
        // this function sorts array from left index to  
        // right index which is of size atmost RUN  
        public static void insertionSort(int[] arr, int left, int right)  
        { 
            for (int i = left + 1; i <= right; i++)  
            { 
                int temp = arr[i]; 
                int j = i - 1; 
                while (j >= left && arr[j] > temp) 
                { 
                    arr[j + 1] = arr[j]; 
                    j--; 
                } 
                arr[j + 1] = temp; 
            } 
        } 
        // merge function merges the sorted runs  
        public static void merge(int[] arr, int l, int m, int r) 
        { 
            // original array is broken in two parts  
            // left and right array  
            int len1 = m - l + 1, len2 = r - m; 
            int[] left = new int[len1]; 
            int[] right = new int[len2]; 
            for (int x = 0; x < len1; x++)  
            { 
                left[x] = arr[l + x]; 
            } 
            for (int x = 0; x < len2; x++)  
            { 
                right[x] = arr[m + 1 + x]; 
            } 
            int i = 0; 
            int j = 0; 
            int k = l; 
            // after comparing, we merge those two array  
            // in larger sub array  
            while (i < len1 && j < len2)  
            { 
                if (left[i] <= right[j])  
                { 
                    arr[k] = left[i]; 
                    i++; 
                } 
                else 
                { 
                    arr[k] = right[j]; 
                    j++; 
                } 
                k++; 
            } 
            // copy remaining elements of left, if any  
            while (i < len1) 
            { 
                arr[k] = left[i]; 
                k++; 
                i++; 
            } 
            // copy remaining element of right, if any  
            while (j < len2)  
            { 
                arr[k] = right[j]; 
                k++; 
                j++; 
            } 
        } 
        // iterative Timsort function to sort the  
        // array[0...n-1] (similar to merge sort)  
        public static void timSort(int[] arr, int n)  
        { 
            // Sort individual subarrays of size RUN  
            for (int i = 0; i < n; i += RUN)  
            { 
                insertionSort(arr, i, Math.min((i + 31), (n - 1))); 
            } 
            // start merging from size RUN (or 32). It will merge  
            // to form size 64, then 128, 256 and so on ....  
            for (int size = RUN; size < n; size = 2 * size)  
            { 
                // pick starting point of left sub array. We  
                // are going to merge arr[left..left+size-1]  
                // and arr[left+size, left+2*size-1]  
                // After every merge, we increase left by 2*size  
                for (int left = 0; left < n; left += 2 * size)  
                { 
                    // find ending point of left sub array  
                    // mid+1 is starting point of right sub array  
                    int mid = left + size - 1; 
                    int right = Math.min((left + 2 * size - 1), (n - 1));
                    // merge sub array arr[left.....mid] &  
                    // arr[mid+1....right]  
                    merge(arr, left, mid, right); 
                } 
            } 
        }    
        public static void main(String[] args)  
        { 
            int[] arr = {5, 21, 7, 23, 19}; 
            int n = arr.length; 
            System.out.print("Given Array is\n"); 
            for (int i = 0; i < n; i++) 
            { 
                System.out.print(arr[i] + " "); 
            } 
            System.out.print("\n");
            timSort(arr, n);
            System.out.print("After Sorting Array is\n"); 
            for (int i = 0; i < n; i++) 
            { 
                System.out.print(arr[i] + " "); 
            } 
            System.out.print("\n"); 
        } 
   } 
   

Python3


RUN=32
# This function sorts array from left index to  
# right index which is of size atmost RUN  
def insertionSort(arr, left, right):
for i in range(left + 1, right+1): 
    temp = arr[i]  
    j = i - 1 
    while j <= left and arr[j] < temp :
        arr[j+1] = arr[j]  
        j -= 1
    arr[j+1] = temp
# merge function merges the sorted runs  
def merge(arr, l, m, r):
# original array is broken in two parts  
# left and right array  
len1, len2 =  m - l + 1, r - m  
left, right = [], []  
for i in range(0, len1):  
    left.append(arr[l + i])  
for i in range(0, len2):  
    right.append(arr[m + 1 + i]) 
i, j, k = 0, 0, l 
# after comparing, we merge those two array  
# in larger sub array  
while i < len1 and j < len2:  
    if left[i] <= right[j]:  
        arr[k] = left[i]  
        i += 1 
    else: 
        arr[k] = right[j]  
        j += 1 
    k += 1
# copy remaining elements of left, if any  
while i < len1: 
    arr[k] = left[i]  
    k += 1 
    i += 1
# copy remaining element of right, if any  
while j < len2:  
    arr[k] = right[j]  
    k += 1
    j += 1
# iterative Timsort function to sort the  
# array[0...n-1] (similar to merge sort)  
def timSort(arr, n):
# Sort individual subarrays of size RUN  
for i in range(0, n, RUN):  
    insertionSort(arr, i, min((i+31), (n-1)))
# start merging from size RUN (or 32). It will merge  
# to form size 64, then 128, 256 and so on ....  
size = RUN 
while size < n:
    # pick starting point of left sub array. We  
    # are going to merge arr[left..left+size-1]  
    # and arr[left+size, left+2*size-1]  
    # After every merge, we increase left by 2*size  
    for left in range(0, n, 2*size):
        # find ending point of left sub array  
        # mid+1 is starting point of right sub array  
        mid = left + size - 1 
        right = min((left + 2*size - 1), (n-1))
        # merge sub array arr[left.....mid] &  
        # arr[mid+1....right]  
        merge(arr, left, mid, right)
    size = 2*size     
if __name__ == "__main__":
arr = [5, 21, 7, 23, 19]  
n = len(arr)  
print("Given Array is")  
for i in range(0, n):  
    print(arr[i], end = " ")  
print()  
timSort(arr, n)
print("After Sorting Array is")  
for i in range(0, n):  
    print(arr[i], end = " ")  
print()
   

Applications

Applications of Insertion Sort are as follows:

  • Tim Sort is powerful. It is fast and stable, but perhaps most importantly it takes advantage of real world patterns and utilizes them to build a final product.
  • Tim Sort is used as the default sorting algorithm in Java’s Arrays.sort() method, Python’s sorted() and sort() methods, the Android Platform, and in GNU Octave.
  • When the input is already sorted, Tim Sort runs in linear time, meaning that it is an adaptive sorting algorithm.

Question

What is the best case time complexity of Tim sort?

O(n)
O(n log n)
O(n2)
O(log n)
Best case time complexity of Tim sort occurs when the input array is already sorted. In such a case only one run will be required.

With this article at OpenGenus, you must have the complete idea of Tim Sort. Enjoy.