In this article, we have explored how to implement Bubble Sort in parallel on multiple threads / cores. We have presented two approaches: one is slower than the sequential approach but the other approach is significantly faster than all other approaches for large input.
Table of contents:
 Introduction to Parallel Bubble Sort
 Parallel Bubble Sort
 Sequential Bubble Sort Algorithm
 Parallel Bubble Sort using Threads
 Alternate Parallel bubble sort using threads (with faster runtime)
 Comparison of 3 Bubble Sort implementations
 Time and Space Complexity
Prerequisites:
Introduction to Parallel Bubble Sort
Parallel bubble sort is a parallel implementation of the classic Bubble sort algorithm. The concept of parallelism involves executing a set of instructions/code simultaneously instead of line by line sequentially (the traditional way the code is executed in most machines and computer programming generally).
The main benefit of this is faster computation. Implementation of parallelism depends on some factors. For example, if there are parts of code that can be implemented independently and executed irrespective of any order, then we can execute at the same time. However, some parts of the program may still require other parts of the program to run first.
Therefore, the degree of parallelism depends on the program and its requirements. Additionally, the capacity of todayâ€™s modern processors that run these sequential programs is limited. We can solve this problem using multicore computers. In addition, even sequential programs involve some degree of parallelism.
Modernday processors can handle multiple instructions and the data generated from them at the same time. The hardware and the system are designed in such a way so that the data generated, and the previous, current, and future instructions donâ€™t interrupt each other.
This way, even though the programs are sequential, we can implement some degree of parallelism. We can also design our algorithms in such a way that the parts of the program run sequentially.
In parallel sorting, we transform the already existing traditional sorting algorithms in a way such that during execution, instead of sequential execution of algorithmâ€™s instructions.
Parallel Bubble Sort
In Parallel Bubble Sort, we divide sorting of the unsorted into two phases odd and even. We compare all pairs of elements in the list/array side by side (parallel).
We divide the sorting process into two phases odd phase and even phase. When it is the odd phase, we compare the element at index 0 with the element at index 1, the element at index 2 with the element at index 3, and so on. In the even phase, we compare index 1 element with index 2 element, index 3 element with index 4 element, and so on, with our last comparison being element at third last index and element at second last index.
When comparing, just like in normal bubble sort, we swap the elements, if the initial element is greater than the next element in comparison.
Both phases occur one after the other and the algorithm stops once no swap occurs in both odd and even phase.
The process of parallel bubble sort is shown below.
In the case of multicore processors, both phases can occur simultaneously, which is known as parallel implementation.
Sequential Bubble Sort Algorithm
To implement a parallel version of bubble sort, follow the steps mentioned below
 Create a variable called
Sorted
to keep track of whether swaps occur or not. This way, we can sort the list and stop the algorithm if no swaps occur. If no swaps occur in either phase, it means the list is already sorted. Then create a variable to store the length of the list, so that we can traverse and compare all elements in the list in pairs.  Using a
while
loop, we compare all elements in the list as pairs and sort them in phases. If the starting condition isSorted == 0
, this implies that the list is not yet sorted. Then we setSorted
equal to 1 to imply that no swaps occurred and the list is already sorted. This way, in the next step, if swaps occur, then we continue inside the loop, otherwise, we break out of it.  Using a
for
loop, we start with our odd phase. We compare two elements at a time ( by incrementing by 2 at each iteration inside thefor
loop) till the end of the list. If the current element is greater than the next element, we swap them. Then, we set theSorted
variable equal to 0 to imply that a swap occurred. Likewise, we repeat the same process in the nextfor
loop of the even phase, this time starting from index 1 till the end of the list.  Using the
print
function, we print the sorted version of the given list in the end.
Implementation
The following code executes the parallel bubble sort algorithm discussed above.
import time
import random
#timer to keep track of performance
start = time.perf_counter()
# a function to implement bubble sort in parallel
def Parallel_bubble_sort(lst):
# variable to keep track of swaps to end the while loop
Sorted = 0
# variable to get length of list
n = len(lst)
#loop to traverse all list elements in phases
while Sorted == 0:
# set to 1 initially to assume list is sorted
# and no swaps occurred
Sorted = 1
# traverse all list elements in pair
# start at index 0 for odd phase
# start at index 1 for even phase
for i in range(0, n1, 2):
# check if current element greater than next element
if lst[i] > lst[i+1]:
# if so, swap the elements
lst[i], lst[i+1] = lst[i+1], lst[i]
# set to 0 to imply a swap occurred
Sorted = 0
for i in range(1, n1, 2):
if lst[i] > lst[i+1]:
lst[i], lst[i+1] = lst[i+1], lst[i]
Sorted = 0
# print final sorted list
print(lst)
# an example list to test above program
lst = [(random.randint(0,100)) for i in range(100)]
Parallel_Bubble_Sort(lst)
finish = time.perf_counter()
print(f'Finished in {round(finishstart,2)} second(s)')
The above program produces the following output
[0, 0, 2, 3, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 8, 8, 9, 10, 10, 11, 13, 15, 16, 18, 20, 28, 29, 30, 30, 33, 33, 34, 36, 40, 41, 41, 42, 42, 42, 43, 44, 44, 45, 47, 48, 48, 49, 51, 52, 52, 54, 55, 55, 56, 56, 56, 57, 57, 58, 58, 61, 63, 67, 67, 68, 69, 69, 71, 71, 71, 71, 72, 72, 74, 76, 77, 77, 77, 78, 79, 79, 79, 80, 81, 84, 84, 85, 85, 90, 92, 94, 94, 95, 95, 98, 99, 99, 100, 100, 100]
Finished in 0.01 second(s)
Parallel Bubble Sort using Threads
Using Python's threading module, we can actually implement our algorithm in parallel. This module allows us to create threads, where each thread is considered as a single process that run within our program. We can think of threads of tasks or a set of instructions that make our program. Usually, all tasks in our program are implemented sequentially. However, using threads, we can run these tasks concurrently. This way, our program can run much faster. However, depending on the computer's memory and processor core, creating threads sometimes creates inefficiency as well.
To implement the bubble sort algorithm using threads, follow the steps mentioned below
 Import module
threading
so that we can use threads in our program. We also importtime
andrandom
modules, to keep track of our program's running time and to generate a random list in the end, to test the program.  As mentioned in previously, we sort the list in two phases, odd and even. So, we create a function called
odd_phase
, which we will use to perform sorting in phases. This function takes in two values, the list and its length. Then we compare two elements at a time in the list and swap them in order. Likewise, we create a function calledeven_phase
.  As our last function, we create the final
Parallel_Bubble_Sort
function. In this function, we again create a variable to get the lenght of the list. We then create a an empty list calledthreads
that we use to keep track of all threads we start, so that later we can stop them. Then using afor
loop, we go through the entire list and sort the list alternatively, usingodd_phase
andeven_phase
functions. We run both of these functions using threads, so that they run parallelly. Using a variable calledt1
, we create a thread and as our target, we set theodd_phase
function ifi
id odd. We then pass the list and its length, and later start the thread. After starting the thread, we append it to ourthreads
list. Then using anotherfor
loop, we close the thread using thejoin
function. Likewise, we do the same process in casei
is even.  Lastly, we print out the sorted list.
import time
import random
import threading
start = time.perf_counter()
# create function to sort elements in odd phase
def odd_phase(lst,n):
for i in range(0, n1, 2):
if lst[i] > lst[i+1]:
lst[i], lst[i+1] = lst[i+1], lst[i]
# create function to sort elements in even phase
def even_phase(lst,n):
for i in range(1, n1, 2):
if lst[i] > lst[i+1]:
lst[i], lst[i+1] = lst[i+1], lst[i]
# create function to for parallel bubble sort
def Parallel_Bubble_Sort(lst):
n = len(lst)
# create list to keep track of all threads
threads = []
for i in range(1,n+1):
# if i is odd, call for odd_phase function
if (i % 2 == 1):
# start threads one by one
t1 = threading.Thread(target=odd_phase,args = [lst,n])
t1.start()
threads.append(t1)
# close all threads
for thread in threads:
thread.join()
# if i is even, call for even_phase function
else:
t2 = threading.Thread(target=even_phase,args = [lst,n])
t2.start()
threads.append(t2)
for thread in threads:
thread.join()
# print final sorted list
print(lst)
# example list to test above program
lst = [(random.randint(0,100)) for i in range(100)]
Parallel_Bubble_Sort(lst)
finish = time.perf_counter()
print(f'Finished in {round(finishstart,2)} second(s)')
The above program produces the following output.
[0, 5, 6, 7, 8, 9, 9, 10, 10, 11, 14, 14, 16, 16, 17, 19, 21, 21, 24, 25, 25, 26, 26, 28, 28, 29, 29, 30, 31, 33, 33, 34, 34, 35, 35, 37, 39, 39, 40, 41, 41, 43, 45, 47, 48, 49, 52, 52, 52, 53, 54, 55, 56, 56, 57, 57, 58, 59, 60, 61, 63, 66, 67, 68, 68, 71, 72, 73, 73, 76, 76, 77, 80, 82, 84, 84, 85, 85, 87, 88, 89, 89, 90, 90, 91, 91, 93, 94, 94, 94, 95, 95, 96, 96, 97, 99, 99, 99, 100, 100]
Finished in 0.05 second(s)
This approach with threads takes more time because the overheads of creating threads are more. Since in each phase, odd or even, we only compare two elements at a time till the end of the list, and then separately create a thread, the runtime increases. Increase time happens because of starting a new thread and stopping the said thread each time. The costs of creating such threads, on top of comparing two elements, lead to increased runtime. This algorithm requires some calculations and is CPU bound. This means that greater the calculations, the more CPU resources, both in terms of software and hardware, are required.
Alternate Parallel bubble sort using threads (with faster runtime)
Another way we can bubble sort a list using threads is by dividing the list into smaller sublists. The number of sublists we create depends on the number of cores in the system. Each core can ideally run two threads. So in a dualcore system, we can use four threads.
We create sublists based on the biggest item in the list and the number of threads. Each sublist acts like a class interval. For example, if a list has 100 elements ranging from 0 to 100 and we use four threads, we would create four sublists. The first sublist would contain numbers ranging from 025, the second sublist would have numbers 2650, the third would have numbers 5175, and the last sublist would contain numbers ranging from 76100.
Then we create a thread for each sublist and run bubble sort parallelly in each sublist. Finally, we merge all sublists into a sorted list.
To implement this algorithm, follow the steps mentioned below
 We import the
threading
module to create threads. In addition, we also import theitertools
module, which we use to merge all sublists into a single list at the end of our program. The other two modules,time
andrandom
, are used to keep track of the program runtime and generate a list with random numbers in the end to test our program.  We then create a thread lock. This helps us to synchronize our usage of threads. Since we are sorting, this makes sure the order of the threads is maintained. It also ensures our list items don't jumble after thread activation, and during calculation in each thread.
 Next, we create a general bubble sort function, where we first acquire the lock. This helps the thread to acquire the lock and makes sure that access to this lock is locked until the lock is released. The rest of the function for bubble sort is the same as a usual one. We first get the length of the list. Then we go through the list swapping all elements in order till the entire list is sorted. After that, we release the lock, so that other threads can then use the lock.
 Finally, we create a separate
Parallel_bubble_sort
function. We first get the biggest element in the list and then set the number of threads to use for sorting parallelly. Then we create separate sublists as per the number of threads. Using the biggest element in the list and the number of threads, we calculate a number that helps us decide the distribution of list items into different sublists. This is called thesplit_factor
in the program.  Within the same function, using a
for
loop, we add list items to each sublist. Every subsequent sublist contains elements that are greater than the elements in the previous sublist. After adding list items into sublists, we remove those items from the main list, to avoid repetition. Finally, if any more items are left in the list, we add those to the last sublist.  Later, we handle the threading part of our
Parallel_bubble_sort
. To activate the thread, we use thethread
method. Using afor
loop, we pass in as parameters, thebubblesort
function, and each sublist one by one. Then we start the thread using thestart
method. We also create a list of active threads to keep track of them. After activating the thread, we append it to theactive_threads
list, so that later we can stop them. After activating all the threads, we close them using anotherfor
loop using thejoin
method.  We merge all the sublists using the
itertools
module and thechain
method. Since thechain
method returns an object, we cannot directly print it. So, we convert it into a list in the end.
# Alternate Threading Version
import time
import random
import threading
import itertools
# start time counter to calculate runtime
start_time = time.time()
# create a threading lock
lock = threading.Lock()
# normal bubble sort function
def bubblesort(lst):
# acquire a lock for thread
lock.acquire()
# get length of list
n = len(lst)
# perform bubble sort
for i in range(n):
swap = False
for j in range(0, ni1):
if lst[j]>lst[j+1]:
lst[j], lst[j+1] = lst[j+1], lst[j]
swap = True
if swap == False:
break
# release the lock after calculation
lock.release()
# create parallel bubble sort function
# this function uses normal bubble sort function
def Parallel_bubble_sort(lst):
# get biggest element in the list
biggest_item = max(lst)
# set number of threads
# as per the number of cores
# each core can run 2 threads ideally
no_threads = 4
# create sublists as per number of threads
lists = [[] for _ in range(no_threads)]
# we use a number to divide the list
# into class intervals for each sublist
split_factor = biggest_item//no_threads
# split list into sublists
# as per no of threads
for j in range(1,len(lists)):
for i in lst:
if i <= (split_factor*j):
lists[j1].append(i)
#remove the element from list
# after adding to sublist
# to prevent duplication
lst = [x for x in lst if x != i]
# include the remaining elements in list
# in the last sublist
lists[1] = lst
# start all threads for each sublist
active_threads = []
for list_item in lists:
t = threading.Thread(target=bubblesort, args=(list_item,))
t.start()
active_threads.append(t)
# stop all active threads
for thread in active_threads:
thread.join()
# merge all lists
# into final list
final_lst = itertools.chain(*lists)
final_lst = list(final_lst)
# an example list with 10k elements to test
lst = [(random.randint(0,10000)) for i in range(10000)]
Parallel_bubble_sort(lst)
end_time = time.time()  start_time
print(" Sorted in %s seconds " % (end_time))
We test the program with a list of length equal to 10,000 elements. Using this approach, we can sort a list of over 10,000 elements in around seven to eight seconds.
If the length of the list is small, this approach does run slower as compared to the previously mentioned approaches. However, once the length of the list becomes reasonably large, this approach fares much better in runtime comparatively.
Comparison of 3 Bubble Sort implementations
The image below shows the runtime of all approaches mentioned so far in different scenarios
Time and Space Complexity

Time: Each iteration involves an odd and even phase. Combining the number of comparisons in both phases, we get n1 comparisons for each iteration. So, the time complexity, considering we have multiple processors to perform each iteration, the original time complexity of O(n^2) becomes O(n).

Processor: In the case of multiprocessors, both odd and even phase comparisons occur simultaneously. Both phases at max require n/2 processors, so total processors required become n/2 + n/2 = n processors, where n stands for the number of elements in the list. The alternate threading approach used for parallel bubble sort in the end makes use of all the available processors on the system.

Space: Since bubble sort is an inplace algorithm, that property stays the same even in parallel implementation. So without extra space use, the space requirement stays constant O(1). In the case of the alternate parallel bubble sort approach using threads, space requirement depends on the number of threads used.
With this article at OpenGenus, you must have the complete idea of Parallel Bubble Sort.