List of Randomized Algorithms
In this article, we have listed several important randomized algorithms such as the Fisher-Yates shuffle, minimum cut with Karger's algorithm, matrix product verification and many more.
Table of Contents
- Introduction
- Las Vegas
- Randomized Quicksort
- Monte Carlo
- Pi Approximation
- Minimum Cut with Karger's
- Matrix Product Verification
- Atlantic City
- Other randomized algorithms
- Fisher-Yates shuffle
- Reservoir Sampling
- Testing for prime numbers
- Applications
- Overview
Prerequisite:
- Randomized Algorithms
- Fisher-Yates shuffle
- Reservoir Sampling
- Karger's algorithm to find Minimum Cut
Introduction
Randomized algorithms are algorithms which use randomness as part of their logic. They are often used to reduce running time or space complexity. Some randomized algorithms have a deterministic running time (the time complexity is fixed), while for others the running time is itself a random variable. Broadly, randomized algorithms are classified as Las Vegas or Monte Carlo algorithms.
Las Vegas Algorithms
As stated above, these randomized algorithms have a specified time complexity; however, they are generally faster than non-random versions of the same algorithm. A Las Vegas algorithm works as such:
- It is given a running time budget
- If the algorithm completes within that running time, its result is correct
- If the algorithm can't complete within that time, no solution is found
Due to this, Las Vegas algorithms can always guarantee a worst-case time complexity; however, they still can't give a fixed time constraint, since the actual running time varies with the random choices made on each input.
Example: Randomized Quicksort
The quicksort algorithm is a sorting algorithm which sorts in place, using little extra memory. It sorts on the basis of divide and conquer. We can say that the algorithm operates as such:
- Choose a pivot element i using a random number generator
- Partition the array into two sub-arrays: elements smaller than i and elements greater than i
- Recursively quicksort the sub-arrays, producing the fully sorted array
import random as rand

def quicksort(arr, first, last):
    if first < last:
        # Partition around a randomly chosen pivot, then sort each side
        pivotidx = random_partition(arr, first, last)
        quicksort(arr, first, pivotidx - 1)
        quicksort(arr, pivotidx + 1, last)

def random_partition(arr, first, last):
    # Pick a random pivot (inclusive of both ends) and move it to the front
    random_pivot = rand.randint(first, last)
    arr[first], arr[random_pivot] = arr[random_pivot], arr[first]
    return partition(arr, first, last)

def partition(arr, first, last):
    # Lomuto-style partition with the pivot value at arr[first]
    pivot = first
    i = first + 1
    for j in range(first + 1, last + 1):
        if arr[j] <= arr[pivot]:
            arr[i], arr[j] = arr[j], arr[i]
            i += 1
    # Place the pivot between the two partitions and return its index
    arr[pivot], arr[i - 1] = arr[i - 1], arr[pivot]
    return i - 1

array = [6, 8, 4, 8, 2, 3, 5, 10]
quicksort(array, 0, len(array) - 1)
print(array)
In the worst case scenario, the chosen pivot is always the smallest or largest element. Each swap takes O(1) and the partitioning step takes O(n). When the pivot is the smallest or largest element, the partition creates one sub-array of size 0 (there are no elements beyond it) and one of size n-1, so each level of recursion compares nearly the whole array against the pivot, leaving us with a running time of O(n^2).
However, if the random pivot is not an extreme element, the time complexity improves: if middle elements are selected as pivots, the array is partitioned roughly in half each time, leaving us with a time complexity of O(nlogn), since the recursion tree is O(logn) deep and each level takes O(n) to complete. The average time complexity remains O(nlogn), because random pivots tend to produce reasonably balanced partitions, keeping both the depth of the recursion tree and the time per level unchanged in expectation. The two cases can be summarised by their recurrences, shown below.
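A quick sketch of the standard recurrences behind these two bounds:

T(n) = T(n-1) + O(n) = O(n^2)      (worst case: one sub-array of size n-1)
T(n) = 2T(n/2) + O(n) = O(nlogn)   (balanced case: two halves)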
Monte Carlo Algorithms
Another common type of randomized algorithm is Monte Carlo. These algorithms are probability based: depending on the random choices made, there is some probability of producing an incorrect or invalid result. Generally speaking, Monte Carlo algorithms are not always right but are faster than Las Vegas algorithms, whereas Las Vegas algorithms tend to be slower but always produce a correct result. Therefore a strong Monte Carlo algorithm, with a high success rate, is often preferable to a Las Vegas one, since its running time is shorter and it can simply be repeated if a correct solution is not found at first.
Example: π Approximation
The π approximation problem is a common interview question for developer jobs which implements a Monte Carlo style approach.
We know that π is the ratio of a circle's circumference to its diameter. To successfully approximate it, we can take the following steps:
- Use a box with the x and y axis of -1 to 1
- Generate random points on the graph
- Count the number of points which are within a distance of 1 from the origin (X), and the points which fall outside this circle (Y).
- Since the ratio of the circle's area to the area of the entire box is equal to the ratio of points inside the circle to the total number of points, we can say:
πr^2/4r^2 = π/4 ≈ X/(X+Y)
This also means that the more random points are generated, the more accurate our approximation π ≈ 4X/(X+Y) becomes, as in the sketch below.
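A minimal Python sketch of this estimator (the function name and sample count are our own choices):

import random

def estimate_pi(samples):
    inside = 0
    for _ in range(samples):
        # Random point in the 2 x 2 box centred on the origin
        x = random.uniform(-1, 1)
        y = random.uniform(-1, 1)
        if x * x + y * y <= 1:   # within distance 1 of the origin
            inside += 1
    # inside / samples approximates pi / 4
    return 4 * inside / samples

print(estimate_pi(1_000_000))   # e.g. 3.1427...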
Example: Karger's algorithm for minimum cut
Another instance where Monte Carlo algorithms can be implemented is Karger's algorithm for finding the minimum cut of a graph (meaning the smallest number of edges whose removal splits the graph into two parts). It works on undirected, unweighted graphs:
- Create a contracted copy of the graph
- Condition controlled loop while there are more than 2 vertices
- Select random edge (u, v) in the copy graph
- Merge the two vertices in the edge into a single vertex
- Remove any self-loops created by the merge
- When two vertices remain, return the edges between them as the cut (a sketch implementation follows this list)
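A minimal Python sketch of these steps, assuming a connected graph given as an edge list and using a union-find structure to track merged vertices (these representation choices are ours, not fixed by the algorithm):

import random

def karger_min_cut(edges, num_vertices):
    # edges: list of (u, v) pairs of an undirected, unweighted graph
    parent = list(range(num_vertices))

    def find(x):
        # Find the super-vertex a contracted vertex belongs to
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    remaining = num_vertices
    pool = edges[:]
    random.shuffle(pool)   # contracting in a uniformly shuffled order
    for u, v in pool:      # is equivalent to picking random edges
        if remaining == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:              # skip self-loops
            parent[ru] = rv       # contract the edge (merge vertices)
            remaining -= 1
    # Count the edges crossing the two remaining super-vertices
    return sum(1 for u, v in edges if find(u) != find(v))

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
# Repeat to boost the success probability; the min cut here is 2
print(min(karger_min_cut(edges, 4) for _ in range(20)))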
Since Karger's algorithm is a Monte Carlo algorithm, the minimum cut isn't always found: the probability that a single run finds the min cut is at least 2/(n(n-1)), which is greater than 1/n^2. Suppose there is a unique min cut in the graph, made up of M edges out of E total edges. Karger's algorithm finds the min cut only if none of those M edges is contracted during the iterations.
M: number of edges in the min cut
E: total number of edges
V: number of vertices
Event 1: a min-cut edge is contracted in the first iteration
Event 2: a min-cut edge is contracted in the second iteration
Event 3: a min-cut edge is contracted in the third iteration
etc.
The min cut is successfully found only if none of these events occurs, so the success probability is P(Event 1' ∩ Event 2' ∩ ...), the probability that every event's complement holds.
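A sketch of the standard derivation of this bound: every vertex's incident edges form a cut, so every vertex has degree at least M, and when i vertices remain the graph has at least iM/2 edges. The chance of contracting a min-cut edge at that step is therefore at most M/(iM/2) = 2/i. Multiplying over the iterations from i = n down to i = 3, the product telescopes:

P(success) >= (1 - 2/n)(1 - 2/(n-1)) ... (1 - 2/3) = 2/(n(n-1)) > 1/n^2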
This algorithm can be applied to many situations:
- To strengthen a network
- Network optimisation
- Cluster and matching problems
Example: Freivalds' algorithm for matrix product verification
Another problem which can be efficiently solved with a Monte Carlo approach is matrix product verification.
Take three matrices A, B and C. We can use Freivalds' algorithm to check whether C is the product of A and B.
A = 2 2
    2 2
B = 2 2
    2 2
C = 4 4
    4 4
Output: False

A = 2 2
    2 2
B = 2 2
    2 2
C = 8 8
    8 8
Output: True
We could do this with a simple approach: compute the product of A and B directly and see if it matches C. This takes O(n^3) time naively, or O(n^log7) ≈ O(n^2.81) using Strassen's matrix multiplication. Freivalds' algorithm instead verifies the product in O(kn^2) time (each of the k iterations runs in O(n^2)) with a high chance of success, since the probability of failure is at most 2^-k.
The general approach to this is:
- Generate a random n x 1 vector v whose entries are 0 or 1
- Compute A(Bv) and subtract Cv from it
- If the result is the zero vector, return true; else return false
This method works because if C really is the product of A and B, then A(Bv) - Cv = (AB - C)v = 0 for every vector v; if C is not the product, a random 0/1 vector detects the difference with probability at least 1/2.
Our example from above can be implemented in Python as the following:
import random as rand

N = 2

def freivald(a, b, c):
    # Random n x 1 vector with entries 0 or 1
    v = [rand.randint(0, 1) for _ in range(N)]

    # bv = B * v
    bv = [0] * N
    for i in range(N):
        for j in range(N):
            bv[i] += b[i][j] * v[j]

    # cv = C * v
    cv = [0] * N
    for i in range(N):
        for j in range(N):
            cv[i] += c[i][j] * v[j]

    # product = A * (B * v)
    product = [0] * N
    for i in range(N):
        for j in range(N):
            product[i] += a[i][j] * bv[j]

    # If A(Bv) - Cv is not the zero vector, C is not the product
    for i in range(N):
        if product[i] - cv[i] != 0:
            return False
    return True

def iters(a, b, c, itrs):
    # Repeat the check to push the failure probability below 2^-itrs
    for _ in range(itrs):
        if not freivald(a, b, c):
            return False
    return True

a = [[2, 2], [2, 2]]
b = [[2, 2], [2, 2]]
c = [[8, 8], [8, 8]]
itrs = 10
print(iters(a, b, c, itrs))
Atlantic City Algorithms
These algorithms are a mix between Las Vegas and Monte Carlo: they are fast most of the time and correct most of the time. Unfortunately, there are few cases where this style is implemented, since designing such an algorithm is a difficult process.
More Randomized algorithms
Fisher-Yates shuffle
Another instance where randomized algorithms can be implemented is shuffling an array: outputting a random permutation of the given array, with each permutation having an equal probability of being output. We can achieve this with the Fisher-Yates shuffling algorithm, which runs in O(n) time using a pseudo-random number generator that works in O(1) time per call. The algorithm starts at the last index of the array and swaps it with a randomly chosen element at or before it. It then iterates, each time decreasing the portion of the array looked at by 1. The process ends at the first index, leaving a uniformly random permutation of the input.
import random as rand

def fisheryates_shuffle(arr, n):
    # Walk backwards, swapping each position with a random
    # earlier (or equal) position
    for i in range(n - 1, 0, -1):
        j = rand.randint(0, i)   # randint is inclusive of both ends
        arr[i], arr[j] = arr[j], arr[i]
    return arr

arr = [54, 34, 7, 5, 78, 6, 2, 4, 2]
n = len(arr)
print(fisheryates_shuffle(arr, n))
Reservoir Sampling
An adaptation of the Fisher-Yates shuffle is reservoir sampling. This is used when we want to randomly sample k items from a stream of n items. The stream is usually too big to fit into main memory, so its length is not known to the algorithm in advance: it can only see the part of the stream it is currently working on, and it cannot revisit previous items. We want the reservoir to be a uniform random sample of size k at any time.
Using a varying replacement probability, we can do this by maintaining a reservoir of size k, replacing the current elements in the reservoir with a probability based on how many elements of the stream have been seen so far.
If k = 1, then we could implement this as follows:
- Store first element as the reservoir element
- For every element (i), replace the reservoir item with it, with a probability of 1/i
Since we want to be able to sample any number k of elements from our stream, we can adapt the algorithm to the following (a sketch implementation follows this list):
- Store the first k elements of the stream as the reservoir
- For every later element i, replace a random reservoir item with it, with a probability of k/i
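A minimal Python sketch of this scheme (the function name and the demonstration stream are our own choices):

import random

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / i
            j = random.randint(1, i)
            if j <= k:
                reservoir[j - 1] = item
    return reservoir

print(reservoir_sample(range(1000), 5))   # e.g. [413, 72, 886, 3, 560]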
Reservoir sampling is used in a multitude of different applications, including web servers (where there is possibly an infinite stream) and databases (for testing optimal queries).
Testing for prime numbers
Primality testing is another area where randomization is widely used. Fermat's Little Theorem states:
If n is prime, then for every a with 1 < a < n:
a^(n-1) ≡ 1 (mod n), i.e. a^(n-1) mod n = 1
For example, with n = 7 and a = 3: 3^6 = 729 = 104 × 7 + 1, so 3^6 mod 7 = 1.
Since this is a randomized algorithm, there are cases where a non-prime will return true; as seen in previous examples, these errors can be reduced by increasing the number of iterations.
Our algorithm can be as follows. k is directly proportional to the probability of a correct answer for non-prime numbers; prime numbers will always be identified correctly.
- Iterate k times:
- Randomly select a in the range [2, n - 2]
- If the greatest common divisor of a and n != 1, return false
- If a^(n-1) is not equivalent to 1 mod n, return false
- Else, return true (meaning the number is likely prime)
The algorithm operates in O(klogn) time, since the modular exponentiation takes O(logn) time and we repeat it k times. If our algorithm fails to identify composites properly, we can increase k to improve accuracy; however, this comes at the cost of a higher running time.
import random as rand

def find_power(a, n, p):
    # Compute (a ^ n) mod p by repeated squaring in O(log n) time
    result = 1
    a = a % p
    while n > 0:
        if n % 2:
            result = (result * a) % p
            n -= 1
        else:
            a = (a ** 2) % p
            n = n // 2
    return result % p

def check_prime(n, k):
    if n == 1 or n == 4:
        return False
    if n == 2 or n == 3:
        return True
    # Run k independent Fermat tests with random bases
    for i in range(k):
        a = rand.randint(2, n - 2)
        if find_power(a, n - 1, n) != 1:
            return False
    return True

print(check_prime(19, 10))
print(check_prime(15, 10))
Our outputs will be True for 19, since it is prime, and False for 15, since it is not.
Applications
Randomized algorithms are useful in many scenarios. We have explained a few above; some others include:
- Sorting
- Load Balancing
- Mathematical programming
- Graph algorithms
- Enumeration and counting
- Derandomization
- Parallel Computing
- Data Structures
- Algebraic Identities
Overview
Randomized algorithms are commonly used to side-step worst-case complexity issues: a randomized algorithm with a high probability of success on each instance is often much faster on large data sets than any known deterministic approach to the same problem, and such algorithms are generally simpler to implement. In this article we have defined the principles of a randomized algorithm, described each type, explained various algorithms that use randomness, and listed other applications where randomized algorithms are used.
Thanks for reading!