Find k-th smallest element using QuickSelect algorithm

Binary Tree Problems books

Get FREE domain for 1st year and build your brand new site

Given an initial unsorted list of N elements, we want to quickly find the k-th smallest element. For example, for the list {7, 4, 1, 5, 6}, the first smallest is 1, second smallest is 4 and so on.

An easy approach is to sort the elements in O(N logN) and select the k-th element as our solution. But we do we really need to sort the entire list?
Quickselect is an approach where we use divide and conquer and find the k th smallest element in an average O(N) time.

Quickselect algorithm

Quickselect was developed by C.A.S. Hoare in the 1960's along with the popular quicksort algorithm. Quicksort works by taking in an initial unsorted list, and divided it based on a chosen pivot. The pivot may be chosen at random.

Quickselect is similar to Quicksort, in that it uses divide and conquer strategy.
The divide part will pick a random pivot element and partition the set into three subsets: The first subset contains all elements less than pivot, the second contains all elements equal to pivot and the third contains all elements greater than the pivot.

The algorithm is summarised below

QUICKSELECT( A, k )
    pick random pivot element p
    for each element a in the set A
        if a < p: insert into L
        if a = p: insert into E
        if a > p: insert into G
        
    if k <= |L|: call quickselect(L,k)
    if |L| < k ≤ |L|+|E|: return p
    if k > |L|+|E|: call quickselect(G, k–(|L|+|E|))

Time complexity

Consider an assumption that we pick the middle element as the pivot in every iteration. Then the number of elements we are looking at are in the order
N + N/2 + N/4 + N/8 + N/16 + ... in the corresponding iterations.
This is a geometric sequence that is bounded by but never reaches 2N.

For a bad pivot, which keeps decreasing the list only by one element, the algorithm will be looking at N elements in first iteration, N-1 in the second and so on, giving us a series
N + (N-1) + (N-2) + ... which is bounded by O(N2) time.

Thus the average time complexity is O(N) and in the worst case it is O(N2).

Example C++ implementation

Below is an example implementation of the algorithm. The implementation uses a

quickselect(vector<int> v, int k){
	if(v.size() == 1) return v[0];
	
	int pivot = v[rand() % v.size()];
	vector<int> L, E, G;
	for(auto x: v){
		if (x < pivot) L.push_back(x);
		if (x == pivot) E.push_back(x);
		if( x > pivot) G.push_back(x);
	}
	if(k <= L.size()) return quickselect(L, k);
	else if (k <= (L.size() + E.size())) return pivot;
	else return quickselect(G, k-(L.size() + E.size()));
}

int main()
{
	vector<int> v({ 10, 4, 5, 8, 6, 11, 26, 30 });
	cout << "third smallest number is: " << quickselect(v, 3) << endl;
    
	return 0;
}

Example

Consider the elements {10, 4, 5, 8, 6, 11, 26, 30}. If we want the third smallest, we can see that the element 6 is the solution. In sorted order the elements are
{4, 5, 6, 8, 10, 11, 26, 30}. So the third smallest is indeed 6.

The initial call to quicksort begins with the elements {10, 4, 5, 8, 6, 11, 26, 30} and k=3.
Assume that the random pivot element chosen is 4. The partitions are
L: {}, E: {4}, G: {10, 5, 8, 6, 11, 26, 30}

Note that the partitions are ordered, i.e., elements in L are smaller than the smallest element in E, similarly, elements in E are smaller than the smallest element in G.

Since we need the third smallest element, but the L has 0 elements and E has one, so G is chosen, since it is unlikely to find it in L or in E.
The algorithm recursively calls quickselect with G={10,5,8,6,11,26,30} and k=2. We have reduced the value of k from 3 to 2, because we got rid of one smaller element, i.e., 4 in E.
so effectively we are looking for the second smaller element in G.

Assume that the random pivot element chosen is 5. The new partitions are
L: {}, E: {5}, G: {10, 8, 6, 11, 26, 30}
recall that we are now looking for k=2 smallest element. Here we see that the L has size 0 and E has size 1, thus it is unlikely to find the second smallest in L or in E.
The algorithm recursively calls quickselect with G={10,8,6,11,26,30} and k=1. Value of k is similarly reduced because we got rid of one smallest element in E

Assume that the random pivot element chosen is 26. The partitions are
L: {10, 8, 6, 11}, E: {26}, G: {30}
Here the size of L is 4 which is greater than our k, thus L is selected.
The algorithm recursively calls quickselect with L={10,8,6,11} and k=1. Here the value of k is not reduced since we aren't getting rid of any elements.

Assume that the random pivot element chosen is 10. The partitions are
L: {8, 6}, E: {10}, G: {11}
Similar as above we choose L and recursively call quickselect with L={8,6} and k=1

Assume that the random pivot element chosen is 6. The partitions are
L: {}, E: {6}, G: {8}
Here the size of L is 0 and E is 1 which means that E has the k=1 smallest, thus we return it.

Thus the k=3 smallest is 6. The value of k changes over recursive calls. but it was k=3 in the first call, which is the value returned.

Conclusion

Quickselect can find the k-th largest element in an average of O(N) time and thus is more efficient than the sorting method. Quickselect is based on divide and conquer approach and is somewhat similar to Quicksort algorithm.