Tarjan's Algorithm to find Strongly Connected Components
Do not miss this exclusive book on Binary Tree Problems. Get it now for free.
Reading time: 30 minutes | Coding time: 12 minutes
Tarjan's Algorithm is an efficient graph algorithm to find the strongly connected components in a directed graph in linear time by utilizing Depth First Search traversal of a graph. The key idea used is that nodes of strongly connected component form a subtree in the DFS spanning tree of the graph.
The task of partitioning a directed graph into its Strongly Connected Components is of high importance and is used extensively in various problems. Two algorithms used for this purpose are:
- Tarjan's algorithms
- Kosaraju's algorithm
We will explore Tarjan's algorithm in this article.
Before proceeding further, we will revisit the two basic concepts/ terminologies in graph:
- strongly connected graph
- strongly connected component
A directed graph is called strongly connected if each vertex of the graph is reachable from every other vertex in the graph. This means that a path would exist between each pair of nodes.
For a directed graph, a strongly connected component is a parition or a subgraph, which is not a subgraph of another strongly connected component. This means that a strongly connected component has to be the maximal subgraph satisfying the condition. Henceforth, we will refer to strongly connected components using the abbreviated term SCC.
Algorithm
It utilizes the property that nodes of strongly connected component form a subtree in the DFS spanning tree of the graph.
The steps involved are:
- A dfs is run over the nodes and the subtrees of SCCs are removed and recorded as they are encounered.
- Two values dfs_num(u) and dfs_low(u) are maintained for each of the users. dfs_num(u) is the value of the counter when the node u is explored for the first time. dfs_low(u) stores the lowest dfs_num reachable from u which is not the part of another SCC.
- As the nodes are explored, they are pushed onto a stack.
- The unexplored children of a node are explored and dfs_low(u) is accordingly updated.
- A node is encountered with dfs_low(u) == dfs_num(u) is the first explored node in its strongly connected component and all the nodes above it in the stack are popped out and assigned the appropriate SCC number.
Complexity
- Time complexity: The algorithm is built upon DFS and therefore, each node is visited once and only once. For each node, we perform some constant amount of work and iterate over its adjanceny list. Thus, the complexity is
O(|V|+ |E|)
- At maximum, the depth of recursion and the size of stack can be n nodes. Thus the complexity is
O(|V|)
Example
The following figures show the flow of the algorithm on a directed graph with 3 SCCs. Each SCC has been enclosed in a box and the pair (dfs_low(), dfs_num(u)) is mentioned with each node.
For node 4, dfs_low[4] and dfs_num[4] remain the same. So, nodes are popped from the stack until node 4 resulting in-
SCC: 4
dfs_low[3] and dfs_num[3] are set to 8 & node 3 is pushed to the stack.
Owing to node 1, dfs_low[3] is updated to be 1.
dfs_low[2] is updated to 1.
For node 1, dfs_low[1] and dfs_num[1] remain the same. So, nodes are popped from the stack until node 1 resulting in-
SCC: 1 2, 3
This figure shows the DFS spanning tree of the graph. It can be easily observed that the SCCs form subtrees in the spanning tree.
The output for code for this graph is,
SCC-1-> 5 8, 7, 6,
SCC-2-> 4
SCC-3-> 1 2, 3,
Implementation
- adj_list[NMAX]: Adjacency list for each node
- dfs_num[NMAX]: dfs_num value for each node
- dfs_low[NMAX]: dfs_low value for each node
- SCC[NMAX]: SCC number for each node
- C++
C++
#include<iostream>
#include<stack>
#define NODE 8
using namespace std;
int graph[NODE][NODE] = {
{0, 0, 1, 0, 0, 0, 0, 0},
{1, 0, 0, 0, 0, 0, 0, 0},
{0, 1, 0, 1, 0, 0, 0, 0},
{0, 0, 0, 0, 1, 0, 0, 0},
{0, 0, 0, 0, 0, 1, 0, 0},
{0, 0, 0, 0, 0, 0, 1, 0},
{0, 0, 0, 0, 0, 0, 0, 1},
{0, 0, 0, 0, 1, 0, 0, 0}
};
int min(int a, int b) {
return (a<b)?a:b;
}
void findComponent(int u, int disc[], int low[], stack<int>&stk, bool stkItem[]) {
static int time = 0;
disc[u] = low[u] = ++time; //inilially discovery time and low value is 1
stk.push(u);
stkItem[u] = true; //flag as u in the stack
for(int v = 0; v<NODE; v++) {
if(graph[u][v]) {
if(disc[v] == -1) { //when v is not visited
findComponent(v, disc, low, stk, stkItem);
low[u] = min(low[u], low[v]);
} else if(stkItem[v]) //when v is in the stack, update low for u
low[u] = min(low[u], disc[v]);
}
}
int poppedItem = 0;
if(low[u] == disc[u]) {
while(stk.top() != u) {
poppedItem = stk.top();
cout << poppedItem << " ";
stkItem[poppedItem] = false; //mark as item is popped
stk.pop();
}
poppedItem = stk.top();
cout << poppedItem <<endl;
stkItem[poppedItem] = false;
stk.pop();
}
}
void strongConComponent() {
int disc[NODE], low[NODE];
bool stkItem[NODE];
stack<int> stk;
for(int i = 0; i<NODE; i++) { //initialize all elements
disc[i] = low[i] = -1;
stkItem[i] = false;
}
for(int i = 0; i<NODE; i++) //initialize all elements
if(disc[i] == -1)
findComponent(i, disc, low, stk, stkItem);
}
int main() {
strongConComponent();
}
Applications
The applications of Tarjan's algorithm to find strongly connected components are:
- To converted the graph into a Directed Acyclic Graph of strongly connected components
- For solving 2 SAT problems, where the problem is unsatisfiable if both a variable and its complement lie in the same SCC.
Questions
Question 1: It is possible for two nodes belonging to different SCCs to be reachable from one another.
Question 2: It can be seen in the code that we iterate over the all the nodes and if the corresponding SCC has not been determined yet, we call the 'tarjan' function. During this iteration, is it possible here for a node to not have an assigned SCC no. and another node from its SCC to have it assigned ?
Tarjan's vs Kosaraju's algorithm
While both the methods have a linear time complexity, the techniques for SCC computation are fairly different. Tarjan's method relies on the timestamps of nodes in a DFS to parition the graph whereas Kosaraju's method performs two DFSs on the graph and is somewhat similar to the method for finding the topological sorting.
Sign up for FREE 3 months of Amazon Music. YOU MUST NOT MISS.