Tarjan's Algorithm to find Strongly Connected Components


Reading time: 30 minutes | Coding time: 12 minutes

Tarjan's Algorithm is an efficient graph algorithm to find the strongly connected components in a directed graph in linear time by utilizing Depth First Search traversal of a graph. The key idea used is that nodes of strongly connected component form a subtree in the DFS spanning tree of the graph.

The task of partitioning a directed graph into its Strongly Connected Components is of high importance and is used extensively in various problems. Two algorithms used for this purpose are:

  • Tarjan's algorithms
  • Kosaraju's algorithm

We will explore Tarjan's algorithm in this article.

Before proceeding further, we will revisit the two basic concepts/ terminologies in graph:

  • strongly connected graph
  • strongly connected component

A directed graph is called strongly connected if each vertex of the graph is reachable from every other vertex in the graph. This means that a path would exist between each pair of nodes.

For a directed graph, a strongly connected component is a parition or a subgraph, which is not a subgraph of another strongly connected component. This means that a strongly connected component has to be the maximal subgraph satisfying the condition. Henceforth, we will refer to strongly connected components using the abbreviated term SCC.

Algorithm

It utilizes the property that nodes of strongly connected component form a subtree in the DFS spanning tree of the graph.

The steps involved are:

  • A dfs is run over the nodes and the subtrees of SCCs are removed and recorded as they are encounered.
  • Two values dfs_num(u) and dfs_low(u) are maintained for each of the users. dfs_num(u) is the value of the counter when the node u is explored for the first time. dfs_low(u) stores the lowest dfs_num reachable from u which is not the part of another SCC.
  • As the nodes are explored, they are pushed onto a stack.
  • The unexplored children of a node are explored and dfs_low(u) is accordingly updated.
  • A node is encountered with dfs_low(u) == dfs_num(u) is the first explored node in its strongly connected component and all the nodes above it in the stack are popped out and assigned the appropriate SCC number.

Complexity

  • Time complexity: The algorithm is built upon DFS and therefore, each node is visited once and only once. For each node, we perform some constant amount of work and iterate over its adjanceny list. Thus, the complexity is O(|V|+ |E|)
  • At maximum, the depth of recursion and the size of stack can be n nodes. Thus the complexity is O(|V|)

Example

The following figures show the flow of the algorithm on a directed graph with 3 SCCs. Each SCC has been enclosed in a box and the pair (dfs_low(), dfs_num(u)) is mentioned with each node.

topic of image Let us call the tarjan function on node 1. topic of image dfs_low[1] and dfs_num[1] are set to 1 & node 1 is pushed to the stack. topic of image dfs_low[2] and dfs_num[2] are set to 2 and node 2 is pushed to the stack. Let's say that node 4 lies before node 2 in the adjacency list of node 2. So, the function is first called for node 4. topic of image dfs_low[4] and dfs_num[4] are set to 3 & node 4 is pushed to the stack. topic of image dfs_low[5] and dfs_num[5] are set to 4 & node 5 is pushed to the stack. topic of image dfs_low[6] and dfs_num[6] are set to 5 & node 6 is pushed to the stack. topic of image dfs_low[7] and dfs_num[7] are set to 6 & node 7 is pushed to the stack. topic of image dfs_low[8] and dfs_num[8] are set to 7 & node 8 is pushed to the stack. Owing to node 5, dfs_low[8] is updated to be 4. dfs_low[7] is updated to 4. dfs_low[6] is updated to 4. For node 5, dfs_low[5] and dfs_num[5] remain the same. So, nodes are popped from the stack until node 5 resulting in- SCC: 5 8, 7, 6

For node 4, dfs_low[4] and dfs_num[4] remain the same. So, nodes are popped from the stack until node 4 resulting in-
SCC: 4

topic of image

dfs_low[3] and dfs_num[3] are set to 8 & node 3 is pushed to the stack.
Owing to node 1, dfs_low[3] is updated to be 1.
dfs_low[2] is updated to 1.

For node 1, dfs_low[1] and dfs_num[1] remain the same. So, nodes are popped from the stack until node 1 resulting in-
SCC: 1 2, 3

This figure shows the DFS spanning tree of the graph. It can be easily observed that the SCCs form subtrees in the spanning tree.

topic of image

The output for code for this graph is,


SCC-1-> 5 8, 7, 6, 
SCC-2-> 4 
SCC-3-> 1 2, 3, 

Implementation

  • adj_list[NMAX]: Adjacency list for each node
  • dfs_num[NMAX]: dfs_num value for each node
  • dfs_low[NMAX]: dfs_low value for each node
  • SCC[NMAX]: SCC number for each node
  • C++

C++


#include<iostream>
#include<stack>
#define NODE 8
using namespace std;
int graph[NODE][NODE] = {
   {0, 0, 1, 0, 0, 0, 0, 0},
   {1, 0, 0, 0, 0, 0, 0, 0},
   {0, 1, 0, 1, 0, 0, 0, 0},
   {0, 0, 0, 0, 1, 0, 0, 0},
   {0, 0, 0, 0, 0, 1, 0, 0},
   {0, 0, 0, 0, 0, 0, 1, 0},
   {0, 0, 0, 0, 0, 0, 0, 1},
   {0, 0, 0, 0, 1, 0, 0, 0}
};
int min(int a, int b) {
   return (a<b)?a:b;
}
void findComponent(int u, int disc[], int low[], stack<int>&stk, bool stkItem[]) {
   static int time = 0;
   disc[u] = low[u] = ++time;    //inilially discovery time and low value is 1
   stk.push(u);
   stkItem[u] = true;    //flag as u in the stack
   for(int v = 0; v<NODE; v++) {
      if(graph[u][v]) {
         if(disc[v] == -1) {   //when v is not visited
            findComponent(v, disc, low, stk, stkItem);
            low[u] = min(low[u], low[v]);
         } else if(stkItem[v])    //when v is in the stack, update low for u
            low[u] = min(low[u], disc[v]);
      }
   }
   int poppedItem = 0;
   if(low[u] == disc[u]) {
      while(stk.top() != u) {
         poppedItem = stk.top();
         cout << poppedItem << " ";
         stkItem[poppedItem] = false;    //mark as item is popped
         stk.pop();
      }
      poppedItem = stk.top();
      cout << poppedItem <<endl;
      stkItem[poppedItem] = false;
      stk.pop();
   }
}
void strongConComponent() {
   int disc[NODE], low[NODE];
   bool stkItem[NODE];
   stack<int> stk;
   for(int i = 0; i<NODE; i++) {    //initialize all elements
      disc[i] = low[i] = -1;
      stkItem[i] = false;
   }
   for(int i = 0; i<NODE; i++)    //initialize all elements
      if(disc[i] == -1)
         findComponent(i, disc, low, stk, stkItem);
}
int main() {
   strongConComponent();
}

Applications

The applications of Tarjan's algorithm to find strongly connected components are:

  • To converted the graph into a Directed Acyclic Graph of strongly connected components
  • For solving 2 SAT problems, where the problem is unsatisfiable if both a variable and its complement lie in the same SCC.

Questions

Question 1: It is possible for two nodes belonging to different SCCs to be reachable from one another.
True
False
A SCC has to be the maximal partition satisfying the condition.
Question 2: It can be seen in the code that we iterate over the all the nodes and if the corresponding SCC has not been determined yet, we call the 'tarjan' function. During this iteration, is it possible here for a node to not have an assigned SCC no. and another node from its SCC to have it assigned ?
Yes
No
The very meaning of an SCC is that nodes belonging to it are reachable from each other. So whenever the tarjan function is called on a node, it is guaranteed for each node belonging to its SCC to be visited.

Tarjan's vs Kosaraju's algorithm


While both the methods have a linear time complexity, the techniques for SCC computation are fairly different. Tarjan's method relies on the timestamps of nodes in a DFS to parition the graph whereas Kosaraju's method performs two DFSs on the graph and is somewhat similar to the method for finding the topological sorting.