Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article at OpenGenus, we will solve the problem called Reconstruct Itinerary Problem. This article will guide you through the intuition of how to solve the problem using the concept of Hierholzer’s Algorithm which is frequently asked in coding interviews of various tech companies.
Table of contents:
- Problem Statement
- Hierholzer’s Algorithm
- Intuition behind using Hierholzer’s Algorithm and Implementation
- Time and Space Complexity
1. Problem Statement
The problem statement for the Reconstruct Itinerary Problem is as follows :
You are given a list of airline tickets where tickets[i] = [from_i, to_i]
represent the departure and the arrival airports of one flight. Reconstruct the itinerary in order and return it.
All of the tickets belong to a man who departs from "JFK", thus, the itinerary must begin with "JFK". If there are multiple valid itineraries, you should return the itinerary that has the smallest lexical order when read as a single string.
For example, the itinerary ["JFK", "LGA"] has a smaller lexical order than ["JFK", "LGB"].
You may assume all tickets form at least one valid itinerary. You must use all the tickets once and only once.
Example 1: The example 1 is a simpler one which doesn't contain any choice but let us learn how to find a solution.
Input: tickets
= [["MUC","LHR"],["JFK","MUC"],["SFO","SJC"],["LHR","SFO"]]
Output: ["JFK","MUC","LHR","SFO","SJC"]
Explanation : As we are given that we must begin from "JFK" , we have only one alternative as ticket[1] = ["JFK","MUC"]
, so the next station is "MUC" which makes only possible choice as ticket[0] = ["MUC","LHR"]
and so on. For above example there are no alternative choices so it's easier to solve but we got the gist of solution.
Example 2 : The example 2 is a simpler one which doesn't contain any choice but let us learn how to find a solution.
Input: tickets
= [["JFK","SFO"],["JFK","ATL"],["SFO","ATL"],["ATL","JFK"],["ATL","SFO"]]
Output: ["JFK","ATL","JFK","SFO","ATL","SFO"]
.
Explanation: Another possible reconstruction is ["JFK","SFO","ATL","JFK","ATL","SFO"] but it is larger in lexical order.
From "JFK" we have two choices which are "SFO" and "ATL". From "ATL" we also have 2 choices which are "JFK" and "SFO" and similarly from "SFO" we have choice to move on to "ATL".
To visualize the possible solutions we have the following solutions possible :
The solution 1 , 2 and 4 are complete solutions but since we only have to return the solution with the smallest lexical order when read as a single string. The solution 4 is the solution with smallest lexical order.
2. Hierholzer’s Algorithm
Hierholzer’s Algorithm is a technique used to discover an Euler path in a graph. An Euler path is a path that traverses every edge of a graph exactly once and ends up on the same node that it started.
The algorithm assumes that the given graph has an Eulerian path. The conditions for a directed graph to have an Eulerian path are:
- All vertices with nonzero degrees belong to a single strongly connected component.
- In-degree and out-degree of every vertex is the same except at most one vertex may have
outdegree - indegree = 1
and at most one vertex haveindegree - outdegree = 1
.
The idea behind the algorithm is to force the Depth first search algorithm to visit each and every edge possible in the graph. But why force doesn't the depth first search visit all the nodes ? Let's see it through an example:
The below is the pseudo implementation of typical DFS algorithm on a graph :
procedure DFS(node):
if node is not visited:
mark node as visited
process(node)
for each neighbor in neighbors of node:
DFS(neighbor)
If we take the graph given below :
and run the DFS in it we get the following only possible outcome is :
Outcome : 1 ->2 -> 3 with using E1 and E3 skipped edge E2.
To visit the edge E2 we need to force the DFS such that the algorithm don't stop after visiting the nodes but edges. So we use the fact that if we have the above 2 conditions satisfied.
Let check if the eulerian path exists or not :
Node 1 : indegree = 0 , outdegree = 1
Node 2 : indegree = 2 , outdegree = 2
Node 3 : indegree = 1 , outdegree = 3
Only Node 1 and Node 3 have the imbalance in indegree and outdegree, and Node 2 have equal indegree and outdegree which means the above graph satisfies the two conditions and eulerian path exists.
Now let's see how we can modify the DFS to force it to visit every edge. We will use the information regrading the outgoing edges count or outdegree. We will modify the DFS such that it will backtrack if it can't move forward and check if all the edges of previous nodes are visited or not. To understand this let's use use out previous example and see how to use the outdegree :
In figure we will move step by step:
- Step 1: We move forward using E1 and reduce the outdegree of node 1 to zero.
- Step 2 : Now from node 2 we move on to the node 3 using edge E3 reducing outdegree of node 2 from 2 to 1.
- Step 3 : From node 3 we have no more nodes to visit and outdegree of the node 3 is also 0 which shows the same. It will backtrack to node 2 from node 3.
- Step 4 : At node 2 we have outdegree = 1 which means we still have a path to go which is in this case E2 edge. So we take the edge E2 and reduce the outdegree of node 2 to zero and come back to node 2 again.
- Step 5 : Since now outdegree of node 2 is 0 we have no more edge so we backtrack two times and reach node 1.
- Step 6 : The node 1 also have outdegree of 0 so DFS terminate here.
The pseudo code for the algorithm is as follows :
procedure DFS(node):
if outdegree[node] is not 0:
index_of_next_neighbour = edgecount[node] - outdegree[node]
reduce outdegree[node] by 1
DFS(node.neighbour[index_of_next_neighbour])
process(node)
The algorithm is efficient, finding the path in linear time, O(E), where E is the number of edges. The space complexity depends on the representation used to store the graph.
3. Intuition behind using Hierholzer’s Algorithm and Implementation
Now the question arises that how we will use the Hierholzer’s Algorithm in out problem ? The answer is every clear we have to reconstruct the itinerary such that all the tickets will get used. So we can formulate the problem as a graph and find the eulerian path in the graph which will the solution to our problem. But how to solve get the minimum lexical order ?
Let's first start dwelling into the solution , the first step is to make a graph out of given tickets using example tickets
= [["JFK","SFO"],["JFK","ATL"],["SFO","ATL"],["ATL","JFK"],["ATL","SFO"]]
:
We will use the unordered_maps for the implementation to be simpler and efficiency proposes , the adjacency list looks like :
graphs_map :
- Node "JFK" : [ "SFO" , "ATL" ]
- Node "SFO" : [ "ATL" ]
- Node "ATL" : [ "JFK" , "SFO" ]
out_degrees_map:
- Node "JFK" : 2
- Node "SFO" : 1
- Node "ATL" : 2
Note : We are not checking the two conditions since it's given that solution always exists.
The method will look like the following :
void make_graph(vector<vector<string>> &tickets){
for(int i = 0 ; i < tickets.size() ; i++)
{
graph[tickets[i][0]].push_back(tickets[i][1]);
out_count[tickets[i][0]]++;
}
}
To address the problem of getting the lowest lexical order, we will sort the adjacency list of each entry of our graph which will modify the graph as follow :
graphs_map :
- Node "JFK" : [ "ATL" , "SFO" ]
- Node "SFO" : [ "ATL" ]
- Node "ATL" : [ "JFK" , "SFO" ]
The code to do so will look like :
for(auto &list : graph){
sort(list.second.begin() , list.second.end());
}
Now we have to start the DFS using the "JFK" as the start point , the DFS will work as follows :
The answer is as follows : [ "SFO" , "ATL" , "SFO" , "JFK" , "ATL" , "JFK" ]
but the answer is in reverse order so we reverse the order which results in ["JFK","ATL","JFK","SFO","ATL","SFO"]
as our final answer. The code will be as follows :
void dfs(string s){
while(counts[s] != 0){
int index = graph[s].size()-counts[s];
counts[s]--;
dfs(graph[s][index]);
}
ans.push_back(s);
}
dfs("JFK");
// reverse
reverse(ans.begin() , ans.end());
The Complete implementation is as follows :
class Solution {
public:
// store answer
vector<string> ans;
// store in counts
unordered_map<string,int> counts;
// maintain graph
unordered_map<string, vector<string>> graph;
// make graph from tickets
void make_graph(vector<vector<string>>& tickets){
for(auto i : tickets){
graph[i[0]].push_back(i[1]);
counts[i[0]]++;
}
}
// do DFS to visit all edges
void dfs(string s){
while(counts[s] != 0){
int index = graph[s].size()-counts[s];
counts[s]--;
dfs(graph[s][index]);
}
ans.push_back(s);
}
// Reconstruct Itinerary
vector<string> findItinerary(vector<vector<string>>& tickets) {
// lets make a graph first out of it
make_graph(tickets);
// now sort them in lexo order
for(auto &list : graph){
sort(list.second.begin(), list.second.end());
}
// now dfs
dfs("JFK");
// reverse
reverse(ans.begin() , ans.end());
// return ans
return ans;
}
};
int main(){
Solution S;
auto ans = S.findItinerary(tickets);
for(auto i : ans){
std::cout<<i<<" ";
}
std::cout<<endl;
return 0;
}
Input
tickets = [["JFK","SFO"],["JFK","ATL"],["SFO","ATL"],["ATL","JFK"],["ATL","SFO"]]
Output
["JFK","ATL","JFK","SFO","ATL","SFO"]
4. Time and Space Complexity
The time and space complexity analysis is as follows :
Time Complexity:
-
Building the graph: The
make_graph
function iterates through each ticket once and adds the corresponding edges to the graph. In the worst case, it has a time complexity of O(N), where N is the number of tickets. -
Sorting the adjacency lists: After creating the graph, the code iterates through each list of destinations in the graph and sorts them. The sorting step takes O(E * log(E)) time, where E is the total number of edges in the graph. Since each ticket contributes one edge, the worst-case time complexity for this step is O(N * log(N)), where N is the number of tickets.
-
Depth-First Search (DFS): The
dfs
function performs a DFS traversal starting from the "JFK" airport. In the worst case, it visits each edge once. The time complexity of DFS is O(V + E), where V is the number of vertices (airports) and E is the number of edges (tickets). In this case, it's O(N), as each ticket contributes one edge.
Therefore, the overall time complexity of the findItinerary
function is O(E * log(E) + E ) = O ( E * log(E) ).
Space Complexity:
-
Storing the answer: The
ans
vector stores the final itinerary, and its space complexity is O(N), where N is the number of airports. -
Storing in counts: The
counts
unordered_map stores the count of remaining edges for each airport. In the worst case, there can be N airports, so the space complexity is O(N). -
Maintaining the graph: The
graph
unordered_map stores the adjacency list for each airport. In the worst case, there can be N airports, and each airport can have N-1 destinations. Therefore, the space complexity is O(N + N * (N-1)) = O(N^2). -
Recursive call stack: The depth of the recursion during the DFS traversal is at most N, where N is the number of airports. Therefore, the space complexity due to the call stack is O(N).
Overall, the space complexity is O(N^2) for the findItinerary
function.
Key Takeaways (Hierholzer’s Algorithm and Solving Reconstruct Itinerary Problem)
- Hierholzer’s Algorithm is used to find an Euler path in a graph.Use DFS with a modified version is Hierholzer’s Algorithm to ensure all edges are visited . Backtrack when there are remaining unvisited edges to explore all possibilities.
- Reconstruct Itinerary Problem can be solved using the Hierholzer’s Algorithm and reversing the resultant order.
- Hierholzer’s Algorithm is efficient, finding the path in linear time, O(E), where E is the number of edges. The space complexity depends on the representation used to store the graph.
- Using the algorithm the problem to reconstruct the Itinerary time complexity achieved is O(E * log(E)), where E is the number of edges (tickets), considering the graph construction and sorting steps. Space Complexity is O(N^2), where N is the number of airports, accounting for the graph representation and recursive call stack.