Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

In this article, we have solved the problem of Water distribution using Minimum Spanning Tree, Union Find and Sorting Algorithms.

Pre-requisites:

### Problem Statement

You are given `n`

houses in a village. Here you have to supply water to all the houses. Either you can build a well at a house, or else construct a pipeline from another well to it. To build a well directly in a house, it will cost wells[i] and cost to lay pipes between houses are given by the array pipes, where each `pipes[i]=[House1, House2, Cost]`

represents the cost to connect two houses together using a pipe. All pipe connections made are bidirectional.

Print the minimum total cost to supply water to all houses.

```
Input: n = 3, wells = [1,2,2], pipes = [[1,2,1],[2,3,1]]
Output: 3
Explanation: Here the best way to build a well in the first house with cost 1 and connectthe other houses to it with cost 2, and hence the total is 3.
```

### Observation

- Here the cost of building a well in a house is a fixed cost, ie, there are no way to reduce or influence it. Where as, the cost to lay pipes differ, since it depends in what path we consider.
- The graph formed by the houses and pipes is undirected, which means that the cost of laying a pipe between two houses is the same in both directions. Therefore, we can represent the graph using an adjacency list or an adjacency matrix.
- To find the minimum total cost, of supplying water to all the houses, we need to find a minimum spanning tree of the graph, which is a subset of the edges that connects all the vertices with the minimum possible total edge weight.
- Kruskal's algorithm is an efficient algorithm for finding a minimum spanning tree, which works by adding the edges in non-decreasing order of their weights and checking if they form a cycle in the graph. If they don't form a cycle, the edge is added to the minimum spanning tree.
- To implement Kruskal's algorithm, we need a data structure to maintain a set of disjoint subsets of the vertices. The Union-Find data structure is a commonly used data structure for this purpose.
- Once we have found the minimum spanning tree, we can calculate the total cost of supplying water to all the houses by adding up the weights of the edges in the minimum spanning tree.

### Approach

**Intuitive steps**

- Create vector
`costs`

of`EdgeCost`

object to represent connections . - Push data in wells vector and pipes vector into costs.
- Sort the costs vector in increasing order of costs.
- Create
`UnionFind`

object with the`number of nodes (wells and pipes) + 1`

. - Check if each nodes are connected or not using
`find()`

. If not, connect using`unionSets()`

. - While connecting add the cost to
`minCost`

and reduce number of nodes by`1`

. - Return the
`minCost`

when`n == 0`

.

**Explained**

- First we will create a vector of
`EdgeCost`

objects called costs inside`minCostToSupplyWater()`

to store the edges in the graph. We push it in the form`0, i, wells[i-1]`

. When adding the cost of building wells, the costs list is initialized with edges from node`0`

to each of the houses, with the cost being the corresponding well cost from the input wells array. This is why`0`

is used as the first node. The houses are numbered from`1`

to`n`

but the wells vector is`0`

indexed, thats why we push`wells[i-1]`

. - After that we will add the costs of connecting pipes between houses into the
`costs`

list. The`pipes`

vector is a`2D`

vector where`p[x][0]`

is the index of house one,`p[x][1]`

is the index of house two and`p[x][2]`

is the cost to connect these houses. We will add the contents in the`pipes`

vector into the`cost`

vector. Now we will sort the`cost`

vector. The purpose of sorting is to ensure that we always consider the edges with lowest possible. Here we will sort the objects based on the`cost`

member variable of each object. - Now we will call the
`UnionFind uf(n + 1)`

to initialize an instances of the`UnionFind`

class with`n + 1`

elements. By initializing`uf`

with`n+1`

, we are creating`n`

elements numbered from`1`

to`n`

and an additional element numbered`0`

, which represents the index corresponding to the well at house`1`

. - The
`UnionFind`

class has two operations:`find(x)`

: The find operation returns the root element of the set to which a given element belongs. In this implementation, the find method takes an element`x`

as input and returns its root element by recursively following the parent pointers of`x`

until it reaches an element whose parent is itself.`unionSet(x,y)`

: The`unionSets`

operation is used to merge two disjoint sets into a single set. Here the`unionSets`

method takes two elements`x`

and`y`

as input and merges the sets to which they belong by setting the parent of the root element of one set to the root element of the other set, and updating the rank of the root element accordingly to maintain a balanced tree structure.

- Now we will create & initialize the
`minCosts = 0`

. The next step is to iterate through the sorted vector`costs`

. Each of the edges present inside the vector represents a pipe or a wall. For each edge, we will find the roots of the tree to which the two nodes, connected by the edge, belong to using the`find()`

method present in`UnionFind`

data structure. If the roots are same it means that both the nodes are already connected and we will continue to the next edge. - If the roots are different, it means that they are not connected and we need to connect them. To connect them we will use
`unionSets()`

method of the`UnionFind`

data structure, which merges the trees to which the two nodes belong to. At this time we will add the cost of the edge to`minCost`

variable. After each union of two sets, we connect one node, so we will decrease the node count`n`

by one.We will return the`minCost`

once we have connected all the nodes`(n = 0)`

. In this way we make sure that we have the minimum cost.

```
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
class OptimizeWaterDistribution {
public:
int minCostToSupplyWater(int n, vector<int>& wells, vector<vector<int>>& pipes) {
// Construct a list of all edges with their costs
vector<EdgeCost> costs;
for (int i = 1; i <= n; i++) {
costs.push_back(EdgeCost(0, i, wells[i - 1]));
}
for (auto p : pipes) {
costs.push_back(EdgeCost(p[0], p[1], p[2]));
}
sort(costs.begin(), costs.end());
// Initialize Union-Find data structure
UnionFind uf(n + 1);
// Iterate over edges and connect nodes using Union-Find
int minCosts = 0;
for (auto edge : costs) {
int rootX = uf.find(edge.node1);
int rootY = uf.find(edge.node2);
if (rootX == rootY) continue;
minCosts += edge.cost;
uf.unionSets(rootX, rootY);
// For each union, we connect one node
n--;
// If all nodes are already connected, terminate early
if (n == 0) {
return minCosts;
}
}
return minCosts;
}
private:
struct EdgeCost {
int node1;
int node2;
int cost;
EdgeCost(int n1, int n2, int c) : node1(n1), node2(n2), cost(c) {}
bool operator<(const EdgeCost& other) const {
return cost < other.cost;
}
};
class UnionFind {
public:
UnionFind(int size) {
parent.resize(size);
rank.resize(size);
for (int i = 0; i < size; i++) {
parent[i] = i;
rank[i] = 0;
}
}
int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]);
}
return parent[x];
}
void unionSets(int x, int y) {
int rootX = find(x);
int rootY = find(y);
if (rootX == rootY) return;
if (rank[rootX] > rank[rootY]) {
parent[rootY] = rootX;
} else {
parent[rootX] = rootY;
if (rank[rootX] == rank[rootY]) {
rank[rootY]++;
}
}
}
private:
vector<int> parent;
vector<int> rank;
};
};
int main() {
OptimizeWaterDistribution solution;
int n = 3;
vector<int> wells = {1, 2, 2};
vector<vector<int>> pipes = {{1, 2, 1}, {2, 3, 1}};
int minCost = solution.minCostToSupplyWater(n, wells, pipes);
cout << "Minimum cost to supply water to all houses: " << minCost << endl;
return 0;
}
```

**Output**

```
Minimum cost to supply water to all houses: 3
```

### Complexities

It takes `O(n)`

time when we iterate through the loop first time where `n`

is the total wells possible. The second loop takes `O(m)`

time, where `m`

is the number of pipes.

So the sorting will take around `O((n+m)log(n+m))`

time since we will have `n + m`

edges. In the end we will perform at most `n-1`

unions, each of which takes `O(log n)`

time in the worst case, so it would take `O((n+m)log(n))`

time. Hence we can say that the overall time complexity is `O((n+m)log(n+m))`

.

Here we are using vector of size n+1 for the parent and rank arrays, which will take `O(n)`

space. For the costs vector we have `m`

elements then it will take `O(m)`

space. Hence the overall space complexity is `O(max(n,m))`

.

With this article at OpenGenus, you must have the complete idea of solving this problem of Minimum Spanning Tree.