Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
Table of Contents
- Introduction
- Algorithm
- Implementation
- Overview
Introduction
In this article, we will be discussing how to find the geometric median, this is, the point on a line with the minimum sum distance from a set of points. We may assume that this is essentially asking us to find the geometric midpoint since the sum distances from the center point should automatically be at minima, however this is not always the case.
Take this as an example, for the set of points (0,0), (0,0), (0,12), the geometric center is at (0,4), therefore the sum of the distance to travel from the centroid point to all 3 in the set is 4 + 4 + 8 = 16, however it is clear that if we set the point as (0,0), this will give us a shorter total distance of 12. Therefore we cannot always say that the centroid point will give us the correct answer. This occurred as at the moment we are computing the arithmetic mean of said points in the set however we require the geometric median. This can be calculated as the central tendency of the set of points such that the cost to reach that central tendency from each point is minimized.
Example geometric median:
Algorithm
As of now, there isn't a defined algorithm for computing this point and therefore our approach involves approximating a solution and then determining whether the approximation is the geometric median or not.
Firstly, we should take note of two variables, "current point": which is responsible for storing the x,y coordinate of the approximated geometric point and, "minimum distance": which is the sum of all Euclidean distances from the current point to all the input points.
After initializing these variables, we can continuously update the current point based on whether it is the geometric median or not.
After each approximation, simply, if another value is found with a sum of distances that is lower, we can update the values of current point and minimum distance to the new point and lower distance.
The steps of the algorithm are then as follows:
- First, find the centroid point of the set, and assign it to current point, store the sum of the distances as minimum distance.
- Iterate over given set of points, assuming that each point is the median and calculate the distance to the other points.
- If the distance is lower than the current minimum distance, update the current point and minimum distance to the new values. Else, they remain the same.
- We then create a condition controlled loop that has an arbitrary test distance variable from the current point in all directions, giving us new points.
- Then, calculate the the distance from the new points to the set of input points, if the sum is lower than the previous minimum distance then we update the old distance and repeat the loop, if the sum is not lower then keep the values the same until a shorter distance is found.
- The halting condition of the while loop is another variable which we shall called lower limit. The lower this limit, the more accurate the approximation will be, when the lower limit is greater than the test distance, the loop terminates.
- After this occurs, we should be left with the point that gives the minimum distance, completing the algorithm.
As a step by step example, we shall use the same set of points as in the first example, (0,0), (0,0), (0,12). First. we find the centroid point (0,4). This is now our current points. The sum of the distances is 4 + 4 + 8 = 16. The 16 now becomes our minimum distance at this point in time. Then, we shall iterate over each point, taking it as the centroid. First, let's try (0, 12). The sum of the distances to all other points is 0 + 12 + 12 = 24, this is not lower than 16 therefore we move on, taking (0,0) as the centroid now, the sum to other points is 0 + 0 + 12 = 12 and 12 < 16 therefore we can now update these values as the minimum distance. The test distances and points are created as described in the algorithm, the sum of the distances from these test points to the set are then calculated and compared, in our case this will not find a smaller distance and so the algorithm terminates after our lower limit is greater than the distance, returning the final point as (0,0), and minimum distance as 12.
Implementation
The following is an implementation of our approach in C++:
#include <bits/stdc++.h>
using namespace std;
// Struct to store point (x, y)
struct Point {
double x, y;
};
// Initilizing new set of points to test
// Arranged to the right, left, up and down of
// current_point at a distance of
// test_distance from current_point
Point test_point[] = { { -1.0, 0.0 },
{ 0.0, 1.0 },
{ 1.0, 0.0 },
{ 0.0, -1.0 } };
// Lower limit to run the while loop
// Lower the limit = more accurate
double lower_limit = 0.01;
// Sum of distances
double distSum(Point p,
Point arr[], int n)
{
double sum = 0;
for (int i = 0; i < n; i++) {
double distx = abs(arr[i].x - p.x);
double disty = abs(arr[i].y - p.y);
sum += sqrt((distx * distx) + (disty * disty));
}
return sum;
}
// Function calculating our geometric median
void geometricMedian(Point arr[], int n)
{
// Current point initilization
Point current_point;
for (int i = 0; i < n; i++) {
current_point.x += arr[i].x;
current_point.y += arr[i].y;
}
current_point.x /= n;
current_point.y /= n;
// Current minimum distance
double minimum_distance =
distSum(current_point, arr, n);
int k = 0;
while (k < n) {
for (int i = 0; i < n, i != k; i++) {
Point newpoint;
newpoint.x = arr[i].x;
newpoint.y = arr[i].y;
double newd =
distSum(newpoint, arr, n);
if (newd < minimum_distance) {
minimum_distance = newd;
current_point.x = newpoint.x;
current_point.y = newpoint.y;
}
}
k++;
}
// Arbitrary test distance
double test_distance = 1000;
int flag = 0;
// Approximation begins
while (test_distance > lower_limit) {
flag = 0;
// Iterating over each of the 4 neighbours
// Right, left, up and down
for (int i = 0; i < 4; i++) {
Point newpoint;
newpoint.x = current_point.x
+ (double)test_distance * test_point[i].x;
newpoint.y = current_point.y
+ (double)test_distance * test_point[i].y;
// New sum of distances from
// neighbour to given points
double newd = distSum(newpoint, arr, n);
if (newd < minimum_distance) {
// Approximation and update
minimum_distance = newd;
current_point.x = newpoint.x;
current_point.y = newpoint.y;
flag = 1;
break;
}
}
// If none of the neighbours return minimum
// Divide test distance by 2 and repeat loop
if (flag == 0)
test_distance /= 2;
}
cout << "Geometric Median = ("
<< current_point.x << ", "
<< current_point.y << ")";
cout << " with minimum distance = "
<< minimum_distance;
}
// Driver
int main()
{
int n = 2;
Point arr[n];
arr[0].x = 3;
arr[0].y = 2;
arr[1].x = 5;
arr[1].y = 8;
geometricMedian(arr, n);
return 0;
}
Overview
In this article at OpenGenus we have covered the point on a line with minimum sum distance from set of points, or, the geometric median. We have detailed the problem and explained the process of solving it before forming an algorithm and then implementing it in code.