Pollard’s rho algorithm for factorization

Free Linux Book

Get FREE domain for 1st year and build your brand new site

Pollard's Rho Algorithm is a very interesting and quite accessible algorithm for factoring numbers. It was invented by John Pollard in 1975. It is not the fastest algorithm by far but in practice it outperforms trial division by many orders of magnitude. It is based on very simple ideas that can be used in other contexts as well. The amount of space required by this algorithm is very small or less and the running time taken by this particular algorithm is proportional to the square root of the size of the least or smallest prime factor of the composite number which is being factorized. Now first we will understand what is prime factor and composite number.

Prime numbers are those numbers which are divisible by 1 and the number itself. Some examples of prime numbers are 2,3,5,7,... and so on. As we can see that 2,3,4,.. these all are divisible by either 1 or the number itself. So these prime numbers becomes prime factor for other numbers. Prime factors are those numbers which are prime and they divide the given number successfully. Some examples of prime factors are: Suppose we have a number say 60 so let's start writing the factors of 60. Factors of 60 are 1,2,3,4,5,6,10,12,15,20,30,60. Among all these factors of 60 we can see that there are some prime numbers(2,3,5) so these particular numbers are known as prime factors of 60. There is a lot of confusion arises in each and everyone mind that whether 1 is a prime number or not. So i want to tell you that 1 is not considered as a prime number.

Now after understanding what are prime numbers and prime factors let's move to composite number. We can simply say that a composite number is any number which is not a prime number. In other words we can say that it is a number that has atleast one divisor or factor other than 1 and itself.

Note: 1 is neither a prime number nor a composite number because it does not satisfy the conditions of prime and composite number. As it's sole divisor is 1 and it does not have atleast one divisor other than 1 and itself so it is not treated as a composite number. In modern mathematics it has been said that a prime number must have 2 divisors ( 1 and the number itself ) and we can see that 1 has only one divisor so that's why it is not considered as a prime number as well.

Now coming back to our problem we have to find the factor of a composite number which is also a positive integer. So Pollard's rho method helps us in finding that. In this method we will be given a positive integer say n which can be represented as a product of two numbers(say a and b) where one is a non trivial factor of n and we are supposed to find the factor of that number. What is a non trivial factor? Any factor other than 1 and the number itself is treated as a non trivial factor of a number.

There are two methods of solving this problem.

  1. Brute force method
  2. Optimization of brute force method applying some concepts like birthday paradox,Floyd's cycle finding algorithm

First let's discuss about the brute force method. In this we will test the integers upto n(given number) until a divisor is found.
Approach: We simply take a number(say n) as input and start iterating a variable i from 1 to n(number). If a given number is even we can say that 2 will be its factor but if the number is odd we have to start iterating and for each iteration check whether it is a factor of n or not. Let us see its implementation.

C++ implementation:

#include <bits/stdc++.h>
using namespace std;

int main (void) {
    int n,i;
    // Read in the number N
    cin>>n;
    if (n%2 == 0) {
       cout<<"2 is factor"<<endl;
    }
    else{
        for (i = 3; i < n; i+= 2){
            if (n % i == 0) {
                cout<<i<<" "<<"is a factor of number"<<" "<<n<<endl;
                break;
            }
        }
    }
    return 0;
}

Output:

1235
5 is a factor of number 1235

Now the above mentioned code is not efficient for large value of n as it is iterating upto n and it's time complexity is O(n) where n is the inputted number.
That's why we have to optimize this approach to obtain the factor in less time.
To achieve this we will be using concepts of birthday paradox and Floyd's cycle finding algorithm to improve this method.

Before moving to second method first we will see some concepts that should be used in this algorithm. They include:

  1. Greatest common divisor(GCD): It is the greatest number which divides two given numbers successfully.
  2. Birthday Paradox: The chance of two persons having birthday on a same date is high even when there are small number of persons.
  3. Two numbers are said to be congruent modulo n (a=b modulo n) if they produce same remainder when divided by given number or when their absolute difference is an integer multiple of given number.
  4. Floyd's cycle-finding algorithm: If two persons start at same place and move in a cycle in such a way that speed of first person is twice the speed of second person then they must meet at any place in the cycle.
    Let us see how to approach this problem, what steps should be needed , what are the problems in doing certain things and all related stuff in the below mentioned paragraph.

We are given n = a * b and we randomly picked a number to check for the factor between 1 and n. The chances of getting either a or b are quite small. So we have to repeat the algorithm so many times to get us to even odds. We will use some another thinking. Instead of picking just one number, we can pick k numbers for some k. Let these numbers be x1,x2,x3..xk We can now ask if |xi-xj| where i is not equal to j divides n. The difference between the selecting one number in a range while selecting some numbers from that range is same as the difference between picking one person and asking if their birthday falls on any specific date (say 10th January) or picking k people and asking if any two among k share a birthday. We can already see that for k roughly around sqrt{n}, we get even odds. So, we can improve the chances. Therefore, for a 10 digit number, instead of selecting 10^10 numbers, we can pick k = 10^5 random numbers. This concept is based on "Birthday Paradox".
But unfortunately, this does not help us because with k = 10^5 people, we still do k^2 = 10^10 pairwise comparisons and divisions. So now picking numbers and finding whether xi-xj divides n or not we can check whether gcd(|xi-xj|,n)>1. Here gcd is "Greatest common divisor" and |xi-xj| means the absolute difference between two numbers. In other words, we can check if |xi-xj| and N have a non-trivial factor in common. This at once increases the number of chances for successes. If we check how many numbers divide n, we have just 2 (a and b). Now if we check how many numbers have GCD(x,n) > 1, we have quite a few numbers. So the approach is simple we just have to do these operations:

  1. Pick k numbers x1,x2,...,xk uniformly at random between 2 and n-1.
  2. Check if GCD(|xi-xj|,n) > 1. If yes, then GCD(|xi-xj|,n) is a factor of n (either a or b).

But there is again a problem arises, we need to pick a number k that is in the order of N^(1/4) and do pairwise comparisons. This is huge to store in the memory. If n=10^32, we are storing 10^8 numbers in memory, which is very large.

Now for implementing Pollard's rho method we have to optimize it in such a way that we have only two numbers in memory. Therefore, Pollard's rho algorithm works like this. We would like to generate k numbers x1,x2,...xk and check pairwise but we are unable to fulfill that. The next important thing is to generate random numbers one by one and check two consecutive numbers. We use a function f that will generate pseudo random numbers(a sequence of random numbers). There will be one function with pseudo random property is f(x)= (x^2 + a) mod n, where we form 'a' using random number generator. Start with random x and a. Take y equal to x and f(x) = x^2 + a. We will update x to f(x) (modulo n) and update y to f(f(y)) (modulo n) as these represent first person and second person speed as this represent Floyd's cycle finding algorithm (where to find the cycle two persons starting at same point runs with first person twice as compared to the second person and when they meet we can say that first person have completed one cycle). Calculate GCD of |x-y| and n. If GCD is neither unity nor number n itself we can say that GCD is our final answer otherwise we take another set of values for x,y, and c and repeat this procedure once again till we do not get the divisor.
Let us understand with the given example:

Example:
Suppose we have f(x)=x^2+a and n=91
Start with random set of values as x1=y1=2 and a=1 so f(x)=x^2+1

x2=f(x1)=(2^2+1)%91=5
y2=(f(f(y1))=f(5)=(5^2+1)%91=26
and so on...

Now we will find |x-y|
|x2-y2|=21
and so on...

Now calculate gcd(|x-y|,n)
gcd(|x2-y2-y|,91)=gcd(21,91)=7 (non trivial factor)
as we can see that we have got a non trivial solution so we can say that 7 is a factor of 91.

We can also make it more readable by this way:

x(i+1)=f(xi)      y{x(i+1)=f(f(xi))}   |x-y|     gcd(|x-y|,n)
5                 26                    21           7

Now after having so much of reading let's move straight towards the coding part where we apply what we have discussed above and try to implement that in our code.

C++ implementation:

#include<bits/stdc++.h>
using namespace std;
typedef long long int ll;

//function to find the power of two numbers
ll power(ll b, int e, ll m)
{
	//initialize answer
	ll answer = 1;

	while (e > 0)
	{
		//if n is odd, multiply b with answer
		if (e % 2 == 1)
			answer = (answer * b) % m;

		e = e/2;
		b = (b * b) % m;
	}
	return answer;
}

//method to return prime divisor for n
ll pollRh(ll n)
{
	//initialize random seed
	srand (time(NULL));

	//no prime divisor for 1
	if (n==1) return n;

	//even number means one of the divisors is 2
	if (n % 2 == 0) return 2;

	//we will pick from the range [2, N)
	ll x = (rand()%(n-2))+2;
	ll y = x;

	//the constant in f(x).
	//Algorithm can be re-run with a different a
	//if it throws failure for a composite.
	ll a = (rand()%(n-1))+1;

	ll d = 1;

	//until the prime factor isn't obtained.
	//If n is prime, return n
	while (d==1)
	{
		//second person move x(i+1)=fx(i))
		x = (power(x, 2, n) + a + n)%n;

		//first person move y(i+1) = f(f(y(i)))
		y = (power(y, 2, n) + a + n)%n;
		y = (power(y, 2, n) + a + n)%n;

		//check gcd of |x-y| and n
		d = __gcd(abs(x-y), n);

		// retry if the algorithm fails to find prime factor with chosen x and a
		if (d==n) return pollRh(n);
	}

	return d;
}


int main(void)
{
	ll n = 91;
	cout<<"One of the divisor is"<<" "<<pollRh(n);
	return 0;
}

Output:

One of the divisor is 7

Regarding time complexity of this method, it is not sure till now that exactly what is the time complexity but approximately it can be said that it take O(n^(1/4)) iterations, where n is the given number.

So now i want to conclude this topic at OpenGenus with the hope that you all will definitely try this problem on your own atleast once. Happy coding

Thank you.