
We shall be discussing Gradient Ascent.
Table of Contents:
- Gradient Ascent (Concept Explanation)
- Application of Gradient Ascent
- Differences with Gradient Descent
Gradient Ascent (Concept Explanation)
Gradient Ascent as a concept transcends machine learning. It is the reverse of Gradient Descent, another common concept used in machine learning. Gradient Ascent (resp. Descent) is an iterative optimization algorithm used for finding a local maximum (resp. minimum) of a function. Taking repeated steps in the direction of the gradient of the function at the current point, which is the direction of steepest ascent, invariably leads to a local maximum of the function.
A common analogy that captures the intuition behind the gradient ascent algorithm is that of a hiker trying to get to the top of a mountain on a foggy day, when visibility is very low. The hiker can only make his way to the mountain top by 'feeling' his path forward. A logical process the hiker can follow is to use his feet to probe the ground in all directions from his current spot, determine which direction has the steepest slope leading upwards, and then take a small step in that direction. The hiker then stops and repeats the same process. Done after every step, this is guaranteed to lead the hiker to at least a local peak of the mountain. In the case of concave functions, it is guaranteed to lead to the global maximum.
In mathematical terms, the direction of greatest increase of a function $f$ is given by the gradient of that function, which is represented as $\nabla f$, where

$$\nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)$$

When the direction of steepest ascent is known, the next thing is to take a step in that direction, which mathematically translates to:

$$\mathbf{x}_{t+1} = \mathbf{x}_t + \eta \, \nabla f(\mathbf{x}_t)$$

where $\eta > 0$ is the step size (the learning rate).
This process is repeated iteratively.
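The loop above can be written in a few lines of code. Below is a minimal Python sketch; the function name `gradient_ascent`, its parameters, and the toy objective $f(x) = -(x - 3)^2$ are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def gradient_ascent(grad, x0, learning_rate=0.1, n_iters=100, tol=1e-6):
    """Repeatedly step in the direction of the gradient (steepest ascent)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        step = learning_rate * grad(x)
        x = x + step
        if np.linalg.norm(step) < tol:  # stop once the steps become negligible
            break
    return x

# Example: maximize f(x) = -(x - 3)^2, whose gradient is -2 * (x - 3).
# The unique maximum is at x = 3, and the iterates climb towards it.
x_max = gradient_ascent(lambda x: -2 * (x - 3), x0=[0.0])
print(x_max)  # ~ [3.0]
```

The learning rate controls the trade-off between speed and stability: too large and the iterates overshoot the peak, too small and convergence is slow.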
Application of Gradient Ascent
Gradient ascent finds use in Logistic Regression. Logistic regression is a supervised machine learning algorithm that is commonly used for prediction and classification problems. It estimates the probability of an event occurring, based on a given dataset of independent variables, by modelling the log-odds of the event as a linear combination of one or more independent variables. The standard logistic function is

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$

where $z$ is the log-odds, the $\beta_i$ are the model coefficients, and the $x_i$ are the independent variables.
The aim is to estimate the values of the beta coefficients that maximize the likelihood of the observed data, a procedure known as maximum likelihood estimation (MLE). This is where the gradient ascent method comes into play: it can determine the best-fit coefficient values by iteratively climbing the log-likelihood function.
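Concretely, for a dataset of $N$ labelled examples $(x_i, y_i)$ with $y_i \in \{0, 1\}$, the log-likelihood being climbed is:

$$\ell(\beta) = \sum_{i=1}^{N} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \Big], \qquad \hat{p}_i = \sigma(\beta_0 + \beta_1 x_{i1} + \dots + \beta_n x_{in})$$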
Sometimes, the negative of this function is used as the cost function; in that case, the aim is to minimize this 'loss', and the algorithm to use is gradient descent.
The cost function to be minimized for a binary classification case is given as:

$$J(\beta) = -\frac{1}{N}\,\ell(\beta) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \Big]$$
When the optimal coefficients have been found, the conditional probability for each observation can be calculated to give a predicted probability. For binary classification, a probability below 0.5 predicts class 0, while a probability of 0.5 or above predicts class 1.
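Putting the pieces together, here is a short sketch of logistic regression fitted by gradient ascent on the log-likelihood. The helper names (`sigmoid`, `fit_logistic_gradient_ascent`) and the toy dataset are assumptions made for this example, not part of any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gradient_ascent(X, y, learning_rate=0.1, n_iters=1000):
    """Fit logistic regression by gradient ascent on the log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ beta)                  # predicted probabilities
        gradient = X.T @ (y - p) / len(y)      # gradient of the mean log-likelihood
        beta += learning_rate * gradient       # ascent: add the gradient term
    return beta

# Toy dataset: the label flips from 0 to 1 as the feature passes ~2.
X = np.array([[0.5], [1.0], [1.5], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

beta = fit_logistic_gradient_ascent(X, y)
p = sigmoid(np.column_stack([np.ones(len(X)), X]) @ beta)
print((p >= 0.5).astype(int))  # expected: [0 0 0 1 1 1]
```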
Differences with Gradient Descent
The main difference between gradient ascent and gradient descent is the goal of the optimization. In gradient ascent, the goal is to maximize the function, while in gradient descent, the goal is to minimize the function. As a result, while both of them rely on the same gradient $\nabla f$, they step in opposite directions: gradient ascent adds the gradient term,

$$\mathbf{x}_{t+1} = \mathbf{x}_t + \eta \, \nabla f(\mathbf{x}_t)$$

while gradient descent subtracts it:

$$\mathbf{x}_{t+1} = \mathbf{x}_t - \eta \, \nabla f(\mathbf{x}_t)$$

Maximizing a function $f$ with gradient ascent is therefore equivalent to minimizing $-f$ with gradient descent, so the two algorithms are interchangeable in practice.
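A minimal sketch of this sign flip, using the same toy objective as before (again, the function name is an illustrative assumption):

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_iters=100):
    """Identical loop to gradient ascent, but stepping against the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - learning_rate * grad(x)  # note the minus sign
    return x

# Minimizing g(x) = (x - 3)^2 (gradient: 2 * (x - 3)) by descent reaches
# the same point that maximizing f(x) = -g(x) by ascent does: x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x0=[0.0]))  # ~ [3.0]
```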