GELU vs ReLU


In this article, we explore the differences between the GELU (Gaussian Error Linear Unit) and ReLU (Rectified Linear Unit) activation functions.
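For reference, the standard definitions of the two functions are sketched below. Here Φ denotes the cumulative distribution function of the standard normal distribution, and the last line is a commonly used tanh approximation of GELU rather than an exact identity.

```latex
\begin{align*}
\mathrm{ReLU}(x) &= \max(0, x) \\
\mathrm{GELU}(x) &= x\,\Phi(x) = \frac{x}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right] \\
\mathrm{GELU}(x) &\approx \frac{x}{2}\left[1 + \tanh\!\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right)\right]
\end{align*}
```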

The following table summarizes the differences between GELU and ReLU, both of which are popular activation functions:


Point	GELU	ReLU
Full Form	Gaussian Error Linear Unit	Rectified Linear Unit
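Note	Non-convex and non-monotonic; smooth and curved, with no exactly linear segment	Convex and monotonic; linear (no curvature) in the positive range
Non-linearity	Curved on both sides of 0; approaches x for large positive inputs and 0 for large negative inputs	Piecewise linear: x for positive inputs and 0 for negative inputs; the only non-linearity is the kink at 0
Calculation	Defined as x·Φ(x), where Φ is the standard normal CDF; often computed with a tanh or sigmoid approximation for performance	Computed exactly as max(0, x)
Abrupt change	No abrupt change; the function is smooth everywhere	Abrupt change in slope at 0, the border between the positive and negative ranges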
Use-case	NLP models such as BERT	CNN models like ResNet50
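Problem?	Mitigates the dying ReLU problem, because the gradient is non-zero for negative inputs; more expensive to compute than ReLU	Dying ReLU problem: a unit that only receives negative inputs outputs 0, gets zero gradient, and stops learning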
Developed in	2016	2010
Developed by	Dan Hendrycks and Kevin Gimpel from UC Berkeley and Toyota Technological Institute at Chicago	Vinod Nair and Geoffrey Hinton from University of Toronto
Accuracy	Reported to be better than ReLU by nearly 2% in median; the gain varies by model and task	Slightly lower than GELU overall, but equivalent in some cases, such as CNN models
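
Several rows of the table (calculation, abrupt change, monotonicity, and the dying ReLU problem) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `relu`, `gelu_exact`, `gelu_tanh`, and `slope` are illustrative choices and do not come from any particular framework.

```python
import math


def relu(x: float) -> float:
    """ReLU: computed exactly as max(0, x)."""
    return max(0.0, x)


def gelu_exact(x: float) -> float:
    """GELU(x) = x * Phi(x), with Phi the standard normal CDF (via erf)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Commonly used tanh approximation of GELU."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def slope(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Compare values and slopes on both sides of 0.
    # x = 0 is skipped because ReLU has a kink there, so a
    # central difference would only average the two one-sided slopes.
    for x in (-3.0, -1.0, -0.5, 0.5, 1.0, 3.0):
        print(
            f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu_exact(x):.4f}  "
            f"gelu_tanh={gelu_tanh(x):.4f}  "
            f"relu'={slope(relu, x):.4f}  gelu'={slope(gelu_exact, x):.4f}"
        )
```

For negative inputs the estimated ReLU slope is exactly 0, which is the source of the dying ReLU problem, while the GELU slope stays non-zero. The GELU values also dip below 0 and then rise back toward 0 as x becomes more negative, which is why the function is non-monotonic.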

With this article at OpenGenus, you should now have a clear picture of the two important activation functions, GELU and ReLU.
