GELU vs ReLU

In this article, we explore the differences between the GELU (Gaussian Error Linear Unit) and ReLU (Rectified Linear Unit) activation functions.

The following table summarizes the differences between GELU and ReLU, both of which are popular activation functions:

| Point | GELU | ReLU |
|---|---|---|
| Full form | Gaussian Error Linear Unit | Rectified Linear Unit |
| Shape | Non-convex and non-monotonic; non-linear, with curvature at every point | Convex and monotonic; linear on the positive range, with no curvature |
| Non-zero output | Entire positive range, plus small negative outputs for negative inputs | Positive inputs only; every negative input maps to 0 |
| Calculation | Exact form x·Φ(x) uses the Gaussian CDF (erf); a tanh or sigmoid approximation is often used for performance | Computed exactly as max(0, x) |
| Abrupt change | None; the function is smooth | Abrupt change (kink) at 0, the border between the positive and negative ranges |
| Typical use | NLP models such as BERT | CNN models such as ResNet-50 |
| Known issues | Mitigates the dying ReLU problem; costlier to compute than ReLU | Dying ReLU problem |
| Year | 2016 | 2010 |
| Key paper by | Dan Hendrycks and Kevin Gimpel (UC Berkeley and Toyota Technological Institute at Chicago) | Vinod Nair and Geoffrey Hinton (University of Toronto) |
| Accuracy | Reported to outperform ReLU, by nearly 2% in median in some comparisons | Slightly lower than GELU in those comparisons, but comparable in some cases such as CNN models |
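Several rows of the table (smoothness, exact versus approximate calculation, and the dying ReLU problem) can be checked numerically. The sketch below uses plain Python and only the standard `math` module. It implements ReLU, the exact GELU x·Φ(x) written with `math.erf`, and the widely used tanh approximation 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))), along with the derivative of each function. The function names are our own. Treating ReLU's derivative at exactly 0 as 0 is a convention we chose here, not something the article specifies.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x). Exact and cheap to compute."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU. The value at x == 0 is set to 0 by convention (assumption)."""
    return 1.0 if x > 0 else 0.0


def gelu_exact(x: float) -> float:
    """GELU: x * Phi(x), where Phi is the standard normal CDF expressed via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh-based approximation of GELU, used to speed up computation."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def gelu_grad(x: float) -> float:
    """Derivative of exact GELU: Phi(x) + x * phi(x), with phi the normal PDF."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


if __name__ == "__main__":
    # Compare both functions and their gradients at a few sample points.
    for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
        print(
            f"x={x:+.1f}  relu={relu(x):.4f}  relu'={relu_grad(x):.0f}  "
            f"gelu={gelu_exact(x):+.4f}  gelu_tanh={gelu_tanh(x):+.4f}  "
            f"gelu'={gelu_grad(x):+.4f}"
        )
    # Largest gap between exact GELU and the tanh approximation on [-5, 5].
    grid = [i / 100 for i in range(-500, 501)]
    max_err = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)
    print(f"max |exact - tanh approx| on [-5, 5]: {max_err:.2e}")
```

For a negative input such as x = -1, `relu_grad` returns 0, while `gelu_grad` returns a small non-zero value (about -0.083). That non-zero value is why GELU mitigates the dying ReLU problem. GELU's gradient still shrinks toward 0 for large negative inputs, so it reduces the effect rather than eliminating it.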

With this article at OpenGenus, you should have a clear idea of how the two important activation functions GELU and ReLU differ.

Jonathan Buss

Associate Professor at University of Waterloo | BSc in Computing from California Institute of Technology, PhD from Massachusetts Institute of Technology (MIT)

Improved & Reviewed by:

OpenGenus Foundation