In this article, we have explored the differences between GELU (Gaussian Error Linear Unit) and ReLU (Rectified Linear Unit) activation functions in depth.
The following table summarizes the differences between GELU and ReLU, both of which are popular activation functions:
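| Point | GELU | ReLU |
|---|---|---|
| Full form | Gaussian Error Linear Unit | Rectified Linear Unit |
| Shape | Non-convex and non-monotonic; not linear anywhere and curved at every point | Convex and monotonic; piecewise linear (zero for negative inputs, identity for positive inputs) with no curvature |
| Non-zero output | Entire positive range, plus small negative values for negative inputs that approach 0 as the input becomes more negative | Only for positive inputs (zero for all inputs ≤ 0) |
| Calculation | The exact form uses the Gaussian CDF (via erf); tanh or sigmoid approximations are often used for performance | Exact and cheap: max(0, x) |
| Abrupt change | None; the function is smooth everywhere | Abrupt change (a kink) at 0, the border between the negative and positive ranges, where it is not differentiable |
| Use case | NLP models such as BERT | CNN models such as ResNet-50 |
| Known issues | More expensive to compute than ReLU; mitigates the dying ReLU problem because its gradient is non-zero for most negative inputs | Dying ReLU problem: a unit whose inputs stay negative receives zero gradient and stops learning |
| Developed by | Dan Hendrycks and Kevin Gimpel (UC Berkeley and Toyota Technological Institute at Chicago) | Popularized in deep learning by Vinod Nair and Geoffrey Hinton (University of Toronto) |
| Accuracy | Often reported as slightly better than ReLU, by nearly 2% in median in some comparisons; the gain varies by task and model | Slightly lower than GELU in such comparisons, but equivalent in some cases, such as CNN models |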
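For reference, the standard definitions behind the table can be written as follows, where $\Phi$ is the standard normal CDF and $\phi$ its density. The tanh form is the approximation given in the original GELU paper; other approximations (for example, a sigmoid-based one) also exist. The derivatives show why ReLU's gradient is exactly zero for negative inputs while GELU's generally is not.

$$
\mathrm{ReLU}(x) = \max(0, x), \qquad
\mathrm{GELU}(x) = x\,\Phi(x) = \frac{x}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]
$$

$$
\mathrm{GELU}(x) \approx \frac{x}{2}\left[1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right]
$$

$$
\mathrm{ReLU}'(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases}
\quad (\text{undefined at } x = 0), \qquad
\mathrm{GELU}'(x) = \Phi(x) + x\,\phi(x)
$$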
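To make the calculation point in the table concrete, here is a minimal sketch using only Python's standard library. The function names `relu`, `gelu_exact`, and `gelu_tanh` are illustrative, not taken from any framework; libraries such as PyTorch ship their own GELU implementations. The script compares the exact erf-based GELU with the tanh approximation on a grid and prints a few values next to ReLU.

```python
import math


def relu(x: float) -> float:
    """ReLU: exact and cheap, max(0, x)."""
    return max(0.0, x)


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi written via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU (avoids evaluating erf)."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    xs = [i / 100.0 for i in range(-600, 601)]  # grid on [-6, 6]
    max_err = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
    print(f"max |exact - tanh approx| on [-6, 6]: {max_err:.2e}")
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu_exact(x):+.4f}")
```

Note that for negative inputs ReLU prints exactly 0, while GELU prints small negative values, matching the non-zero output row of the table.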
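The abrupt-change and dying-ReLU points in the table are both about gradients. The sketch below (helper names are hypothetical) evaluates the analytic derivatives and the one-sided finite-difference slopes on either side of 0. ReLU's gradient is exactly 0 for every negative input, which is what lets a unit "die". GELU's gradient $\Phi(x) + x\,\phi(x)$ is non-zero for most negative inputs, though it does vanish near $x \approx -0.75$ and decays toward 0 for large negative inputs, so GELU mitigates the problem rather than removing every zero-gradient point.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    return max(0.0, x)


def gelu(x: float) -> float:
    return x * normal_cdf(x)


def relu_grad(x: float) -> float:
    # Derivative of max(0, x); undefined at exactly 0, taken as 0 by convention here.
    return 1.0 if x > 0 else 0.0


def gelu_grad(x: float) -> float:
    # Product rule on x * Phi(x): Phi(x) + x * phi(x)
    return normal_cdf(x) + x * normal_pdf(x)


if __name__ == "__main__":
    print("   x    relu'    gelu'")
    for x in (-3.0, -1.0, -0.5, 0.5, 1.0, 3.0):
        print(f"{x:+.1f}  {relu_grad(x):.4f}  {gelu_grad(x):+.4f}")

    h = 1e-6
    # One-sided slopes: ReLU jumps from 0 to 1 at the origin, GELU is about 0.5 on both sides.
    print("ReLU slopes at 0 (left, right):",
          (relu(0.0) - relu(-h)) / h, (relu(h) - relu(0.0)) / h)
    print("GELU slopes at 0 (left, right):",
          (gelu(0.0) - gelu(-h)) / h, (gelu(h) - gelu(0.0)) / h)
```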
With this article at OpenGenus, you should now have a clear picture of two important activation functions, GELU and ReLU.