This article discusses a special kind of layer, the dropout layer in TensorFlow (tf.nn.dropout), which is used in deep neural networks to prevent or reduce overfitting. Dropout is a regularization technique, sometimes described as "stochastic regularization".
Table of contents:
- The Problem of Overfitting
- Dropout operation
- Dropout Process
- Dropout in TensorFlow (tf.nn.dropout)
- Applications of Dropout
The Problem of Overfitting:
Let us understand what overfitting is before diving into the implementation of the dropout layer. Overfitting and underfitting are two of the most common problems faced when training a model, and the bias-variance trade-off has to be taken into consideration to strike a good balance between them.
Overfitting occurs when the model fits the training data too closely. It happens mostly because of noise (unessential information) in the training data, and when the model is trained for too long. The model picks up the noise and thus performs well on the training data but fails to perform well on the test set or unseen data. Such models have low bias and high variance. Complex models, such as deep decision trees, are especially prone to this problem.
As mentioned previously, dropout is a special layer that is introduced to perform a specific operation rather than to learn. Such special layers do not contain any neurons of their own.
Concept of Dropout:
A network overfits when specific combinations of weights learn to reproduce the noisy patterns in the training data. Since we cannot know in advance which weights are responsible, the main idea behind dropout is to break up these combinations by dropping neurons at random.
This is done by dropping out a fraction of the input neurons to a particular layer at every training step, for example 50% or 33%; the fraction is a hyperparameter chosen for that layer. In this way we train a different thinned network at each training step, and it is very unlikely that exactly the same neurons would be dropped at any two consecutive steps. What we finally get is approximately an average over all the different combinations of neuron connections that were sampled at each training step.
Dropping out these neurons makes the network learn from the general or broad patterns found in the data, because no neuron can rely on any particular other neuron being present. Thus, we're making the nodes more independent.
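The random dropping described above can be illustrated with a few lines of NumPy. This is only a minimal sketch: the layer size of 8 units, the 50% rate, and the names `masks`, `keep_fraction` and `identical_consecutive` are assumptions chosen for illustration, not part of TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, rate = 8, 0.5  # hypothetical layer size and drop fraction

# Draw one fresh mask per training step: 1 = neuron kept, 0 = neuron dropped.
masks = np.array([(rng.random(n_units) >= rate).astype(int) for _ in range(1000)])

# The first few steps each train a different "thinned" network.
for step, mask in enumerate(masks[:3]):
    print(f"step {step}: {mask}")

# On average about (1 - rate) of the neurons survive a step.
keep_fraction = masks.mean()

# Fraction of consecutive steps that dropped exactly the same neurons.
identical_consecutive = np.mean(np.all(masks[1:] == masks[:-1], axis=1))
print(keep_fraction, identical_consecutive)
```

With 8 units there are already 256 possible masks, which is why two consecutive steps rarely share the same thinned network.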
It is observed that using dropout in the last few layers (the fully connected layers) helps in improving the error rate, and it is common practice to use it there. It can be applied to any of the hidden layers as well as the input layer, but not to the output layer.
A second thing to note is that dropout is used only during the training phase and not during evaluation or testing; at that point we treat the model as a normal neural network without dropout.
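The training-only behaviour can be sketched with a small wrapper around tf.nn.dropout. The helper `maybe_dropout` and its `training` flag are hypothetical names introduced here for illustration; tf.nn.dropout itself always applies dropout when called, so the caller has to decide when to use it.

```python
import tensorflow as tf

def maybe_dropout(x, rate, training):
    """Hypothetical helper: apply dropout in training mode, pass x through otherwise."""
    if training:
        return tf.nn.dropout(x, rate=rate)
    return x

x = tf.ones([2, 5])
train_out = maybe_dropout(x, rate=0.5, training=True)   # random zeros, survivors rescaled
eval_out = maybe_dropout(x, rate=0.5, training=False)   # identical to x

print(train_out.numpy())
print(eval_out.numpy())
```

The Keras layer tf.keras.layers.Dropout makes the same switch through the `training` argument of its call, which is usually more convenient than handling the flag by hand.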
To implement this neuron deactivation:
- A dropout mask (zeroes and ones) is generated during forward propagation. This mask is used only during training.
- The mask is applied to the output of the preceding layer, i.e. to the inputs of the next layer.
- The masked output is multiplied by the weights, and a bias is added.
- Finally, the result is passed on to an activation function.
All these weights are shared across all the different thinned networks. During backpropagation, only the weights of the thinned network, i.e. those connected to neurons that were active in the forward pass, are updated. The output obtained after applying the mask in the forward pass is stored as a cache and reused during backpropagation.
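The forward steps and the cached backward pass described above can be sketched in NumPy for a single dense layer. This is a simplified illustration under stated assumptions: a ReLU activation, no rescaling of the kept values, and the hypothetical function names `dense_dropout_forward` and `dense_dropout_backward`; it is not how TensorFlow implements the operation internally.

```python
import numpy as np

def dense_dropout_forward(a_prev, W, b, rate, rng):
    # Step 1: generate a binary mask (1 = keep, 0 = drop).
    mask = (rng.random(a_prev.shape) >= rate).astype(a_prev.dtype)
    # Step 2: apply the mask to the output of the preceding layer.
    a_masked = a_prev * mask
    # Step 3: multiply by the weights and add the bias.
    z = a_masked @ W + b
    # Step 4: pass the result through an activation function (ReLU assumed).
    out = np.maximum(z, 0.0)
    # Keep what the backward pass needs.
    cache = (a_masked, mask, W, z)
    return out, cache

def dense_dropout_backward(d_out, cache):
    a_masked, mask, W, z = cache
    dz = d_out * (z > 0)            # gradient through ReLU
    dW = a_masked.T @ dz            # rows for dropped inputs are zero
    db = dz.sum(axis=0)
    da_prev = (dz @ W.T) * mask     # no gradient flows into dropped neurons
    return da_prev, dW, db

rng = np.random.default_rng(0)
a_prev = rng.normal(size=(1, 6))    # one example, six input neurons
W = rng.normal(size=(6, 3))
b = np.zeros(3)

out, cache = dense_dropout_forward(a_prev, W, b, rate=0.5, rng=rng)
da_prev, dW, db = dense_dropout_backward(np.ones_like(out), cache)
print(cache[1])   # the mask used in this step
print(dW)         # weight gradients; dropped inputs contribute nothing
```

Because the masked activations are cached, the weights attached to dropped neurons receive a zero gradient for this step, while the same weight matrix W is reused by every thinned network.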
Dropout in TensorFlow (tf.nn.dropout):
Coming to the implementation, the original formulation of dropout scales the values down at test time.
Suppose the values are
[1,2,3,4,5,6,7,8] and the dropout hyperparameter p, here the probability of keeping a neuron, is set to 0.75, i.e. about a quarter of the neurons are dropped at random. For one particular random mask we might have
[1,0,3,0,0,6,7,0] (this draw happened to keep only four of the eight values), and during the test we'd be multiplying with p, i.e.
0.75*[1,2,3,4,5,6,7,8].
But TensorFlow follows a slightly different process: instead of downscaling at test time, it upscales (rescales) the kept elements during training by multiplying with the inverse of p, i.e. 1/0.75, and sets the rest of the elements to zero, which would give us
[1.33,0,4,0,0,8,9.33,0] but at test time we would do
1/1*[1,2,3,4,5,6,7,8], so we're treating it as a normal neural network.
We're upscaling to preserve the total sum in expectation, i.e.
[1,2,3,4,5,6,7,8] sums to 36, while [1.33,0,4,0,0,8,9.33,0] sums to about 22.67. This particular draw falls short of 36 because it kept only half of the values; averaged over many random masks drawn with p = 0.75, the sum comes out at 36.
The idea is to keep the expected output approximately unchanged between training and testing, regardless of which of the two scaling techniques we use.
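The rescaling arithmetic can be checked numerically. The following NumPy sketch assumes that 0.75 is the probability of keeping a value (which is what the 1/0.75 scaling implies) and uses illustrative names such as `fixed_mask` and `mean_sum`; the number of trials is arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
keep_prob = 0.75  # assumed: probability of keeping each value

# The single mask used in the example above (four of eight values kept).
fixed_mask = np.array([1, 0, 1, 0, 0, 1, 1, 0])
scaled = x * fixed_mask / keep_prob
print(scaled, scaled.sum())   # one draw; its sum need not equal x.sum()

# Averaging over many random masks recovers the original sum in expectation.
rng = np.random.default_rng(0)
masks = rng.random((20000, x.size)) < keep_prob
mean_sum = (x * masks / keep_prob).sum(axis=1).mean()
print(x.sum(), mean_sum)
```

A single draw can land well below or above the original sum; it is only the average over many training steps that matches it, which is why no extra scaling is needed at test time.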
Writing the code in Python:
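We use the dropout() function from the tf.nn module to apply dropout in our TensorFlow model.
It accepts the following arguments:
tf.nn.dropout( x, rate, noise_shape=None, seed=None, name=None )
x is the input tensor. Each element of x is set to 0 with probability rate; the remaining elements are scaled up by 1 / (1 - rate).
rate is the fraction of elements to drop. Older TensorFlow 1.x releases took keep_prob, the fraction to keep, instead; the two are related by rate = 1 - keep_prob.
noise_shape optionally sets the shape of the random keep/drop mask, seed makes the random draw reproducible, and name gives the operation a name.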
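    import numpy as np
    import tensorflow as tf

    tf.random.set_seed(0)  # global seed, for reproducibility

    x = tf.ones([2, 5])
    # rate = 0.8 drops about 80% of the elements;
    # the survivors are scaled by 1 / (1 - 0.8) = 5
    tf.nn.dropout(x, rate=0.8, seed=1).numpy()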
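This would give us an output such as the following (the positions of the zeros depend on the seeds and the TensorFlow version):
    array([[0., 0., 0., 5., 5.],
           [0., 5., 0., 5., 0.]], dtype=float32)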
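The noise_shape argument from the signature can be illustrated in the same way. In this sketch, which assumes a 2 x 5 input and a rate of 0.5 purely for illustration, noise_shape=[2, 1] asks for one keep/drop decision per row, which is then broadcast across the columns.

```python
import tensorflow as tf

x = tf.ones([2, 5])

# One random decision per row: each row is either dropped entirely
# or kept entirely and scaled by 1 / (1 - 0.5) = 2.
row_dropped = tf.nn.dropout(x, rate=0.5, noise_shape=[2, 1])
print(row_dropped.numpy())
```

Sharing the mask along an axis like this is useful when neighbouring values are strongly correlated and should be dropped together rather than independently.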
Applications of Dropout:
In the 2014 paper “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, dropout was applied to a wide range of computer vision, speech recognition, and text classification tasks, and it was found to consistently improve performance on each problem.
With this article at OpenGenus, you must have a complete idea of the dropout operation in TensorFlow.