Convolution is one of the most important operations in Machine Learning models, and it often accounts for more than 70% of the computational time. The input data has specific dimensions, and we can use these values, together with the convolution parameters, to calculate the size of the output.
In short, the answer is as follows:
- Output height = (Input height + padding height top + padding height bottom - kernel height) / (stride height) + 1
- Output width = (Input width + padding width right + padding width left - kernel width) / (stride width) + 1
The relevant parameters of a convolution are:
- Input dimensions: height, width, batch size and number of channels
- Kernel height and Kernel width
- Stride height and stride width
- Padding height top, Padding height bottom
- Padding width right, Padding width left
We will go through each of the parameters and then see how they are used to calculate the size of the output.
Input dimension
Typically, the input to a convolution in Machine Learning models has 4 dimensions (4D):
- Height (H)
- Width (W)
- Number of batches (N)
- Number of channels (C)
The height and width of the input are the height and width of the input images.
Number of batches (N), or the batch size, is the number of images processed together when passed into the Machine Learning model. Inference with a batch size of 1 is typically used to measure latency, and inference with a batch size greater than 1 is used to measure throughput.
Common batch sizes (N) are 1 and 256.
Number of channels (C) is the number of channels in the input image. Images are defined by 3 dimensions: height, width and channels. Channels hold the features of the image. In colored images, C is usually 3, as in RGB or HSV. In general, this parameter can take any value.
Note that the number of batches (N) is the same for input and output, as the convolution parameters act on the height and width of each image and leave the batch dimension unchanged. The number of channels is preserved only when the number of kernel channels (KC) equals C; in general, the output has KC channels.
Kernel
The kernel is ideally of 2 dimensions: height (KH) and width (KW). In most Machine Learning models, it also has a channel dimension (KC), which is the number of 2D kernels (filters) applied to the input.
Each of the KC kernels slides over the 2D image in every one of the C input channels. For each kernel, the results from all C input channels are added together into a single 2D output, so the final output has KC channels.
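The channel merging described above can be sketched in plain Python for a single output position. This is a minimal illustration, not how any library implements it: the helper name `apply_filters` and the nested-list layout (a patch of shape KH x KW x C, and KC filters each of shape KH x KW x C) are assumptions made for this example.

```python
# Hypothetical helper: computes one output pixel for every filter.
# patch[i][j][c]      : input values under the kernel, shape KH x KW x C
# weights[k][i][j][c] : KC filters, each of shape KH x KW x C
def apply_filters(patch, weights):
    outputs = []
    for filt in weights:  # one pass per filter gives one output channel
        total = 0
        for i, row in enumerate(patch):
            for j, pixel in enumerate(row):
                for c, value in enumerate(pixel):
                    # contributions of all C input channels are summed
                    total += value * filt[i][j][c]
        outputs.append(total)
    return outputs

# 2 x 2 patch with C = 3 channels, every value 1
patch = [[[1, 1, 1], [1, 1, 1]],
         [[1, 1, 1], [1, 1, 1]]]
# KC = 2 filters: one filled with 1s, one filled with 2s
weights = [[[[w] * 3 for _ in range(2)] for _ in range(2)] for w in (1, 2)]

print(apply_filters(patch, weights))  # [12, 24]: one value per filter, so KC = 2
```

The input has 3 channels, but the result has only 2 values per position, one for each filter.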
Stride
Stride is the number of positions by which the kernel moves at each step as it slides over the input data. By default, it is 1. Strides are of two types:
- Stride along width (SW)
- Stride along height (SH)
Besides the default of 1, common stride values are 2 and 3.
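To see what a stride does, here is a small sketch that lists the offsets at which the kernel is placed along one axis. The function name `kernel_positions` is made up for this illustration, and no padding is assumed.

```python
def kernel_positions(size, kernel, stride):
    """Start offsets of the kernel along one axis (no padding)."""
    # The last valid start is size - kernel, so the kernel never leaves the input.
    return list(range(0, size - kernel + 1, stride))

print(kernel_positions(7, 3, 1))  # [0, 1, 2, 3, 4]
print(kernel_positions(7, 3, 2))  # [0, 2, 4]
print(kernel_positions(8, 3, 2))  # [0, 2, 4]
```

With a stride of 2, the number of positions roughly halves. In the last call, the input is one element longer but the kernel cannot take another full step, so the count stays the same.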
Padding
Padding is the extra data added to the sides of the input, usually so that the output does not shrink. The values in the padding area are usually 0. There are four types of padding:
- Padding along height on top (P_H1)
- Padding along height on bottom (P_H2)
- Padding along width on left (P_W1)
- Padding along width on right (P_W2)
This modifies the dimensions of the input data as follows:
- Height: Height + Padding along height on top + Padding along height on bottom = H + P_H1 + P_H2
- Width: Width + Padding along width on left + Padding along width on right = W + P_W1 + P_W2
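The effect of padding on the input size can be checked with a short sketch. The helper `zero_pad` is a hypothetical function written for this example; it works on a single 2D image stored as a list of rows and fills the padding area with 0.

```python
def zero_pad(image, top, bottom, left, right):
    """Return a copy of a 2D list with zeros added on each side."""
    width = len(image[0]) + left + right
    padded = [[0] * width for _ in range(top)]           # rows added on top
    for row in image:
        padded.append([0] * left + list(row) + [0] * right)
    padded.extend([0] * width for _ in range(bottom))    # rows added at the bottom
    return padded

image = [[1, 2, 3],
         [4, 5, 6]]  # H = 2, W = 3
padded = zero_pad(image, top=1, bottom=2, left=1, right=1)
print(len(padded), len(padded[0]))  # 5 5
```

The padded height is 2 + 1 + 2 = 5 and the padded width is 3 + 1 + 1 = 5, matching H + P_H1 + P_H2 and W + P_W1 + P_W2.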
Calculation of output size
Input parameters:
- Height (H)
- Width (W)
- Number of batches (N)
- Number of channels (C)
- Kernel height (KH)
- Kernel width (KW)
- Kernel channels (KC)
- Stride height (SH)
- Stride width (SW)
- Padding: P_H1, P_H2, P_W1, P_W2
The input image dimension: N x H x W x C
The modified height and width of input due to padding is as follows:
- Height modified H1 = H + P_H1 + P_H2
- Width modified W1 = W + P_W1 + P_W2
On applying the kernel (KH x KW) with a stride of 1, the height and width become:
- H2 = H1 - KH + 1 = H + P_H1 + P_H2 - KH + 1
- W2 = W1 - KW + 1 = W + P_W1 + P_W2 - KW + 1
The kernel channels (KC) set the number of output channels to KC: for each kernel channel, the contributions of all input channels are added together.
On considering strides, we get the following height and width (when the division is not exact, the result is rounded down):
- H3 = (H1 - KH) / SH + 1
- W3 = (W1 - KW) / SW + 1
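The formula for a single dimension can be written as a small function. This is a sketch under one assumption: the division is an integer division that rounds down, which is how common frameworks treat the case where the stride does not divide evenly. The name `conv_out_size` is hypothetical.

```python
def conv_out_size(size, pad_before, pad_after, kernel, stride=1):
    """Output length along one axis: (size + padding - kernel) // stride + 1."""
    padded = size + pad_before + pad_after
    if kernel > padded:
        raise ValueError("kernel is larger than the padded input")
    return (padded - kernel) // stride + 1  # // rounds down (assumption)

print(conv_out_size(5, 1, 1, 3))       # 5
print(conv_out_size(224, 3, 3, 7, 2))  # 112
```

In the first call, a padding of 1 on each side keeps the output the same size as the input for a kernel of 3 and a stride of 1. In the second call, (224 + 6 - 7) / 2 is 111.5, which is rounded down to 111 before adding 1.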
This makes the dimensions of the output as follows:
- Number of batches = N
- Number of channels = KC
- Height = (Input Height + Padding height (top and bottom) - Kernel Height) / Stride Height + 1
- Width = (Input Width + Padding width (left and right) - Kernel Width) / Stride Width + 1
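Putting the four output dimensions together, the whole calculation can be sketched as one function. The function name `conv2d_output_shape` and the argument layout are assumptions chosen to mirror the notation of this article ([N H W C] for the input, [KH KW KC] for the kernel); integer division that rounds down is also assumed.

```python
def conv2d_output_shape(input_shape, kernel_shape, strides, padding):
    """Return the output shape (N, H, W, KC) of a 2D convolution."""
    n, h, w, _c = input_shape          # [N H W C]; C does not affect the output shape
    kh, kw, kc = kernel_shape          # [KH KW KC]
    sh, sw = strides                   # [SH SW]
    p_h1, p_h2, p_w1, p_w2 = padding   # top, bottom, left, right
    out_h = (h + p_h1 + p_h2 - kh) // sh + 1
    out_w = (w + p_w1 + p_w2 - kw) // sw + 1
    return (n, out_h, out_w, kc)

# A 32 x 32 RGB image, 16 kernels of size 5 x 5, stride 1, no padding
print(conv2d_output_shape((1, 32, 32, 3), (5, 5, 16), (1, 1), (0, 0, 0, 0)))
# (1, 28, 28, 16)
```

The batch size passes through unchanged, the channel count becomes KC, and only the height and width depend on the kernel size, stride and padding.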
Example
Let us take some example values of input data and calculate the dimension of the output accordingly.
Input: [N H W C] = [100 255 255 64]
Kernel: [H W C] = [7 7 128]
Stride: [Height Width] = [3 4]
Padding: [P_H1, P_H2, P_W1, P_W2] = [2 2 2 2]
Output Height = (255 + 2 + 2 - 7) / 3 + 1 = 85
Output Width = (255 + 2 + 2 - 7) / 4 + 1 = 64
Output channels = 128
Output batch size = 100
Hence, the output size is: [N H W C] = 100 x 85 x 64 x 128
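As a sanity check that does not rely on the formula, the same numbers can be obtained by counting how many times the kernel fits. The sketch below uses the kernel size of 7 from the height and width calculations above; the helper `count_positions` is a hypothetical name.

```python
def count_positions(size, pad_before, pad_after, kernel, stride):
    """Count kernel placements along one axis by stepping through them."""
    padded = size + pad_before + pad_after
    count = 0
    start = 0
    while start + kernel <= padded:  # the kernel must fit inside the padded input
        count += 1
        start += stride
    return count

out_h = count_positions(255, 2, 2, 7, 3)
out_w = count_positions(255, 2, 2, 7, 4)
print([100, out_h, out_w, 128])  # [100, 85, 64, 128]
```

Counting the placements one by one gives the same height and width as the formula.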
With this article at OpenGenus, you should now have a complete idea of how to compute the output size of a convolution. Enjoy.