
Convolution is one of the most important operations in Machine Learning models, particularly Convolutional Neural Networks, where it can account for more than 70% of the computation time. The input data has specific dimensions, and together with the kernel, stride and padding values they determine the size of the output.

In short, the answer is as follows:

Output height = floor((Input height + padding height top + padding height bottom - kernel height) / (stride height)) + 1
Output width = floor((Input width + padding width left + padding width right - kernel width) / (stride width)) + 1
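The formula can be written as a small helper that handles one spatial axis at a time. This is a minimal sketch: the function name `conv_output_size` and its parameter names are illustrative, and it assumes the division is rounded down when the stride does not divide evenly, which is the usual default in deep learning frameworks.

```python
def conv_output_size(in_size, kernel, stride=1, pad_before=0, pad_after=0):
    """Output length along one spatial axis (height or width).

    Hypothetical helper: the names are illustrative, not from a library.
    """
    padded = in_size + pad_before + pad_after
    if kernel > padded:
        raise ValueError("kernel is larger than the padded input")
    # Floor division: a trailing partial window is dropped (assumption).
    return (padded - kernel) // stride + 1


# Height: 32 rows, 3-row kernel, stride 1, one row of padding on each side.
out_h = conv_output_size(32, kernel=3, stride=1, pad_before=1, pad_after=1)
# Width: 32 columns, 5-column kernel, stride 2, no padding.
out_w = conv_output_size(32, kernel=5, stride=2)
print(out_h, out_w)  # 32 14
```

Calling the helper once for the height and once for the width gives the two spatial dimensions of the output.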

The parameters of Convolution that determine the output size are:

Input dimensions: height, width, batch size and number of channels
Kernel height and Kernel width
Stride height and stride width
Padding height top, Padding height bottom
Padding width right, Padding width left

We will go through each of these parameters and then see how they are used to calculate the size of the output.

Input dimension

Ideally, the input to a Machine Learning model is 4-dimensional (4D), where the dimensions are:

Height (H)
Width (W)
Number of batches (N)
Number of channels (C)

The height and width of the input are the height and width of the input images.

Number of batches (N), or the batch size, is the number of 2D images processed together or passed into the Machine Learning model at once. When the batch size is 1, inference performance is usually measured as latency (time taken per input); when the batch size is greater than 1, it is usually measured as throughput (inputs processed per unit time).

Common batch sizes (N) are 1 and 256.

Number of channels (C) is the number of channels in the input image. Images are defined by 3 dimensions: height, width and channels. Channels hold the features of the image. In colored images, it is usually 3, for example for RGB or HSV. In general, this parameter can be any positive integer.

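Note that the batch size (N) is the same for the input and the output, because Convolution operates on the 2D height and width of each image and leaves the batch dimension untouched. The number of channels stays the same only when the kernel has no channel dimension; otherwise the output has KC channels, as explained in the Kernel section.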

Kernel

The kernel has 2 spatial dimensions: height (KH) and width (KW). In some cases, it has a third dimension: the number of kernel channels (KC).

A 2D kernel is applied to a 2D image. When the kernel has a channel dimension, each of the KC kernels is applied across all C channels of the input: for each kernel, the results over the C input channels are summed into a single 2D output. The final output therefore has KC channels.
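To make the channel behaviour concrete, here is a naive convolution over a single image, written as a sketch with plain nested lists. The function name `conv2d_naive` and the `[KC][KH][KW][C]` kernel layout are assumptions made for illustration; it uses stride 1 and no padding, and it favours readability over speed.

```python
def conv2d_naive(x, kernels):
    """Naive convolution of one image (illustrative sketch, not optimized).

    x:       nested lists shaped [H][W][C]
    kernels: nested lists shaped [KC][KH][KW][C] (assumed layout)
    Stride is 1 and there is no padding.
    """
    H, W, C = len(x), len(x[0]), len(x[0][0])
    KC, KH, KW = len(kernels), len(kernels[0]), len(kernels[0][0])
    out = []
    for i in range(H - KH + 1):
        row = []
        for j in range(W - KW + 1):
            pixel = []
            for k in range(KC):
                total = 0.0
                # Sum over the kernel window AND over all C input channels.
                for di in range(KH):
                    for dj in range(KW):
                        for c in range(C):
                            total += x[i + di][j + dj][c] * kernels[k][di][dj][c]
                pixel.append(total)  # one value per kernel channel
            row.append(pixel)
        out.append(row)
    return out


# A 4 x 4 image with C = 3 channels, filled with ones.
image = [[[1.0] * 3 for _ in range(4)] for _ in range(4)]
# KC = 2 kernels, each 3 x 3 over 3 input channels, filled with ones.
kernels = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]

result = conv2d_naive(image, kernels)
print(len(result), len(result[0]), len(result[0][0]))  # 2 2 2
print(result[0][0])  # [27.0, 27.0]
```

The 3 input channels disappear into the sum, and the output has one channel per kernel, so its channel dimension is KC = 2.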

Stride

Stride is the number of positions by which the kernel moves at each step as it slides over the input data. By default, it is 1. There is one stride for each spatial dimension:

Stride along width (SW)
Stride along height (SH)

Besides the default of 1, a stride of 2 is the most common choice; larger values such as 3 are used less often.
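As a small illustration of how the stride spaces out the kernel positions, the sketch below lists where the first element of the kernel lands along one axis. The helper name `window_starts` is hypothetical, and the sizes are arbitrary example values.

```python
def window_starts(size, kernel, stride):
    """Start indices of the kernel along one axis (illustrative helper)."""
    # The last valid start is size - kernel, so the range stops just past it.
    return list(range(0, size - kernel + 1, stride))


print(window_starts(7, 3, 1))  # [0, 1, 2, 3, 4]
print(window_starts(7, 3, 2))  # [0, 2, 4]
print(window_starts(8, 3, 2))  # [0, 2, 4]
```

The number of start positions is the output size along that axis, so a larger stride gives a smaller output.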

Padding

Padding is extra data added around the borders of the input, typically so that the output does not shrink too much. The values in the padding area are usually 0. Padding is specified separately for each of the four sides:

Padding along height on top (P_H1)
Padding along height on bottom (P_H2)
Padding along width on left (P_W1)
Padding along width on right (P_W2)

This modifies the dimensions of the input data as follows:

Height: Height + Padding along height on top + Padding along height on bottom = H + P_H1 + P_H2
Width: Width + Padding along width on left + Padding along width on right = W + P_W1 + P_W2
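The effect of padding on the input size can be checked with a short sketch that zero-pads a 2D image stored as a list of rows. The function name `zero_pad` and its argument order are assumptions made for this example.

```python
def zero_pad(image, top, bottom, left, right):
    """Return a copy of a 2D list surrounded by zeros (illustrative helper)."""
    width = len(image[0]) + left + right
    # Rows of zeros above the image.
    padded = [[0] * width for _ in range(top)]
    # Original rows with zeros added on the left and right.
    for row in image:
        padded.append([0] * left + list(row) + [0] * right)
    # Rows of zeros below the image.
    padded.extend([0] * width for _ in range(bottom))
    return padded


image = [[1, 2, 3],
         [4, 5, 6]]  # H = 2, W = 3
padded = zero_pad(image, top=1, bottom=2, left=1, right=0)
print(len(padded), len(padded[0]))  # 5 4
```

The padded height is 2 + 1 + 2 = 5 and the padded width is 3 + 1 + 0 = 4, matching H + P_H1 + P_H2 and W + P_W1 + P_W2.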

Calculation of output size

Parameters:

Height (H)
Width (W)
Number of batches (N)
Number of channels (C)
Kernel height (KH)
Kernel width (KW)
Kernel channels (KC)
Stride height (SH)
Stride width (SW)
Padding: P_H1, P_H2, P_W1, P_W2

The input image dimension: N x H x W x C

The modified height and width of input due to padding is as follows:

Height modified H1 = H + P_H1 + P_H2
Width modified W1 = W + P_W1 + P_W2

On applying the kernel (KH x KW) with a stride of 1, the height and width become:

H2 = H1 - KH + 1 = H + P_H1 + P_H2 - KH + 1
W2 = W1 - KW + 1 = W + P_W1 + P_W2 - KW + 1

The kernel channels (KC) set the number of output channels to KC, because for each kernel channel, the results over all input channels are added together.

On considering strides, we get the following height and width:

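H3 = floor((H1 - KH) / SH) + 1
W3 = floor((W1 - KW) / SW) + 1

The division is rounded down: if the stride does not divide evenly, the rows or columns left over at the end are not covered by any kernel position.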
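When the stride does not divide (H1 - KH) or (W1 - KW) evenly, the result has to be rounded to an integer. The sketch below assumes it is rounded down, which is the common framework default, and also reports how many trailing rows or columns are never covered by the kernel. The helper name `strided_output` is hypothetical.

```python
def strided_output(padded_size, kernel, stride):
    """Return (output size, uncovered trailing elements) for one axis.

    Illustrative helper; assumes the division is rounded down.
    """
    out = (padded_size - kernel) // stride + 1
    # Index just past the last element touched by the final kernel position.
    last_covered = (out - 1) * stride + kernel
    unused = padded_size - last_covered
    return out, unused


print(strided_output(259, 7, 3))  # (85, 0)
print(strided_output(260, 7, 3))  # (85, 1)
print(strided_output(10, 3, 2))   # (4, 1)
```

In the second call, one extra row of input does not change the output size: it is simply skipped because another full stride does not fit.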

This makes the dimensions of the output as follows:

Number of batches = N
Number of channels = KC
Height = floor((Input Height + Padding height (top and bottom) - Kernel Height) / Stride Height) + 1
Width = floor((Input Width + Padding width (left and right) - Kernel Width) / Stride Width) + 1

Example

Let us take some example values of input data and calculate the dimension of the output accordingly.

Input: [N H W C] = [100 255 255 64]
Kernel: [H W C] = [7 7 128]
Stride: [Height Width] = [3 4]
Padding: [P_H1, P_H2, P_W1, P_W2] = [2 2 2 2]

Output Height = (255 + 2 + 2 - 7) / 3 + 1 = 85
Output Width = (255 + 2 + 2 - 7) / 4 + 1 = 64
Output channels = 128
Output batch size = 100

Hence, the output size is: [N H W C] = 100 x 85 x 64 x 128
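The example can be reproduced with a short sketch that computes the full output shape. The function name `conv_output_shape` is illustrative, the kernel height and width are both taken as 7 as in the arithmetic above, and the padding tuple follows the order [P_H1, P_H2, P_W1, P_W2] (top, bottom, left, right).

```python
def conv_output_shape(input_shape, kernel_shape, stride, padding):
    """Output shape [N, H, W, C] of a convolution (illustrative sketch).

    input_shape:  (N, H, W, C)
    kernel_shape: (KH, KW, KC)
    stride:       (SH, SW)
    padding:      (P_H1, P_H2, P_W1, P_W2) = (top, bottom, left, right)
    """
    n, h, w, _c = input_shape  # input channels do not affect the output shape
    kh, kw, kc = kernel_shape
    sh, sw = stride
    p_top, p_bottom, p_left, p_right = padding
    out_h = (h + p_top + p_bottom - kh) // sh + 1
    out_w = (w + p_left + p_right - kw) // sw + 1
    return (n, out_h, out_w, kc)


shape = conv_output_shape(
    input_shape=(100, 255, 255, 64),
    kernel_shape=(7, 7, 128),
    stride=(3, 4),
    padding=(2, 2, 2, 2),
)
print(shape)  # (100, 85, 64, 128)
```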

With this article at OpenGenus, you should now have a complete idea of how to compute the output size of convolution. Enjoy.


