
Table of contents:

Introduction to Unpooling and Pooling
Illustration of unpooling
Different types of unpooling operations
- Nearest neighbor unpooling
- Max unpooling
- Fractional max unpooling
Transposed convolutional layers
Advantages and disadvantages of unpooling
Pseudocode of unpooling
Conclusion

Introduction to Unpooling and Pooling

In machine learning models, unpooling reverses the effect of a pooling operation applied earlier in the network. Pooling downsamples the spatial dimensions of an image or feature map while preserving the most important information. After pooling has reduced a feature map's spatial dimensions, unpooling restores them to their original size. Because pooling discards values, unpooling restores the size of the feature map, not its exact original values.

Unpooling is used mainly in convolutional neural networks (CNNs), especially in encoder-decoder architectures. It cannot recover the exact values that pooling discarded, but in addition to reversing the size reduction, it helps preserve the spatial layout of the original input, which can improve the model's accuracy. This is crucial for tasks like image segmentation, where the network's output must match the input image's spatial dimensions.

The choice of unpooling operation depends on the task at hand and the network architecture in use. Nearest neighbor unpooling is the quickest and simplest method, but it cannot restore fine spatial detail. Max unpooling puts each value back at the exact position of the original maximum and so preserves more spatial information, but it leaves every other position at zero; the resulting sparse output can look blocky when the pooling operation had a large stride or kernel size. Fractional max unpooling gives the unpooling process more flexibility and can produce smoother output, although it may be more computationally expensive.

Illustration of unpooling

Here's an example of max unpooling using a simple 2x2 max pooling operation.

Suppose we have the following 2x2 input feature map:

[[1, 2],
 [3, 4]]

We apply a 2x2 max pooling operation with a stride of 2 and obtain the following pooled feature map:

[[4]]

To perform unpooling, we need to remember the locations of the maximum values in the original input feature map. In this case, the maximum value was 4, located at position (1, 1), that is, row 1, column 1 with zero-based indexing (the bottom-right entry of the original input feature map).

To perform max unpooling, we create a new 2x2 output feature map with all zeros except for a single 4 at position (1, 1):

[[0, 0],
 [0, 4]]

In other words, max unpooling inverts the shape change of max pooling: it inserts each pooled value at its remembered location and fills every other position with zeros.

The result is a 2x2 output feature map with the same shape as the original input feature map, but with a single non-zero value corresponding to the maximum value in the pooled feature map. The values 1, 2 and 3 that pooling discarded are not recovered.
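The 2x2 example can be reproduced in a few lines of NumPy. This is a minimal sketch for a single 2D map with one pooling window; it records the position of the maximum with np.argmax and np.unravel_index, then writes the pooled value back into an all-zero array.

```python
import numpy as np

# The 2x2 input feature map from the example
x = np.array([[1, 2],
              [3, 4]])

# 2x2 max pooling with stride 2: a single window covers the whole map
pooled = x.max()

# Remember where the maximum was, as a (row, column) pair
max_pos = np.unravel_index(np.argmax(x), x.shape)

# Max unpooling: zeros everywhere except the remembered position
unpooled = np.zeros_like(x)
unpooled[max_pos] = pooled

print(pooled)     # 4
print(unpooled)   # the 2x2 map with 4 in the bottom-right corner, 0 elsewhere
```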

Different types of unpooling operations

The three types of unpooling operations covered in this article are:

Nearest neighbor unpooling
Max unpooling
Fractional max unpooling

We will dive into each type in depth.

Nearest neighbor unpooling

Nearest neighbor unpooling undoes the downsampling effect of pooling by copying each pooled value into every position of the corresponding region in the output feature map.

import numpy as np

def nearest_neighbor_unpooling(pool, pool_argmax, scale_factor):
    # pool: the output of the pooling operation, shape (b, c, ph, pw)
    # pool_argmax: not used by nearest neighbor unpooling (kept only so the signature matches max_unpooling)
    # scale_factor: the factor by which to upsample the pooled feature map

    # Get the shape of the pooled feature map and the unpooled feature map
    b, c, ph, pw = pool.shape
    upsampled_shape = (b, c, ph * scale_factor, pw * scale_factor)

    # Create an empty upsampled feature map
    upsampled = np.zeros(upsampled_shape)

    # Iterate over the indices of the upsampled feature map
    for i in range(ph * scale_factor):
        for j in range(pw * scale_factor):
            # Compute the index of the corresponding pooling unit
            pool_i = i // scale_factor
            pool_j = j // scale_factor

            # Copy the value of the pooling unit to the corresponding location in the upsampled feature map
            upsampled[:, :, i, j] = pool[:, :, pool_i, pool_j]

    return upsampled

For pooling, the input image or feature map is split into non-overlapping regions, and the maximum or average value of each region forms a smaller output image or feature map. Nearest neighbor unpooling expands the pooled regions back to their original size by copying each pooled value into every position of its region. It is simple to implement in neural network architectures and computationally efficient. However, it cannot restore the detail inside each region, so if the pooling operation had a large stride or kernel size, the output lacks fine spatial information and shows blocky artifacts.

For instance, if the pooling operation reduced the input feature map from 8x8 to 4x4, nearest neighbor unpooling would copy each value of the 4x4 pooled map into the corresponding 2x2 block of an 8x8 output feature map.
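The loop in nearest_neighbor_unpooling can also be written without explicit loops. A minimal sketch, assuming a hypothetical helper nearest_neighbor_upsample that repeats values along the last two (height and width) axes with np.repeat, so it accepts a single 2D map as well as a (batch, channels, height, width) array:

```python
import numpy as np

def nearest_neighbor_upsample(pooled, scale_factor):
    # Hypothetical helper: copy each value into a scale_factor x scale_factor
    # block by repeating it along the height axis, then the width axis.
    return np.repeat(np.repeat(pooled, scale_factor, axis=-2), scale_factor, axis=-1)

# A hypothetical 4x4 pooled feature map (e.g. obtained by pooling an 8x8 map)
pooled = np.arange(16, dtype=float).reshape(4, 4)
upsampled = nearest_neighbor_upsample(pooled, 2)

print(upsampled.shape)   # (8, 8)
print(upsampled[:2, :4])
```

Each pooled value now fills a 2x2 block, which is exactly the blocky structure described above.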

Max unpooling

Max unpooling is a technique for unpooling used in machine learning models to undo the effects of max pooling by reconstructing the original feature map using the positions of the maximum values from the pooling operation.

import numpy as np

def max_unpooling(pool, pool_argmax, stride):
    # pool: the output of the max pooling operation, shape (b, c, ph, pw)
    # pool_argmax: for each pooled value, the flat index (row * width + col) of the max
    #   value in the original height x width plane; this convention is an assumption
    #   (it is similar to the indices returned by PyTorch's MaxPool2d with return_indices=True)
    # stride: the stride used in the max pooling operation (assumed equal to the kernel size)

    # Get the shape of the pooled feature map and the unpooled feature map
    b, c, ph, pw = pool.shape
    unpooled_h, unpooled_w = ph * stride, pw * stride

    # Create an all-zero unpooled feature map
    unpooled = np.zeros((b, c, unpooled_h, unpooled_w), dtype=pool.dtype)

    # Iterate over every pooled value (batch, channel, row, column)
    for bi, ci, i, j in np.ndindex(pool.shape):
        # Convert the stored flat index back to a (row, column) position
        row, col = divmod(int(pool_argmax[bi, ci, i, j]), unpooled_w)

        # Place the max value at its original position; every other position stays zero
        unpooled[bi, ci, row, col] = pool[bi, ci, i, j]

    return unpooled

Max pooling reduces the size of an image or feature map by taking the maximum value from each non-overlapping region of the input. The position of each maximum is recorded during pooling; max unpooling then writes each pooled value back to its recorded position in the output feature map and sets every other position to zero. Max unpooling preserves more spatial information than nearest neighbor unpooling, but its output is sparse, which can look blocky when the pooling operation had a large stride or kernel size. Fractional max unpooling, which offers more flexibility by using fractional indices to reconstruct the feature map, can be used to get around this problem.

For instance, if max pooling reduced the input feature map from 8x8 to 4x4, max unpooling would use the recorded positions of the 16 maximum values to place them back into an 8x8 feature map, with the remaining 48 positions set to zero.
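A minimal NumPy sketch of the 8x8 to 4x4 to 8x8 round trip, assuming non-overlapping 2x2 windows (stride equal to kernel size) and storing each maximum as a flat index row * width + col into the original map. The helper names max_pool_with_indices and max_unpool are hypothetical, not from a library:

```python
import numpy as np

def max_pool_with_indices(x, k):
    # Non-overlapping k x k max pooling on a 2D map. Returns the pooled map and,
    # for each window, the flat index (row * width + col) of its maximum in x.
    h, w = x.shape
    pooled = np.zeros((h // k, w // k), dtype=x.dtype)
    indices = np.zeros((h // k, w // k), dtype=np.int64)
    for i in range(h // k):
        for j in range(w // k):
            window = x[i * k:(i + 1) * k, j * k:(j + 1) * k]
            di, dj = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[di, dj]
            indices[i, j] = (i * k + di) * w + (j * k + dj)
    return pooled, indices

def max_unpool(pooled, indices, out_shape):
    # Write every pooled value back to its recorded flat position;
    # all other positions stay zero.
    out = np.zeros(out_shape, dtype=pooled.dtype)
    np.put(out, indices.ravel(), pooled.ravel())
    return out

rng = np.random.default_rng(0)
x = rng.permutation(64).reshape(8, 8) + 1       # distinct positive values, no ties
pooled, idx = max_pool_with_indices(x, 2)       # 8x8 -> 4x4
restored = max_unpool(pooled, idx, x.shape)     # 4x4 -> 8x8

print(pooled.shape, restored.shape)   # (4, 4) (8, 8)
print(np.count_nonzero(restored))     # 16
```

Only one integer per pooled value is stored, and the 48 positions that were not maxima come back as zeros, which is why max unpooling output is sparse.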

Fractional max unpooling

Fractional max unpooling is a technique for unpooling used in machine learning models to undo the effects of max pooling by reconstructing the original feature map using fractional indices.

In fractional max unpooling, each position of the output feature map is mapped to a fractional (non-integer) position in the pooled feature map, and the output value is reconstructed from the pooled values surrounding that position.

import numpy as np

def fractional_max_unpooling(pool, pool_argmax, scale_factor):
    # pool: the output of the max pooling operation, shape (b, c, ph, pw)
    # pool_argmax: the argmax of the pooling operation; this interpolation-based
    #   version does not use it (kept so the signature matches max_unpooling)
    # scale_factor: the factor by which to upsample the pooled feature map

    # Get the shape of the pooled feature map and the unpooled feature map
    b, c, ph, pw = pool.shape
    upsampled_shape = (b, c, ph * scale_factor, pw * scale_factor)

    # Create an empty upsampled feature map
    upsampled = np.zeros(upsampled_shape)

    # Iterate over the indices of the upsampled feature map
    for i in range(ph * scale_factor):
        for j in range(pw * scale_factor):
            # Map the output position to a fractional position in the pooled map
            pool_i = i / scale_factor
            pool_j = j / scale_factor
            lower_i = int(pool_i)
            lower_j = int(pool_j)
            upper_i = min(lower_i + 1, ph - 1)
            upper_j = min(lower_j + 1, pw - 1)

            # Compute the fractional parts of the indices
            alpha = pool_i - lower_i
            beta = pool_j - lower_j

            # Bilinearly interpolate between the four closest pooling units,
            # for every batch element and channel at once
            upsampled[:, :, i, j] = (
                (1 - alpha) * (1 - beta) * pool[:, :, lower_i, lower_j]
                + (1 - alpha) * beta * pool[:, :, lower_i, upper_j]
                + alpha * (1 - beta) * pool[:, :, upper_i, lower_j]
                + alpha * beta * pool[:, :, upper_i, upper_j]
            )

    return upsampled

Compared to max unpooling, fractional max unpooling produces smoother and denser output, because every output position receives a value. It reconstructs the feature map by interpolating between the four pooled values closest to each output position, giving more weight to the nearer ones. It can produce higher-quality output but is computationally more expensive than nearest neighbor and max unpooling. Interpolation-based upsampling of this kind is common in convolutional neural networks for image segmentation and object detection.
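The interpolation performed by fractional_max_unpooling can be written per output position $(i, j)$. Here $p$ is the pooled map, $u$ the upsampled map, $s$ the scale factor and $p_h, p_w$ the pooled height and width; the symbols are introduced here only to mirror the variables in the code (alpha, beta, lower_i, upper_i, and so on).

```latex
\alpha = \frac{i}{s} - \left\lfloor \frac{i}{s} \right\rfloor, \qquad
\beta  = \frac{j}{s} - \left\lfloor \frac{j}{s} \right\rfloor

i_0 = \left\lfloor \frac{i}{s} \right\rfloor, \quad i_1 = \min(i_0 + 1,\ p_h - 1), \qquad
j_0 = \left\lfloor \frac{j}{s} \right\rfloor, \quad j_1 = \min(j_0 + 1,\ p_w - 1)

u_{i,j} = (1-\alpha)(1-\beta)\, p_{i_0,j_0} + (1-\alpha)\,\beta\, p_{i_0,j_1}
        + \alpha\,(1-\beta)\, p_{i_1,j_0} + \alpha\,\beta\, p_{i_1,j_1}
```

The four weights sum to $(1-\alpha) + \alpha = 1$, so a constant pooled map stays constant after upsampling, and positions where $\alpha = \beta = 0$ copy a pooled value exactly.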

For instance, if max pooling reduced the input feature map from 8x8 to 4x4, fractional max unpooling would treat the 4x4 pooled map as a grid and interpolate between neighboring grid values to produce an 8x8 feature map.
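The loop-based fractional_max_unpooling computes a bilinear interpolation of the pooled map at each output position. A vectorized sketch of the same computation on a single 2D map, using NumPy broadcasting for the 4x4 to 8x8 case (the name bilinear_upsample is hypothetical):

```python
import numpy as np

def bilinear_upsample(pooled, s):
    # Output row i maps to pooled position i / s (and likewise for columns).
    ph, pw = pooled.shape
    pos_i = np.arange(ph * s) / s
    pos_j = np.arange(pw * s) / s
    i0 = np.floor(pos_i).astype(int)
    j0 = np.floor(pos_j).astype(int)
    i1 = np.minimum(i0 + 1, ph - 1)   # clamp at the bottom border
    j1 = np.minimum(j0 + 1, pw - 1)   # clamp at the right border
    a = (pos_i - i0)[:, None]         # fractional row offsets, shape (ph*s, 1)
    b = (pos_j - j0)[None, :]         # fractional column offsets, shape (1, pw*s)

    # Gather the four neighbors of every output position via broadcasting
    p00 = pooled[i0[:, None], j0[None, :]]
    p01 = pooled[i0[:, None], j1[None, :]]
    p10 = pooled[i1[:, None], j0[None, :]]
    p11 = pooled[i1[:, None], j1[None, :]]
    return (1 - a) * (1 - b) * p00 + (1 - a) * b * p01 + a * (1 - b) * p10 + a * b * p11

pooled = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4x4 pooled map
up = bilinear_upsample(pooled, 2)                   # 8x8 output

print(up.shape)    # (8, 8)
print(up[0, :4])   # values 0, 0.5, 1, 1.5
```

Unlike max unpooling, every output position gets a value, and even-indexed positions reproduce the pooled values exactly.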

Transposed convolutional layers

Machine learning models use transposed convolutional layers, sometimes called deconvolutional layers, to upsample feature maps, that is, to increase their spatial resolution.

import numpy as np

def transposed_conv2d(x, weight, bias=None, stride=1, padding=0, output_padding=0):
    # x: input tensor of shape (batch_size, in_channels, height, width)
    # weight: weight tensor of shape (in_channels, out_channels, kernel_size, kernel_size)
    # bias: optional bias tensor of shape (out_channels,)
    # stride: spacing, in the output, between kernel copies placed for neighboring inputs
    # padding: number of rows/columns removed from each border of the full output
    # output_padding: additional rows/columns added to the bottom/right of the output

    batch_size, in_channels, in_height, in_width = x.shape
    _, out_channels, kernel_size, _ = weight.shape

    # Size of the full (uncropped) output; cropping `padding` from each side gives
    # out = (in - 1) * stride + kernel_size + output_padding - 2 * padding
    full_height = (in_height - 1) * stride + kernel_size + output_padding
    full_width = (in_width - 1) * stride + kernel_size + output_padding
    full = np.zeros((batch_size, out_channels, full_height, full_width))

    # Perform transposed convolution as a scatter-add: each input position scales the
    # kernel (summed over input channels) and adds it to the output at (r * stride, c * stride)
    for r in range(in_height):
        for c in range(in_width):
            # (batch, in_channels) x (in_channels, out_channels, k, k) -> (batch, out_channels, k, k)
            contribution = np.einsum('bi,iokl->bokl', x[:, :, r, c], weight)
            top, left = r * stride, c * stride
            full[:, :, top:top + kernel_size, left:left + kernel_size] += contribution

    # Remove `padding` rows/columns from each border
    out = full[:, :, padding:full_height - padding, padding:full_width - padding]

    # Add bias
    if bias is not None:
        out = out + bias.reshape(1, -1, 1, 1)

    return out

Convolutional layers slide a set of learnable filters over the input image or feature map, typically producing a smaller output feature map. A transposed convolutional layer reverses this shape mapping: with the same kernel size, stride and padding, it maps a tensor of the smaller output shape back to the larger input shape, increasing the spatial resolution of the feature map. Its name comes from the fact that it multiplies by the transpose of the matrix that represents the convolution. It does not compute the mathematical inverse of the convolution; its filters are learned during training like those of any other layer.
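To illustrate the "transpose" in the name, the following 1D sketch builds the matrix C of a stride-1, unpadded convolution (the cross-correlation used in CNN layers) and checks that multiplying by C.T gives the same result as a scatter-add transposed convolution. The helper names conv1d_matrix and transposed_conv1d are hypothetical:

```python
import numpy as np

def conv1d_matrix(kernel, in_len):
    # Matrix C such that C @ x is a "valid" 1D cross-correlation of x with kernel
    # (stride 1, no padding): out[o] = sum_t kernel[t] * x[o + t]
    k = len(kernel)
    out_len = in_len - k + 1
    C = np.zeros((out_len, in_len))
    for o in range(out_len):
        C[o, o:o + k] = kernel
    return C

def transposed_conv1d(y, kernel):
    # Scatter-add form: each input value y[o] scales the kernel and is
    # added to the output starting at position o.
    k = len(kernel)
    out = np.zeros(len(y) + k - 1)
    for o, v in enumerate(y):
        out[o:o + k] += v * kernel
    return out

kernel = np.array([1.0, 2.0, 3.0])
C = conv1d_matrix(kernel, in_len=5)   # convolution maps length 5 -> length 3
y = np.array([1.0, 0.0, -1.0])        # a length-3 feature map

# C.T maps length 3 back to length 5; both lines print the values 1, 2, 2, -2, -3
print(C.T @ y)
print(transposed_conv1d(y, kernel))
```

The shapes are reversed (3 to 5 instead of 5 to 3), but C.T is not the inverse of C: applying the convolution to the result does not give y back.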

Advantages and disadvantages of unpooling

Advantages:
1. Spatial data preservation: Unpooling restores the spatial layout that pooling compressed, although not the exact values that pooling discarded. This is crucial for tasks like image segmentation, where the network's output must match the input image's spatial dimensions.

2. Increasing the size of the feature maps: Unpooling increases the size of the feature maps, which can help to improve the model's accuracy by letting later layers work at a higher spatial resolution.

3. No added trainable parameters: Nearest neighbor and max unpooling have no trainable parameters of their own, so they increase the resolution of the feature maps without adding weights that could overfit the training data, unlike learnable upsampling such as transposed convolution.

4. Computational efficiency: Unpooling is computationally efficient and simple to implement in neural network architectures, because it relies on straightforward operations such as nearest neighbor replication or writing values back to recorded positions.

Disadvantages:
1. Increased computational cost: Unpooling enlarges the feature maps, so every layer that follows has to process more positions, which raises the network's computational cost and training time. Learnable alternatives such as transposed convolution additionally add trainable parameters.

2. Possibility of information loss: Unpooling cannot recover the detail that pooling discarded, and how this shows up depends on the chosen technique. For instance, nearest neighbor unpooling may produce output with blocky artifacts, while max unpooling leaves every non-maximum position at zero.

3. Potential for overfitting: When unpooling is followed by learnable layers, as in a decoder, or when learnable upsampling such as transposed convolution is used instead, the additional trainable parameters can increase the risk of overfitting if they are not properly regularized.

4. Increased memory requirements: Max unpooling needs the indices or pooling masks recorded during the pooling operation, and these must be kept in memory until the corresponding unpooling layer runs.
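To put the memory cost of stored indices in perspective, here is a back-of-the-envelope sketch. It assumes one stored index per pooled value, non-overlapping pooling, and 64-bit indices (the actual index type depends on the framework); the layer shape is a hypothetical example and index_memory_bytes is an illustrative helper:

```python
def index_memory_bytes(shape, kernel, bytes_per_index=8):
    # One stored index per pooled value; shape is (batch, channels, height, width)
    b, c, h, w = shape
    n_indices = b * c * (h // kernel) * (w // kernel)
    return n_indices * bytes_per_index

# Hypothetical layer: batch 32, 64 channels, 112x112 maps, 2x2 pooling
nbytes = index_memory_bytes((32, 64, 112, 112), kernel=2)
print(nbytes)            # 51380224
print(nbytes / 2**20)    # 49.0 (MiB)
```

These indices must stay alive from the pooling layer until the matching unpooling layer, on top of the activations themselves.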

Pseudocode of unpooling

Here, we take max unpooling as the example, written as runnable Python with NumPy:

import numpy as np

def max_unpooling(pooled_feature_map, pooling_mask, stride):
    # pooled_feature_map: the output of the max pooling operation, shape (batch, channels, ph, pw)
    # pooling_mask: a binary mask with the shape of the original input feature map,
    #   holding 1 at the location of the max value in each pooling window and 0 elsewhere
    # stride: the stride used in the max pooling operation (assumed equal to the kernel size)

    # determine the size of the original input feature map
    batch_size, channels, pooled_height, pooled_width = pooled_feature_map.shape
    _, _, unpooled_height, unpooled_width = pooling_mask.shape

    # create an empty array to hold the unpooled feature map
    unpooled_feature_map = np.zeros((batch_size, channels, unpooled_height, unpooled_width))

    # iterate over the batches, channels, and pooled feature map
    for b in range(batch_size):
        for c in range(channels):
            for ph in range(pooled_height):
                for pw in range(pooled_width):
                    # retrieve the max value and the pooling window it came from
                    max_value = pooled_feature_map[b, c, ph, pw]
                    h0, w0 = ph * stride, pw * stride
                    window = pooling_mask[b, c, h0:h0 + stride, w0:w0 + stride]

                    # find the (first) position inside the window where the mask is set
                    dh, dw = np.unravel_index(np.argmax(window), window.shape)

                    # set the value of the unpooled feature map at the appropriate location
                    unpooled_feature_map[b, c, h0 + dh, w0 + dw] = max_value

    return unpooled_feature_map
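The function above assumes that a binary pooling_mask was recorded during pooling. A minimal sketch of how such a mask could be produced, using the hypothetical helper max_pool_with_mask (non-overlapping windows, kernel size equal to stride, first maximum kept on ties), followed by a loop-free unpooling check:

```python
import numpy as np

def max_pool_with_mask(x, stride):
    # Max pooling on a (batch, channels, H, W) array. Returns the pooled map and a
    # binary mask of x's shape marking the position of each window's maximum.
    b, c, h, w = x.shape
    ph, pw = h // stride, w // stride
    pooled = np.zeros((b, c, ph, pw), dtype=x.dtype)
    mask = np.zeros_like(x)
    for bi, ci, i, j in np.ndindex(b, c, ph, pw):
        window = x[bi, ci, i * stride:(i + 1) * stride, j * stride:(j + 1) * stride]
        di, dj = np.unravel_index(np.argmax(window), window.shape)
        pooled[bi, ci, i, j] = window[di, dj]
        mask[bi, ci, i * stride + di, j * stride + dj] = 1
    return pooled, mask

x = np.array([[[[1, 5, 2, 0],
                [3, 4, 8, 6],
                [7, 0, 1, 2],
                [2, 9, 3, 4]]]], dtype=float)
pooled, mask = max_pool_with_mask(x, 2)
print(pooled[0, 0])   # maxima 5, 8, 9, 4

# Loop-free unpooling: spread each max over its window, then keep only the masked position
unpooled = np.repeat(np.repeat(pooled, 2, axis=2), 2, axis=3) * mask
```

The result keeps each maximum at its original position and zeros elsewhere, which is the same output the loop-based version is meant to produce.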

Conclusion

In conclusion of this article at OpenGenus, unpooling operations play a crucial role in machine learning for restoring the spatial resolution that pooling reduces. They are frequently used in neural network architectures for tasks such as image segmentation and object detection, where objects in an image must be precisely localized as well as classified. Unpooling operations come in a variety of forms, such as nearest neighbor, max, and fractional max unpooling. Each type has benefits and drawbacks, and the choice depends on the particular task and network architecture. Despite potential drawbacks such as artifacts or increased output noise, unpooling operations have been used successfully in numerous real-world applications. By increasing the spatial resolution of feature maps and preserving the spatial layout of the input, they can improve the accuracy of downstream tasks.

In general, unpooling operations are a useful tool for improving performance on a variety of tasks in neural network and machine learning architectures. As researchers work to improve the efficiency and precision of machine learning models, unpooling is likely to remain a significant area of research.

Unpooling operations in ML models
