Introduction
Understanding ResNet
i. Key contributions
ii. ResNet versions
Key components
Prerequisites
Implementation
Conclusion

Introduction

As the number of layers in a CNN architecture increases, the network becomes harder to train: gradients can vanish or explode, and accuracy can degrade. To address this, the residual network (ResNet) was introduced in a research paper published by the Microsoft Research team.

The dataset used in this implementation is the Apple Leaf Disease Dataset from Kaggle, which consists of images of apple leaves categorized into three classes (Healthy, Rust and Scab). It is used here to illustrate how ResNet-50 performs on a smaller, practical dataset of the kind found in real-world applications.

ResNet-50, a powerful convolutional neural network architecture, revolutionized deep learning with its innovative residual learning framework. This article will guide you through designing ResNet-50 from scratch using PyTorch, a popular deep learning library. We’ll cover the core concepts and key components, and provide a step-by-step implementation.

Understanding ResNet

The 2015 paper "Deep Residual Learning for Image Recognition," authored by Kaiming He and his team, presents an approach to addressing the difficulties faced in training deep neural networks.

This breakthrough allows networks with hundreds of layers to be trained effectively, setting new benchmarks in accuracy and performance on challenging datasets such as ImageNet and COCO. ResNet's ability to scale efficiently has influenced a wide range of subsequent architectures and research, paving the way for advanced models like DenseNet and EfficientNet.

We will explore some of the key contributions made by this approach in this OpenGenus article.

Key Contributions

1. Residual Blocks:

The core architectural innovation is the use of residual blocks with shortcut connections (or skip connections).
This design helps in maintaining the flow of gradients during backpropagation, making it possible to train networks with hundreds of layers. It also helps to prevent degradation in accuracy with increasing network depth.

2. Residual Learning Framework:

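This framework lets each block learn the residual between its input and its desired output: instead of learning a mapping H(x) directly, the stacked layers learn F(x) = H(x) - x, and the shortcut adds the input back so that the block outputs F(x) + x.
Because a block only has to learn a correction to its input (and can approximate the identity by driving F(x) toward zero), the network can more easily optimize its features, facilitating the training of much deeper networks without succumbing to common issues like vanishing gradients.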

3. Effective Depth of Networks:

Deep networks such as ResNet-50 can be trained effectively without suffering from degradation.
ResNet-50, with its 50 layers, offers a practical balance between depth and computational efficiency, and ResNet models achieved state-of-the-art results on the ImageNet benchmark at the time of publication.

4. Architectural Variants:

Several ResNet architectures were proposed with varying depths, including ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152.
Each of these models provides different trade-offs between depth and computational requirements.

5. Innovations in Training Deep Networks:

Together, these ideas make it feasible to train networks with hundreds of layers while still improving accuracy as depth increases.
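To make the residual learning idea from item 2 concrete, here is a minimal sketch in PyTorch. ResidualMLPBlock is a hypothetical toy block built from fully connected layers rather than convolutions, chosen only to keep the example small. It computes F(x) + x, and when the last layer of F outputs zeros the block becomes exactly the identity mapping, which is why adding such blocks does not have to hurt a deeper network.

```python
import torch
import torch.nn as nn


class ResidualMLPBlock(nn.Module):
    """Hypothetical toy residual block: output = F(x) + x, with F a small two-layer network."""

    def __init__(self, dim):
        super().__init__()
        # F(x): the residual branch the block has to learn
        self.residual = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.residual(x) + x  # the shortcut adds the input back


block = ResidualMLPBlock(16)

# If the last layer of F outputs zeros, F(x) = 0 and the block reduces to the identity
nn.init.zeros_(block.residual[2].weight)
nn.init.zeros_(block.residual[2].bias)

x = torch.randn(4, 16)
print(torch.equal(block(x), x))  # True
```

A plain (non-residual) stack would have to learn the identity through its weights; here it is the default whenever the residual branch is near zero.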

ResNet versions

Version	Total Layers	Block Type	Blocks per Stage	Initial Conv Layer	Key Features	Typical Uses
ResNet-18	18	Basic Blocks	[2, 2, 2, 2]	7x7 Convolution, Max Pool	Shallower architecture, simpler and faster to train	Basic models, less demanding tasks
ResNet-34	34	Basic Blocks	[3, 4, 6, 3]	7x7 Convolution, Max Pool	Intermediate depth, improved accuracy over ResNet-18	Complex image recognition tasks
ResNet-50	50	Bottleneck Blocks	[3, 4, 6, 3]	7x7 Convolution, Max Pool	Balanced depth and efficiency, widely used	General-purpose image classification, backbone for object detection
ResNet-101	101	Bottleneck Blocks	[3, 4, 23, 3]	7x7 Convolution, Max Pool	Deeper network, higher accuracy, more computation	Large-scale datasets, precision-focused tasks
ResNet-152	152	Bottleneck Blocks	[3, 8, 36, 3]	7x7 Convolution, Max Pool	Very deep, top accuracy, resource-heavy	High-end image classification, advanced research
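The depth in each name counts weighted layers: the 7x7 stem convolution, every convolution inside the residual blocks and the final fully connected layer. The short sketch below is an illustrative calculation rather than library code; it reproduces the Total Layers column from the Block Type and Blocks per Stage columns. Pooling layers and the 1x1 shortcut projections are not counted.

```python
# Convolutions per residual block: basic = 3x3 + 3x3, bottleneck = 1x1 + 3x3 + 1x1
CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}

RESNET_CONFIGS = {
    "ResNet-18": ("basic", [2, 2, 2, 2]),
    "ResNet-34": ("basic", [3, 4, 6, 3]),
    "ResNet-50": ("bottleneck", [3, 4, 6, 3]),
    "ResNet-101": ("bottleneck", [3, 4, 23, 3]),
    "ResNet-152": ("bottleneck", [3, 8, 36, 3]),
}


def depth(block_type, blocks_per_stage):
    """1 stem conv + all convs inside the residual blocks + 1 fully connected layer."""
    return 1 + CONVS_PER_BLOCK[block_type] * sum(blocks_per_stage) + 1


for name, (block_type, blocks) in RESNET_CONFIGS.items():
    print(name, depth(block_type, blocks))
# ResNet-18 18, ResNet-34 34, ResNet-50 50, ResNet-101 101, ResNet-152 152
```

For ResNet-50 this is 1 + 3 x (3 + 4 + 6 + 3) + 1 = 50.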

Key components

1. ReLU (Rectified Linear Unit): An activation function applied to each value independently: positive values are kept as is, and negative values are set to zero, i.e. ReLU(x) = max(0, x). This non-linearity helps the model learn more complex patterns.

2. Batch Normalization: Normalizes each channel's activations over a mini-batch to zero mean and unit variance, then applies a learnable scale and shift. Keeping the values in a consistent range makes training faster and more stable.

3. Adaptive Average Pooling: Averages each feature map over regions chosen so that the output always has a fixed spatial size (1x1 in ResNet), no matter how large the input is. This ensures that the final features have a consistent size, which makes them easy for the fully connected layer to handle.

4. Fully Connected Layer: Every output from one layer is connected to every input in the next layer. It helps the network make final decisions based on all the features learned during training.

5. Optimizer: The optimizer updates the model's parameters using the gradients of the loss, for example with stochastic gradient descent (SGD) or Adam. Repeating these updates gradually improves the model's performance.

6. Loss Function: The loss function measures the difference between the model's predictions and the actual targets, so larger values mean larger mistakes. Minimizing it during training is how the model improves its accuracy; for classification, cross-entropy loss is a common choice.

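7. Stride: Stride is the number of pixels the convolution filter moves at each step across the input image. A stride of 1 visits every position, while a stride of 2 skips every other position and roughly halves the height and width of the output, trading spatial detail for less computation.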

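8. Downsample: Downsampling reduces the spatial size of the data while trying to keep the important information. This is achieved with convolution layers that use a stride greater than 1, or with pooling layers. It helps in capturing features at different scales. In the ResNet code below, the downsample module is the 1x1 convolution (with batch normalization) on the shortcut path that makes the input match the shape of the block output.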

9. Input channels and Output channels: Input channels are the number of feature maps a layer receives (3 for an RGB image), and output channels are the number of feature maps it produces, one per convolution filter. Together they control how the data is transformed and how many different types of features are detected.

10. Feature map and the depth:

When a CNN processes an image, it produces a set of 2D arrays, each representing different features (like edges, textures, or patterns) detected in the image.
The depth of the feature map indicates how many features the network is detecting at that stage. More depth means the network can capture more complex and varied features.

11. Residual Block: A residual block is a small group of layers with a shortcut connection that lets the input bypass those layers and be added to their output. This makes deep networks easier to train and helps reduce training errors.

12. Block Counts: It tells you how many residual blocks are used in different parts of the network. It determines how deep and complex the network is.

13. Basic Block and Bottleneck:

A basic block is the residual block used in the shallower ResNets (ResNet-18 and ResNet-34). It consists of two 3x3 convolutional layers and a shortcut connection.
The bottleneck, used in ResNet-50 and deeper variants, consists of three convolutional layers: 1x1, 3x3 and 1x1. The first 1x1 convolution reduces the number of channels, the 3x3 convolution works on this smaller representation, and the last 1x1 convolution expands the channels again. This reduces the number of parameters and computations while maintaining performance.

14. Forward: The forward function is the path that data follows through the network to get processed. It defines how the data moves through the network and gets transformed into predictions.
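Several of the components above (ReLU, batch normalization, stride, downsampling and adaptive average pooling) are easiest to understand by looking at tensor values and shapes. The sketch below uses random input with an assumed shape of 8 x 64 x 56 x 56 (batch, channels, height, width), similar to the feature maps early in ResNet-50, and checks each behavior directly. The layer sizes are illustrative choices, not part of the original article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 64, 56, 56)  # (batch, channels, height, width)

# ReLU: negative values become zero, positive values pass through unchanged
relu_out = nn.ReLU()(torch.tensor([-2.0, 0.0, 3.0]))
print(relu_out)  # tensor([0., 0., 3.])

# Batch normalization (training mode): per-channel (x - mean) / sqrt(var + eps),
# followed by a learnable scale and shift that start at 1 and 0
bn = nn.BatchNorm2d(64)
bn_out = bn(x)
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)
print(torch.allclose(bn_out, manual, atol=1e-4))  # True

# Stride: with a 3x3 kernel and padding=1, stride 1 keeps 56x56 and stride 2 gives 28x28
conv_s1 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)
conv_s2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)
print(conv_s1(x).shape)  # torch.Size([8, 64, 56, 56])
print(conv_s2(x).shape)  # torch.Size([8, 64, 28, 28])

# Downsample shortcut: a strided 1x1 convolution changes 64 -> 256 output channels
# and halves the spatial size, so the shortcut matches the block output
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(256),
)
print(downsample(x).shape)  # torch.Size([8, 256, 28, 28])

# Adaptive average pooling: different spatial sizes all become 1x1
pool = nn.AdaptiveAvgPool2d((1, 1))
print(pool(torch.randn(2, 2048, 7, 7)).shape)  # torch.Size([2, 2048, 1, 1])
print(pool(torch.randn(2, 2048, 9, 9)).shape)  # torch.Size([2, 2048, 1, 1])
```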

Prerequisites

Before training the ResNet model, let's prepare the dataset. This section covers how to download the dataset used in this article, and how to organize it into training and testing sets.

For this project, we use the Apple Leaf Disease Dataset. This dataset consists of images of apple leaves categorized into different classes based on the type of disease or whether they are healthy.

The dataset includes the following classes:

Apple Scab: A common disease affecting apple leaves.
Apple Rust: Another disease that can cause significant damage.
Healthy: Images of healthy apple leaves with no visible disease.

Downloading the Dataset

You can download the Apple Leaf Disease Dataset from Kaggle. Here are the steps to download it:

Approach 1:

Visit the Apple Leaf Disease Dataset page on Kaggle.
Log in to your Kaggle account.
Click on the "Download" button to get the dataset as a zip file.
Unzip the file to a location of your choice.

Approach 2:
Alternatively, you can use the Kaggle API to download the dataset directly from the command line, once you have installed the kaggle package and set up your Kaggle API credentials.

kaggle datasets download -d nirmalsankalana/apple-leaf-disease-dataset

Organizing the Data

To prepare the dataset for training and testing, you need to shuffle the images and split them into train and test directories. Here’s a Python script to help you with this task:

1. Directory Creation:
Import the modules needed to access the file system and create directories, along with train_test_split from scikit-learn.

import os
import shutil
from sklearn.model_selection import train_test_split

Ensure that the directories for the training and test datasets are created if they do not already exist.

os.makedirs(train_dir, exist_ok=True)
os.makedirs(test_dir, exist_ok=True)

2. List categories:
List all the categories in the source directory. Each category typically represents a different class in the dataset.

categories = os.listdir(source_dir)

3. Split data:

For each category, list all files and then split them into training and testing sets using train_test_split from sklearn.model_selection.
The test_size parameter controls the proportion of the dataset to be used for testing (20% in this case).
random_state fixes the seed of the random shuffle so that the split is the same on every run, which is vital for reproducible experiments and debugging.

files = os.listdir(category_path)
train_files, test_files = train_test_split(files, test_size=test_size, random_state=42)

4. Category Directory Creation:
Recreate each category folder inside the train and test directories.

train_category_dir = os.path.join(train_dir, category)
test_category_dir = os.path.join(test_dir, category)
os.makedirs(train_category_dir, exist_ok=True)
os.makedirs(test_category_dir, exist_ok=True)

5. Copy data:
The files are then copied into their respective train or test category folders.

for file in train_files:
    shutil.copy(os.path.join(category_path, file), os.path.join(train_category_dir, file))
for file in test_files:
    shutil.copy(os.path.join(category_path, file), os.path.join(test_category_dir, file))
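The fragments above refer to variables (source_dir, train_dir, test_dir, category_path, test_size) of a split_data function that is never shown in full. A minimal sketch of how they could fit together is below. It assumes the folder layout source_dir/<class>/<image>, keeps the parameter names from the fragments, and adds a hypothetical random_state parameter (defaulting to the 42 used above). A small demo on a throwaway folder of empty placeholder files follows the function.

```python
import os
import shutil
import tempfile

from sklearn.model_selection import train_test_split


def split_data(source_dir, train_dir, test_dir, test_size=0.2, random_state=42):
    """Copy images from source_dir/<class>/ into train_dir/<class>/ and test_dir/<class>/."""
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)

    for category in sorted(os.listdir(source_dir)):
        category_path = os.path.join(source_dir, category)
        if not os.path.isdir(category_path):
            continue  # skip stray files such as .DS_Store

        # Sorting first makes the split independent of the order os.listdir returns
        files = sorted(os.listdir(category_path))
        train_files, test_files = train_test_split(files, test_size=test_size, random_state=random_state)

        train_category_dir = os.path.join(train_dir, category)
        test_category_dir = os.path.join(test_dir, category)
        os.makedirs(train_category_dir, exist_ok=True)
        os.makedirs(test_category_dir, exist_ok=True)

        for file in train_files:
            shutil.copy(os.path.join(category_path, file), os.path.join(train_category_dir, file))
        for file in test_files:
            shutil.copy(os.path.join(category_path, file), os.path.join(test_category_dir, file))


# Demo: two classes with 10 empty placeholder files each
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "original_images")
    for category in ["Healthy", "Rust"]:
        os.makedirs(os.path.join(src, category))
        for i in range(10):
            open(os.path.join(src, category, f"img_{i}.jpg"), "w").close()

    split_data(src, os.path.join(root, "train_images"), os.path.join(root, "test_images"))
    n_train = len(os.listdir(os.path.join(root, "train_images", "Healthy")))
    n_test = len(os.listdir(os.path.join(root, "test_images", "Healthy")))

print(n_train, n_test)  # 8 2
```

Copying (rather than moving) leaves the original folder intact, so the split can be regenerated with a different ratio or seed.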

Example usage

split_data(): This function takes the source directory of your dataset, and splits it into training and testing sets based on the provided split ratio (default is 80% training and 20% testing).
Directories: It creates the train and test directories if they do not already exist.
These steps allow us to have a well-organized dataset ready for training and evaluating the model.

source_directory = 'data/original_images'
train_directory = 'data/train_images'
test_directory = 'data/test_images'

split_data(source_directory, train_directory, test_directory)

Now let's move on to training and evaluating a model on this dataset.

Implementation

Although the ResNet-50 model is designed for larger datasets, we are utilizing a smaller dataset here to showcase how the ResNet-50 architecture functions.

1. Import the necessary modules:
os is used to interact with the file system, torch, torch.nn and torch.optim provide the building blocks for neural networks and their optimizers, torchvision provides image datasets and transforms, and DataLoader serves the data in batches.

import os
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader

2. Load the dataset:
Define the paths to the training and testing image folders. For the Apple Leaf Disease Dataset from Kaggle, we use 1382 images to train the model and 348 images to test it.

# Output directories created by split_data() in the Prerequisites section
train_dir = 'data/train_images'
test_dir = 'data/test_images'

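3. Image transformations:
Resize the images and augment the training data by randomly flipping them horizontally and vertically.
Then convert the images to tensors and normalize them with the ImageNet channel means and standard deviations.

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Test images are only resized and normalized: random flips are for training only
test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

4. Dataset creation and loaders:

Load the training and testing datasets with ImageFolder, which treats each subdirectory as a class label.
Use DataLoader to serve the data in batches, shuffling the training set in each epoch.

train_dataset = datasets.ImageFolder(root=train_dir, transform=transform)
test_dataset = datasets.ImageFolder(root=test_dir, transform=test_transform)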

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=4)

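5. Define the Bottleneck block:

Define the bottleneck residual block: three convolutions (1x1, 3x3 and 1x1), each followed by batch normalization, with ReLU activations. The last 1x1 convolution expands the channels by a factor of 4 (expansion = 4).
When a block changes the spatial size or the number of channels, the optional downsample module projects the shortcut so that its dimensions match the block output before the two are added. As in torchvision's implementation, the stride is applied in the 3x3 convolution; the original paper applies it in the first 1x1 convolution.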

class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.expansion = 4
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += identity
        out = self.relu(out)
        return out
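To see why the bottleneck design saves parameters, the sketch below counts the convolution weights of a bottleneck path at 256 channels (1x1 down to 64, 3x3 at 64, 1x1 back to 256, matching the blocks of layer1) and compares them with a hypothetical alternative of two 3x3 convolutions applied directly to 256 channels. Batch normalization layers are left out so that only convolution weights are compared.

```python
import torch.nn as nn


def count_params(module):
    """Total number of trainable values in a module."""
    return sum(p.numel() for p in module.parameters())


# Bottleneck path: reduce to 64 channels, 3x3 conv at 64, expand back to 256
bottleneck_path = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=False),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(64, 256, kernel_size=1, bias=False),
)

# Hypothetical alternative: two 3x3 convolutions working directly on 256 channels
plain_path = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
)

print(count_params(bottleneck_path))  # 69632
print(count_params(plain_path))       # 1179648
```

The bottleneck path needs about 17 times fewer weights, because the expensive 3x3 convolution only sees 64 channels.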

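6. Define the ResNet-50 model:

The ResNet-50 model below starts with a 7x7 convolution and max pooling, then stacks bottleneck blocks into four stages (layer1 to layer4) with 64, 128, 256 and 512 bottleneck channels and [3, 4, 6, 3] blocks. Each stage after the first halves the spatial size with stride 2.
The model ends with an adaptive average pooling layer and a fully connected layer that maps the 2048 (512 x 4) features to the class scores.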

class ResNet50(nn.Module):
    def __init__(self, num_classes):
        super(ResNet50, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(Bottleneck, 64, 3)
        self.layer2 = self._make_layer(Bottleneck, 128, 4, stride=2)
        self.layer3 = self._make_layer(Bottleneck, 256, 6, stride=2)
        self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

7. Model initialization:

Set the number of classes based on the class folders found in the training set.
Initialize the model with that number of classes.

num_classes = len(train_dataset.classes)
model = ResNet50(num_classes=num_classes)

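8. Loss function, optimizer and training function:
Define an appropriate loss function, optimizer and training loop. Below, we use cross-entropy loss, the Adam optimizer with a learning rate of 0.001, and a training function that runs for 25 epochs by default.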

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

def train_model(model, train_loader, criterion, optimizer, num_epochs=25):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)
        
        epoch_loss = running_loss / len(train_loader.dataset)
        print(f"Epoch {epoch + 1}/{num_epochs}, Loss: {epoch_loss:.4f}")

    print("Training complete.")
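The training function above runs entirely on the CPU, which is slow for a network as large as ResNet-50. A common pattern is to move the model and each batch to a GPU when one is available. The sketch below shows this with a hypothetical train_one_epoch helper; to keep it runnable in seconds, it trains a tiny stand-in model on random data instead of ResNet-50.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """One pass over `loader`, moving each batch to `device`; returns the mean loss per sample."""
    model.train()
    running_loss = 0.0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    return running_loss / len(loader.dataset)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny stand-in model and random data, used only so the sketch runs quickly
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3)).to(device)
toy_data = TensorDataset(torch.randn(64, 3, 8, 8), torch.randint(0, 3, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.001)  # created after .to(device)

epoch_loss = train_one_epoch(toy_model, toy_loader, nn.CrossEntropyLoss(), toy_optimizer, device)
print(f"Loss: {epoch_loss:.4f}")
```

With the objects defined in this article, the equivalent would be to call model.to(device) before creating the optimizer and pass train_loader to the helper once per epoch; evaluate_model would likewise need inputs.to(device) and labels.to(device).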

9. Evaluate and save the model:
Finally, evaluate the trained ResNet-50 model on the test set, then save its weights (the state dictionary) for further use and deployment.

def evaluate_model(model, test_loader):
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0
    with torch.no_grad():  # Disable gradient calculation for efficiency
        for inputs, labels in test_loader:  # Iterate over the test dataset
            outputs = model(inputs)  # Forward pass: compute model predictions
            _, predicted = torch.max(outputs, 1)  # Index of the highest score (logit) for each image
            total += labels.size(0)  # Count the total number of labels
            correct += (predicted == labels).sum().item()  # Count the number of correct predictions

    accuracy = 100 * correct / total  # Compute accuracy as a percentage
    print(f'Accuracy: {accuracy:.2f}%')  # Print the accuracy

train_model(model, train_loader, criterion, optimizer)
evaluate_model(model, test_loader)
torch.save(model.state_dict(), 'resnet50_apple_leaf_disease.pth')
print("Model saved to 'resnet50_apple_leaf_disease.pth'")

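The output below is from a sample run; exact losses and accuracy will vary between runs because of random weight initialization, shuffling and augmentation. The model can be improved further by fine-tuning or by training on larger datasets.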

Epoch 1/25, Loss: 1.2409
Epoch 2/25, Loss: 1.0889
Epoch 3/25, Loss: 1.0745
Epoch 4/25, Loss: 1.0099
Epoch 5/25, Loss: 1.0774
Epoch 6/25, Loss: 1.0102
Epoch 7/25, Loss: 0.9851
Epoch 8/25, Loss: 0.9157
Epoch 9/25, Loss: 0.9604
Epoch 10/25, Loss: 0.8618
Epoch 11/25, Loss: 0.7833
Epoch 12/25, Loss: 0.5814
Epoch 13/25, Loss: 0.4057
Epoch 14/25, Loss: 0.4237
Epoch 15/25, Loss: 0.3329
Epoch 16/25, Loss: 0.3577
Epoch 17/25, Loss: 0.2664
Epoch 18/25, Loss: 0.2014
Epoch 19/25, Loss: 0.2029
Epoch 20/25, Loss: 0.1733
Epoch 21/25, Loss: 0.1898
Epoch 22/25, Loss: 0.1613
Epoch 23/25, Loss: 0.2344
Epoch 24/25, Loss: 0.1334
Epoch 25/25, Loss: 0.1775
Training complete.
Accuracy: 93.97%
Model saved to 'resnet50_apple_leaf_disease.pth'
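A single accuracy number can hide which classes the model confuses, for example Rust with Scab. A confusion matrix makes this visible. The sketch below builds one with torch.bincount; the labels and predictions are made-up values for three classes, not results from the run above, and confusion_matrix is a hypothetical helper name.

```python
import torch


def confusion_matrix(labels, predicted, num_classes):
    """Rows are true classes, columns are predicted classes."""
    index = labels * num_classes + predicted  # one bin per (true, predicted) pair
    counts = torch.bincount(index, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


# Made-up labels and predictions for three classes
labels = torch.tensor([0, 0, 1, 1, 2, 2])
predicted = torch.tensor([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(labels, predicted, num_classes=3)
per_class_accuracy = cm.diag().float() / cm.sum(dim=1).float()
print(cm)                  # tensor([[1, 1, 0], [0, 2, 0], [1, 0, 1]])
print(per_class_accuracy)  # tensor([0.5000, 1.0000, 0.5000])
```

Inside evaluate_model, the per-batch labels and predicted tensors could be collected in lists and joined with torch.cat before calling the helper.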

10. Load the model:
The steps below load the saved state dictionary and use the model for inference:

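i. Recreate the model:
The Bottleneck and ResNet50 classes must be defined (or imported) exactly as in the training code above.

import torch
import torch.nn as nn

num_classes = 3  # Healthy, Rust and Scab; adjust based on the dataset
model = ResNet50(num_classes=num_classes)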

ii. Load the saved state dictionary:
Use torch.load to read the saved state dictionary and model.load_state_dict to copy the weights into the model.

# Load the saved state dictionary (map_location lets GPU-trained weights load on a CPU-only machine)
model.load_state_dict(torch.load('resnet50_apple_leaf_disease.pth', map_location='cpu'))
model.eval()  # Set the model to evaluation mode
print("Model loaded and set to evaluation mode.")

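iii. New input data:
Preprocess new images the same way as the test data: resize, convert to a tensor and normalize, without random flips.

from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load an example image
img_path = 'path/to/image.jpg'
image = Image.open(img_path).convert('RGB')  # ensure 3 channels (e.g. for PNG or grayscale files)
image = transform(image).unsqueeze(0)  # add a batch dimension: shape (1, 3, 256, 256)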

iv. Perform inference:
Finally, use the model to make predictions on the input data.

with torch.no_grad():  # Disable gradient calculation
    outputs = model(image)  # Forward pass on the preprocessed image batch
    _, predicted = torch.max(outputs, 1)  # Index of the class with the highest score

# Output the prediction
print(f'Predicted class index: {predicted.item()}')
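The predicted index is more useful when mapped back to a class name, together with a confidence score. ImageFolder numbers the classes in sorted order of the folder names, so train_dataset.classes holds the real mapping; the class_names list below is an assumed placeholder, and the logits are made-up values standing in for model(image). torch.softmax turns the logits into probabilities that sum to 1.

```python
import torch

# Assumed placeholder names in sorted order; use train_dataset.classes for the real mapping
class_names = ["Healthy", "Rust", "Scab"]

# Made-up logits for one image (batch of size 1), standing in for model(image)
outputs = torch.tensor([[0.5, 2.0, -1.0]])

probabilities = torch.softmax(outputs, dim=1)  # each row sums to 1
confidence, predicted = torch.max(probabilities, 1)
print(class_names[predicted.item()])  # Rust
print(f"Confidence: {confidence.item():.2%}")
```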

This approach allows us to efficiently use the trained model for inference on new data.

Conclusion

Among the ResNet architectures, ResNet-50 stands out due to its balance between depth and computational efficiency. With 50 layers, it provides a robust framework for deep learning tasks without excessive computational cost.

The use of residual learning allows ResNet-50 to train effectively even at great depths, making it a robust choice for a variety of image recognition tasks. PyTorch’s flexibility and ease of use enable a straightforward implementation of ResNet-50, making it accessible for both experimentation and production.

The model implemented above provides a guide for using ResNet-50 in custom applications, and its building blocks can be adapted to other architectures and more complex problems in computer vision and beyond.

Designing ResNet50 in PyTorch
