Winograd's Convolution Theorem [Explained]

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

Convolution itself is a very hefty process, in most cases there is a lot of repetitive informations from one step of the window to the next.

This significant overlap gives us the intuition that there maybe more efficient methods, that could omit extra calculations and give us a more efficient convolution method.

This theorem which gives us this efficient method has a more mathematical descent and did not originate in Convolutional netwroks. We saw our first CNN is 1994, while we had this theorem since the 80s.

As discussed, winograd is important due to the significant overlap in general convolution, which increases the complexity, it is necessary that the filter has some overlap. This method will not work with a filter size of 1x1 or a filter with size nxn and stride n.

The primary reason this is preferred is

Multiplication is the bottleneck of most processor based systems.
We use our weights to build another matrix which helps reduce multiplications through additions and subtractions.

At times, additions may outweigh the multipplications and this needs to be taken care of while watching out for using this theorem.

Winograd Convolution proved a lower bound on the number of multiplications required for convolution, and used Chinese Remainder Theorem to construct optimal algorithm to achieve minimum number of multiplies.

Direct algorithm for convolution is O(m^2).

The family of Winograd's algorithms is based on

Transforming tiles of the input and kernel into modulo polynomial domain (the Winograd domain)
Convolution becomes the elementwise multiplication (Hadamard product) with complexit O(m).
The result is tranformed back to its original domain.

Its disadvantage lies at some of the intermediate results that are 2.25 times larger than it once was, hence accuracy problems might arise.

But if you get this right,
Let 75% of the operations have 3x3 filters of stride 1, we can be observing 1.7x improvement in throughput and power efficiency.

Now let us get into the meaty part, the code itself!
Below is the code for Winograd Convolution.

import numpy as np
import Toom_Cook_Matrices as TC
import sys

def input_transform(BT, B, inputt):
    mul1 = BT.dot(inputt)
    result = mul1.dot(B) 
    return result   

def kernel_transform(G, GT, kernel):
    mul1 = G.dot(kernel)
    result = mul1.dot(GT)
    return result

def Hadamard_product(transformed_input, transformed_kernel):
    result = np.multiply(transformed_input, transformed_kernel)
    return result

def output_transform(AT, A, Hadamard_product):
    mul1 = AT.dot(Hadamard_product)
    result = mul1.dot(A)
    return result

def Winograd_convolution(points, H, X, precision):
    n_kernel = H.shape[0]
    n_input = X.shape[0]
    n_output = n_input - n_kernel + 1
    G, GT, AT, A, BT, B = TC.Toom_Cook_Matrices(points, n_kernel, n_output, precision)
    kernel_transformed = kernel_transform(G, GT, H)
    input_transformed = input_transform(BT, B, X)
    Hadamard = Hadamard_product(kernel_transformed, input_transformed)
    result = output_transform(AT, A, Hadamard)
    return result

While explaining the mathematical implementation, of this is beyond the scope of this article, due to its prerequisites. If you are really interested to know, looking through each line in the code above and understanding the actual functioning is recommended.

We can definitely look for a minimal Winograd convolution algorithms in a concise mathematical manner.

In the following piece of code we will be computing transform for F(2,3) or F(2x2,3x3) using polynomial interpolation points (0,1,-1).

>>> import wincnn
>>> wincnn.showCookToomFilter((0,1,-1), 2, 3)
AT = 
⎡1  1  1   0⎤
⎢           ⎥
⎣0  1  -1  1⎦

G = 
⎡ 1    0     0 ⎤
⎢              ⎥
⎢1/2  1/2   1/2⎥
⎢              ⎥
⎢1/2  -1/2  1/2⎥
⎢              ⎥
⎣ 0    0     1 ⎦

BT = 
⎡1  0   -1  0⎤
⎢            ⎥
⎢0  1   1   0⎥
⎢            ⎥
⎢0  -1  1   0⎥
⎢            ⎥
⎣0  -1  0   1⎦

AT*((G*g)(BT*d)) =
⎡d[0]⋅g[0] + d[1]⋅g[1] + d[2]⋅g[2]⎤
⎢                                 ⎥
⎣d[1]⋅g[0] + d[2]⋅g[1] + d[3]⋅g[2]⎦

The final matrix is the 1D convolution F(2,3) computed using the transforms AT, G, and BT, on four element signal d[0..3] and three element filter g[0..2], and validates the transforms. Since this happens to be a symbolic computation, the result must be exact.

This article at OpenGenus is the first step. There is a lot of study on how to make an effective Winograd Covolution. You are more than welcome to give your inputs, and you never know you come across something revolutionary!

Winograd's Convolution Theorem [Explained]

Deep Learning Machine Learning (ML)

Journey through Backpropagation algorithm

Next Greater Element using Stack