Affine Quantization

Affine Quantization is an asymmetric range mapping technique that maps inputs from a given real range (say [A1, A2]) to a quantized range representable in B bits. One example is converting FP32 data in a given range to INT8 data.

Affine Quantization is one of the fundamental techniques used to quantize the inputs and weights of a Machine Learning model, and is a core concept in Quantization in Machine Learning.

We will go through some basics of Range mapping before diving into Affine Quantization.

Range mapping

INT8 can store values from -128 to 127. In general, a B bit signed Integer has the range -(2^(B-1)) to (2^(B-1) - 1).

In Range mapping, we have to convert data in the range [A1, A2] to the range of the B bit Integer (INT8 in our case).
Hence, the problem is to map all elements in the range [A1, A2] to the range [-(2^(B-1)), (2^(B-1) - 1)]. Elements outside the range [A1, A2] will be clipped to the nearest bound.
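As a quick check of the target range, here is a minimal Python sketch that computes the range of a signed B bit integer. The helper name signed_int_range is made up for this illustration and is not part of any library.

```python
def signed_int_range(bits):
    """Return (lowest, highest) value of a signed integer with `bits` bits.

    Hypothetical helper for illustration only.
    """
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return lowest, highest


print(signed_int_range(8))  # (-128, 127), the INT8 range
print(signed_int_range(4))  # (-8, 7)
```

For B = 8 this gives the INT8 range of -128 to 127 mentioned above.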

There are two main types of Range mapping in Quantization:

  • Affine quantization
  • Scale quantization

We will cover Affine Quantization in this OPENGENUS article.

For quantization, the above techniques use one of two mapping equations. The general form is:

F(x) = s.x + z

where, s, x and z are real numbers.

The special case of the equation, with z = 0, is:

F(x) = s.x

Here, s is the scale factor and z is the zero point. Affine quantization uses the general form, while scale quantization uses the special case.

Affine Quantization

Affine Quantization uses the following equation for Quantization:

F(x) = s.x + z

where

  • F is the quantization function
  • x is the input
  • F(x) is the quantized output
  • s is the scale factor
  • z is the zero point

The special case of the above equation, F(x) = s.x (that is, z = 0), is not used in Affine Quantization.

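In Affine Quantization, with A1 as the lower bound and A2 as the upper bound of the input range, the parameters s and z are as follows:

s = (2^B - 1)/(A2 - A1)

z = -(ROUND(A1 * s)) - 2^(B-1)

With these values, A1 maps to the lowest quantized value, -2^(B-1), and A2 maps to the highest quantized value, 2^(B-1) - 1.

For INT8 (B = 8), s and z are as follows:

s = (255)/(A2 - A1)

z = -(ROUND(A1 * s)) - 128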
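To make the parameters concrete, here is a minimal Python sketch that computes s and z for a sample range. It assumes A1 is the lower bound and A2 the upper bound, with a positive scale s = (2^B - 1)/(A2 - A1) and z = -ROUND(A1 * s) - 2^(B-1), so that A1 lands on the lowest quantized value. The function name affine_params and the sample range [-1.0, 3.0] are chosen only for this illustration.

```python
def affine_params(a1, a2, bits=8):
    """Compute scale s and zero point z for the range [a1, a2].

    Assumes a1 < a2 (a1 is the lower bound, a2 the upper bound).
    Hypothetical helper for illustration only.
    """
    if a2 <= a1:
        raise ValueError("expected a1 < a2")
    s = (2 ** bits - 1) / (a2 - a1)
    # Python's round() rounds exact .5 ties to the nearest even integer.
    z = -round(a1 * s) - 2 ** (bits - 1)
    return s, z


s, z = affine_params(-1.0, 3.0)
print(s, z)  # 63.75 -64
```

With these values, s * (-1.0) + z = -127.75, which rounds to -128, and s * 3.0 + z = 127.25, which rounds to 127, so the two ends of the input range land on the two ends of the INT8 range.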

Once all the input data is converted using the above equation and rounded to the nearest integer, we get the quantized data. In this data, some values may be out of range. To bring them into range, we need another operation, "Clip", which maps every value outside the range to the nearest bound of the range.

The Clip operation is as follows:

clip(x, l, u) = x   ... if x is within [l, u]

clip(x, l, u) = l   ... if x < l

clip(x, l, u) = u   ... if x > u

In the above equation, l is the lower limit in the quantization range while u is the upper limit in the quantization range.
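The three cases of Clip can be written compactly. The following is a minimal Python sketch, using the same argument names (x, l, u) as the definition above.

```python
def clip(x, l, u):
    """Return x limited to the closed interval [l, u]."""
    return max(l, min(x, u))


print(clip(5, -128, 127))     # 5    (already within range)
print(clip(-300, -128, 127))  # -128 (below the lower limit)
print(clip(300, -128, 127))   # 127  (above the upper limit)
```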

So, the overall equation for Quantization in Affine Quantization is:

x_quantize = quantize(x, B, s, z) 

    = clip(round(s * x + z), 
           −2^(B−1), 
           2^(B−1) − 1)
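The overall quantization equation can be sketched in Python as below. This is an illustration rather than a reference implementation: the sample values s = 63.75 and z = -64 are assumed, and correspond to the input range [-1.0, 3.0] with a positive scale of 255 / (A2 - A1).

```python
def quantize(x, bits, s, z):
    """Affine-quantize a real value x to a signed integer with `bits` bits."""
    lower = -(2 ** (bits - 1))
    upper = 2 ** (bits - 1) - 1
    # Python's round() rounds exact .5 ties to the nearest even integer.
    q = round(s * x + z)
    # Clip to the quantization range.
    return max(lower, min(q, upper))


s, z = 63.75, -64  # assumed sample parameters for the range [-1.0, 3.0]
print(quantize(-1.0, 8, s, z))  # -128
print(quantize(0.0, 8, s, z))   # -64
print(quantize(3.0, 8, s, z))   # 127
print(quantize(10.0, 8, s, z))  # 127 (out-of-range input is clipped)
```

Note that the real value 0.0 maps exactly to the zero point z = -64.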

For dequantization, the equation in Affine Quantization is:

x_dequantize = dequantize(x_quantize, s, z) 
             = (x_quantize − z) / s
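Dequantization only approximately recovers the original value, because rounding discards information. The Python sketch below shows a round trip under the same assumed sample parameters (s = 63.75, z = -64, for the input range [-1.0, 3.0]). The quantize helper is repeated here only so that the example is self-contained.

```python
def quantize(x, bits, s, z):
    """Affine-quantize x: clip(round(s * x + z)) to the signed range."""
    lower = -(2 ** (bits - 1))
    upper = 2 ** (bits - 1) - 1
    return max(lower, min(round(s * x + z), upper))


def dequantize(x_q, s, z):
    """Map a quantized integer back to an approximate real value."""
    return (x_q - z) / s


s, z = 63.75, -64  # assumed sample parameters for the range [-1.0, 3.0]
for x in [-1.0, 0.0, 1.0, 3.0]:
    x_q = quantize(x, 8, s, z)
    x_hat = dequantize(x_q, s, z)
    # Round-trip error for in-range inputs stays within 0.5 / s.
    print(x, x_q, x_hat, abs(x - x_hat))
```

For inputs inside [A1, A2], the round-trip error is at most half a quantization step, that is 0.5 / s.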

NOTE: Affine Quantization is asymmetric because the input range [A1, A2] does not need to be centered on zero; the zero point z shifts it onto the full, non-symmetric quantization range (for INT8, this is -128 to 127, whereas a symmetric range would be -127 to 127).

Hence, the key points of Affine Quantization are:

  • For INT8, the quantization range is -128 to 127.
  • The quantization equation is:
x_quantize = quantize(x, B, s, z) 

    = clip(round(s * x + z), 
           −2^(B−1), 
           2^(B−1) − 1)
  • The dequantize equation is:
x_dequantize = dequantize(x_quantize, s, z) 
             = (x_quantize − z) / s
  • The scale factor and zero point, with A1 as the lower bound and A2 as the upper bound of the input range, are:
s = (2^B - 1)/(A2 - A1)
z = -(ROUND(A1 * s)) - 2^(B-1)

With this OPENGENUS article, you should now have a complete idea of Affine Quantization, which is one of the most common range mapping techniques in Quantization (ML).