Scale Quantization

Scale Quantization is a symmetric range mapping technique (with no zero point) that maps an input of a given range (say [A1, A2]) to a quantized range of B bits. One example is converting FP32 data of a given range to INT8 data.

Scale Quantization is one of the fundamental techniques used to quantize the inputs and weights of a Machine Learning model and is a core concept in Quantization in Machine Learning.

We will go through some basics of Range mapping before diving into Scale Quantization. Scale Quantization is an alternative to Affine Quantization.

Range mapping

INT8 can store values from -128 to 127. In general, a B-bit signed Integer can represent the range -(2^(B-1)) to (2^(B-1) - 1).

In Range mapping, we have to convert data of range [A1, A2] to the range of the B-bit Integer (INT8 in our case).
Hence, the problem is to map all elements in the range [A1, A2] to the range [-(2^(B-1)), (2^(B-1) - 1)]. Elements outside the range [A1, A2] will be clipped to the nearest bound.
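
As a quick check, a few lines of Python confirm these integer ranges (a minimal sketch; the helper name signed_range is our own, not from any library):

def signed_range(bits):
    # Representable range of a signed two's-complement integer with `bits` bits
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

print(signed_range(8))   # (-128, 127) -> INT8
print(signed_range(16))  # (-32768, 32767) -> INT16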

There are two main types of Range mapping in Quantization:

  • Affine quantization
  • Scale quantization

Both techniques are based on a linear mapping equation:

F(x) = s * x + z

where x is the real-valued input, s is the scale factor and z is the zero point. Affine Quantization uses the full equation, while Scale Quantization uses the special case with z set to 0:

F(x) = s * x
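
To make the two mappings concrete, here is a minimal Python sketch (the values s = 2.0 and z = 5 are made-up illustrative numbers, not derived from any real data):

def affine_map(x, s, z):
    # Affine mapping: F(x) = s * x + z
    return s * x + z

def scale_map(x, s):
    # Scale mapping: F(x) = s * x, i.e. the affine map with z = 0
    return s * x

print(affine_map(3.0, s=2.0, z=5))  # 11.0
print(scale_map(3.0, s=2.0))        # 6.0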

Scale Quantization

The difference between Scale Quantization and Affine Quantization is that here the zero point (z) is set to 0 and plays no role in the equations; only the scale factor (s) is used in the calculations of Scale Quantization.

We use the following equation:

F(x) = s * x

There are many variants of Scale Quantization and the simplest is Symmetric Quantization, in which the resulting range is symmetric. For INT8, the range will be [-127, 127]. Note that we are not considering -128 in the calculations, so that the range is balanced around zero.

Hence, in this case, we will quantize data from the range [-A1, A1] to [-(2^(B-1) - 1), (2^(B-1) - 1)]. The equation for the scale factor will be:

s = (2^(B - 1) − 1) / A1

Here, s is the scale factor and A1 is the largest magnitude in the input range.
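
For example, quantizing data whose values lie in [-2.5, 2.5] to INT8 gives s = 127 / 2.5 = 50.8. A minimal Python sketch of this step (the helper name compute_scale is our own):

def compute_scale(a1, bits=8):
    # Scale factor: s = (2^(B-1) - 1) / A1
    return (2 ** (bits - 1) - 1) / a1

print(compute_scale(2.5))  # 127 / 2.5 = 50.8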

The Clip operation is as follows:

clip(x, l, u) = x   ... if x is within [l, u]

clip(x, l, u) = l   ... if x < l

clip(x, l, u) = u   ... if x > u

In the above equation, l is the lower limit in the quantization range while u is the upper limit in the quantization range.
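
In Python, this is a three-line function, equivalent to min(max(x, l), u):

def clip(x, l, u):
    # Clamp x to the range [l, u]
    if x < l:
        return l
    if x > u:
        return u
    return x

print(clip(150, -127, 127))   # 127 (clipped to the upper limit)
print(clip(-200, -127, 127))  # -127 (clipped to the lower limit)
print(clip(42, -127, 127))    # 42 (already in range)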

The overall equation is:

x_quantize = quantize(x, B, s) 
           = clip(round(s * x), 
                  −2^(B - 1) + 1, 
                  2^(B - 1) − 1)
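
Putting the pieces together, here is a minimal sketch of the quantize function (reusing the clip function above and the illustrative scale s = 50.8 from the earlier example):

def quantize(x, bits, s):
    # Symmetric quantization: clip(round(s * x), -(2^(B-1) - 1), 2^(B-1) - 1)
    limit = 2 ** (bits - 1) - 1
    return clip(round(s * x), -limit, limit)

print(quantize(1.3, bits=8, s=50.8))   # round(66.04) = 66
print(quantize(-3.0, bits=8, s=50.8))  # round(-152.4) = -152, clipped to -127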

The equation for dequantization will be:

x_dequantize = dequantize(x_quantize, s) 
      = x_quantize / s
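
Dequantization is a single division; round-tripping a value shows the small error introduced by rounding:

def dequantize(x_quantize, s):
    # Recover an approximation of the original value
    return x_quantize / s

x = 1.3
x_q = quantize(x, bits=8, s=50.8)  # 66
x_hat = dequantize(x_q, s=50.8)    # 1.2992..., error of about 0.0008
print(x, x_q, x_hat)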

The key points of Scale Quantization are:

  • It is a symmetric range mapping technique
  • For INT8, the quantization range is -127 to 127
  • There is no zero point in Scale Quantization (equivalently, the zero point is fixed at 0)
  • The quantization equation is:
x_quantize = quantize(x, B, s) 
           = clip(round(s * x), 
                  −2^(B - 1) + 1, 
                  2^(B - 1) − 1)
  • The dequantize equation is:
x_dequantize = dequantize(x_quantize, s) 
      = x_quantize / s
  • The scale value is:
s = (2^(B - 1) − 1) / A1
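
As a closing illustration, here is an end-to-end sketch of the technique using NumPy, quantizing a small FP32 tensor to INT8 and back (the sample values are made up for demonstration):

import numpy as np

def scale_quantize(x, bits=8):
    # Symmetric scale quantization of a float tensor (stored as INT8, so assumes bits <= 8)
    limit = 2 ** (bits - 1) - 1          # 127 for INT8
    a1 = np.max(np.abs(x))               # A1: largest magnitude in the data
    s = limit / a1                       # scale factor
    x_q = np.clip(np.round(s * x), -limit, limit).astype(np.int8)
    return x_q, s

x = np.array([-2.5, -0.7, 0.0, 1.3, 2.5], dtype=np.float32)
x_q, s = scale_quantize(x)
x_hat = x_q / s                          # dequantize
print(x_q)    # [-127  -36    0   66  127]
print(x_hat)  # close to the original values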

With this OPENGENUS article, you must have the complete idea of Scale Quantization in Machine Learning.