Challenges of Quantization in Machine Learning (ML)

Quantization is one of the most intensely researched areas of Machine Learning, but several significant challenges are still holding back its wide adoption. We have listed the main challenges and drawbacks of Quantization below. The challenges/drawbacks of Quantization in Machine Learning models are as follows:

  • Significant Accuracy Loss in some models (like BERT)
  • Quantized weights make models hard to converge
  • Backpropagation becomes infeasible
  • Difficult to maintain global structure of weights
  • Gradient mismatch
  • Violates the assumptions of stochastic gradient descent
  • Need to consider both the sign and magnitude of gradients
  • Difficult to quantize some models like BERT
  • Optimal Technique is different for each model

We have explained each of these challenges in depth with examples so that you can follow along and understand the research that goes into efficient Quantization:

  • Significant Accuracy Loss in some models (like BERT)

For most models, if the correct Quantization techniques are used, the accuracy loss stays within 1%, which is acceptable by industry standards. For some specific models like BERT, however, Quantization brings a significant drop in accuracy (> 5%), which makes it a major challenge for those models.
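As a rough illustration, the sketch below (PyTorch) dynamically quantizes a toy model and compares its outputs against the FP32 original. The model and inputs are placeholders; a real evaluation of a model like BERT would measure a task metric (such as GLUE accuracy) before and after quantization.

```python
import torch
import torch.nn as nn

# A toy stand-in for a full model; a real study would quantize e.g. BERT and
# measure task accuracy before and after quantization.
torch.manual_seed(0)
model_fp32 = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Dynamic post-training quantization of the Linear layers to int8.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(32, 128)
with torch.no_grad():
    out_fp32 = model_fp32(x)
    out_int8 = model_int8(x)

# Mean absolute deviation of the logits: a cheap proxy for potential accuracy loss.
print((out_fp32 - out_int8).abs().mean().item())
```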

  • Quantized weights make models hard to converge

When the weights are quantized, models find it hard to converge during training. To compensate, the learning rate usually has to be set much lower than expected to achieve good performance.

It is important to figure out how to ensure the stability of the model during the training process.
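A minimal sketch of this stabilisation recipe is shown below, assuming a PyTorch training loop. The concrete values (a learning rate roughly an order of magnitude below a typical FP32 baseline, gradient clipping at max_norm=1.0) are illustrative assumptions, not prescriptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a quantized (or fake-quantized) model.
model = nn.Linear(128, 10)

# Learning rate well below a typical FP32 baseline (illustrative value only).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

def train_step(x, target):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), target)
    loss.backward()
    # Gradient clipping damps the update spikes that quantization noise can cause.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(32, 128), torch.randint(0, 10, (32,)))
```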

  • Backpropagation becomes infeasible

When weights are quantized, backpropagation becomes difficult because gradients cannot propagate through discrete values. Proposed solutions use approximation methods to estimate the gradients of the loss function.

The success of Quantization is therefore directly tied to the quality of these approximation methods.
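One widely used approximation is the straight-through estimator (STE), which uses the quantized value in the forward pass but treats the rounding step as the identity in the backward pass. A minimal PyTorch sketch:

```python
import torch

def ste_quantize(x, num_bits=8):
    """Symmetric fake quantization with a straight-through backward pass."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Forward value equals q, but the gradient w.r.t. x is 1, so the gradient
    # "passes straight through" the non-differentiable rounding.
    return x + (q - x).detach()

w = torch.randn(4, 4, requires_grad=True)
ste_quantize(w).sum().backward()
print(w.grad)   # all ones: rounding was treated as the identity in the backward pass
```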

  • Difficult to maintain global structure of weights

As weights are quantized locally, it is a challenge to maintain the global structure across all weights: most weight values are very small and can effectively be rounded away during quantization if the range is not chosen properly.
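The NumPy sketch below illustrates this with made-up values: a single large outlier stretches a per-tensor quantization range so much that a row of small weights collapses to zero, while per-channel ranges (one of the standard remedies) preserve it.

```python
import numpy as np

def quantize(w, scale, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

w = np.array([[0.002, -0.003, 0.001],   # row of tiny but meaningful weights
              [5.0,   -4.0,    3.0]])   # row containing large outliers

per_tensor_scale  = np.abs(w).max() / 127                       # one range for everything
per_channel_scale = np.abs(w).max(axis=1, keepdims=True) / 127  # one range per row

print(quantize(w, per_tensor_scale))    # first row collapses to all zeros
print(quantize(w, per_channel_scale))   # first row is preserved
```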

  • Gradient mismatch

When activations are quantized, it leads to a problem known as gradient mismatch, identified in 2016 by Lin and Talathi. This means there is a mismatch between the quantized activation used in the forward pass and the gradient computed in the backward pass.

To fix this problem, the gradient descent procedure has to be tuned and adapted accordingly.
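A minimal sketch of the mismatch, using a straight-through-style rounding of an activation: the forward pass sees the rounded value, but the backward pass computes a gradient as if no rounding had happened, so the gradient does not reflect the true, locally flat behaviour of the quantized function.

```python
import torch

def ste_round(x):
    # Forward: round(x). Backward: treated as the identity (straight-through).
    return x + (torch.round(x) - x).detach()

x = torch.tensor(0.6, requires_grad=True)
loss = ste_round(x) ** 2    # forward pass uses the quantized value round(0.6) = 1.0
loss.backward()

print(loss.item())          # 1.0
print(x.grad.item())        # 2.0, computed as if no rounding had happened
# The quantized forward function is flat around x = 0.6 (true local gradient 0),
# so the backward gradient does not match the quantized forward: this is the mismatch.
```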

  • Violates the assumptions of stochastic gradient descent

When we quantize the gradients, the standard stochastic gradient descent algorithm may no longer converge, because the convergence conditions of SGD (such as unbiased gradient estimates) are not satisfied in this case.
More sophisticated techniques are needed to ensure that gradient descent still converges.
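A minimal NumPy sketch of one standard remedy, stochastic rounding, which keeps the quantized gradient unbiased in expectation (one of the conditions that naive deterministic rounding breaks):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x):
    low = np.floor(x)
    # Round up with probability equal to the fractional part.
    return low + (rng.random(x.shape) < (x - low))

g = np.full(100_000, 0.3)            # pretend every gradient entry is 0.3
print(np.round(g).mean())            # deterministic rounding: 0.0 -> biased estimate
print(stochastic_round(g).mean())    # ~0.3 -> unbiased in expectation
```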

  • Need to consider both the sign and magnitude of gradients

When weight updates are computed, both the sign and the magnitude of the gradients matter, and they must be preserved when gradients (like activations) are quantized. Taking both the sign and magnitude into consideration during computation is a challenge that is not addressed by naive algorithms.
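A small NumPy sketch contrasting a sign-only gradient quantizer with one that also carries magnitude information through a shared per-tensor scale; the scaling scheme here is an illustrative assumption, loosely in the spirit of sign-based and ternary gradient compression methods.

```python
import numpy as np

g = np.array([0.5, -0.01, 0.02, -0.4])          # toy gradient vector

sign_only   = np.sign(g)                         # keeps direction, discards magnitude
sign_scaled = np.sign(g) * np.abs(g).mean()      # direction plus a shared magnitude

print(sign_only)      # [ 1. -1.  1. -1.]
print(sign_scaled)    # [ 0.2325 -0.2325  0.2325 -0.2325]
```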

  • Difficult to quantize some models like BERT

For example, Quantization Aware Training is a common yet complex technique that improves Quantization performance and accuracy for almost all models. Still, there are three models for which accuracy is impacted significantly: ResNeXt101, Mask R-CNN and GNMT. This poses a significant challenge.

  • Optimal Technique is different for each model

There are many techniques for the different steps of Quantization, such as range mapping, calibration and more. The choice among these techniques plays a major role in maintaining the accuracy of quantized models.

For example, for weight quantization in most CNN models, the most common choice is the Max calibration technique combined with per-channel quantization.
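A minimal NumPy sketch of that combination: max calibration computed per output channel of a hypothetical conv weight tensor of shape (out_channels, in_channels, kH, kW), giving one int8 scale per channel.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3)).astype(np.float32)    # toy conv weights (O, I, kH, kW)

qmax = 127                                              # symmetric int8 range
per_channel_max = np.abs(w).reshape(w.shape[0], -1).max(axis=1)   # max calibration
scales = per_channel_max / qmax                                   # one scale per channel

w_q  = np.round(w / scales[:, None, None, None]).astype(np.int8)
w_dq = w_q.astype(np.float32) * scales[:, None, None, None]
print(np.abs(w - w_dq).max())        # small per-channel reconstruction error
```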

With this, you have a good idea of the challenges of Quantization. Intense research is being done in this domain of Machine Learning (ML), and most of these challenges have advanced proposed solutions that perform well on real models.