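Deep Learning INT4 Quantization (with code demonstration)

INT4 quantization is a technique for shrinking deep learning models and lowering their computational cost. It stores values such as weights (and, in some schemes, activations) as 4-bit integers instead of 32-bit floating-point numbers, so each value takes one eighth of the space, not counting the scale factors kept alongside the integers to map them back to real values. The trade-off is precision: a 4-bit integer can represent only 16 distinct levels.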
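To make the idea concrete, here is a minimal sketch of one common scheme: symmetric, per-tensor quantization to the signed 4-bit range [-8, 7], using NumPy. It is an illustration under stated assumptions, not the method of any particular framework. The function names (`quantize_int4`, `dequantize_int4`, `pack_int4`, `unpack_int4`) are hypothetical, and the choice of a single scale per tensor is an assumption; real toolchains often use per-channel or per-group scales instead.

```python
import numpy as np

def quantize_int4(x: np.ndarray):
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]."""
    max_abs = float(np.max(np.abs(x)))
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Map 4-bit integer codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

def pack_int4(q: np.ndarray) -> np.ndarray:
    """Pack a flat, even-length array of 4-bit codes into bytes, two per byte."""
    u = q.astype(np.uint8) & 0x0F          # keep the low 4 bits (two's complement)
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover signed 4-bit codes as int8."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # Sign-extend: nibbles 8..15 represent -8..-1.
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2] = lo
    out[1::2] = hi
    return out

if __name__ == "__main__":
    # Illustrative weights only; not taken from any real model.
    weights = np.array([0.7, -0.3, 0.12, -0.7], dtype=np.float32)
    q, scale = quantize_int4(weights)
    packed = pack_int4(q)
    restored = dequantize_int4(unpack_int4(packed), scale)
    print("codes:", q)
    print("bytes used:", packed.nbytes, "vs float32:", weights.nbytes)
    print("restored:", restored)
```

NumPy has no native 4-bit dtype, so the sketch keeps codes in `int8` and packs two of them into each byte to get the actual storage saving. With a symmetric scale, the rounding error per value is bounded by half the scale, which is why a few large outliers in a tensor can hurt accuracy for everything else.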