Optimizations are required to make training and inference of Neural Networks run faster on particular hardware. It is important to maintain the accuracy of Neural Networks while applying these optimizations.
The various types of Neural Network optimizations available are as follows:
Weight pruning
 Element-wise pruning using:
 magnitude thresholding
 sensitivity thresholding
 target sparsity level
 activation statistics
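The first three element-wise criteria can be sketched as masks over a flat list of weights. This is a minimal illustration in plain Python, not any particular library's API: the function names are hypothetical, the sensitivity threshold is assumed to be a multiple of the weights' standard deviation (one common formulation), and activation-statistics pruning is not shown.

```python
import statistics

def magnitude_mask(weights, threshold):
    """Magnitude thresholding: keep (1) weights with |w| > threshold, prune (0) the rest."""
    return [1 if abs(w) > threshold else 0 for w in weights]

def sensitivity_threshold(weights, sensitivity):
    """Assumed formulation: threshold = sensitivity * standard deviation of the weights."""
    return sensitivity * statistics.pstdev(weights)

def target_sparsity_mask(weights, sparsity):
    """Target sparsity level: prune the smallest-magnitude fraction `sparsity` of weights."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask

def apply_mask(weights, mask):
    """Zero out the pruned weights."""
    return [w * m for w, m in zip(weights, mask)]

weights = [0.8, -0.05, 0.3, -0.6, 0.01, 0.2]
print(magnitude_mask(weights, 0.1))        # [1, 0, 1, 1, 0, 1]
print(target_sparsity_mask(weights, 0.5))  # [1, 0, 1, 1, 0, 0]
print(magnitude_mask(weights, sensitivity_threshold(weights, 0.5)))
```

In a real framework the mask would be a tensor of the same shape as the weight tensor, but the selection logic is the same.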
Structured pruning
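Structured pruning removes whole groups of weights (such as kernels, filters, channels, rows or columns) instead of individual elements. This reduces the number of weights and activations involved in a Neural Network, which in turn reduces the number of computations required.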

Convolution:
 2D (kernel-wise)
 3D (filter-wise)
 4D (layer-wise)
 channel-wise structured pruning
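Filter-wise pruning can be sketched as dropping whole output filters of a convolution weight. This minimal sketch assumes nested Python lists shaped [out_channels][in_channels][k][k] and ranks filters by L1 norm (one possible criterion); the function names are hypothetical. The multiply-accumulate (MAC) count shows how removing filters reduces computation.

```python
def l1_norm(tensor):
    """Sum of absolute values of an arbitrarily nested list of numbers."""
    if isinstance(tensor, list):
        return sum(l1_norm(t) for t in tensor)
    return abs(tensor)

def prune_filters(conv_weight, n_keep):
    """Keep the n_keep filters (output channels) with the largest L1 norm.

    Returns the thinned weight and the original indices of the kept filters.
    """
    norms = [l1_norm(f) for f in conv_weight]
    ranked = sorted(range(len(conv_weight)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [conv_weight[i] for i in keep], keep

def conv_macs(out_ch, in_ch, k, out_h, out_w):
    """Multiply-accumulate operations of a dense 2D convolution (bias ignored)."""
    return out_ch * in_ch * k * k * out_h * out_w

# 4 filters, 2 input channels, 3x3 kernels; every value in a filter equals s
filters = [[[[s] * 3 for _ in range(3)] for _ in range(2)]
           for s in (0.9, 0.1, -0.7, 0.05)]
pruned, kept = prune_filters(filters, 2)
print(kept)                                  # [0, 2]
print(conv_macs(4, 2, 3, 32, 32))            # 73728
print(conv_macs(len(pruned), 2, 3, 32, 32))  # 36864
```

Removing output filters of one layer also removes the matching input channels of the next layer, so its cost drops as well.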

Fully-connected:
 column-wise structured pruning
 row-wise structured pruning
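For a fully-connected layer y = Wx with W shaped [out_features][in_features], row-wise pruning removes output neurons and column-wise pruning removes input features. A minimal sketch, assuming plain nested lists and an L2-norm ranking; the function names are hypothetical.

```python
import math

def l2(vector):
    return math.sqrt(sum(x * x for x in vector))

def prune_rows(W, n_keep):
    """Row-wise: keep the n_keep rows (output neurons) with the largest L2 norm."""
    ranked = sorted(range(len(W)), key=lambda r: l2(W[r]), reverse=True)
    keep = sorted(ranked[:n_keep])
    return [W[r] for r in keep], keep

def prune_columns(W, n_keep):
    """Column-wise: keep the n_keep columns (input features) with the largest L2 norm."""
    cols = list(zip(*W))
    ranked = sorted(range(len(cols)), key=lambda c: l2(cols[c]), reverse=True)
    keep = sorted(ranked[:n_keep])
    return [[row[c] for c in keep] for row in W], keep

W = [[0.5, 0.0, -0.2],
     [0.01, 0.02, 0.0],
     [-0.4, 0.1, 0.3]]
print(prune_rows(W, 2))     # ([[0.5, 0.0, -0.2], [-0.4, 0.1, 0.3]], [0, 2])
print(prune_columns(W, 2))  # ([[0.5, -0.2], [0.01, 0.0], [-0.4, 0.3]], [0, 2])
```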

Structure groups

Structure-ranking using weights or activations, with criteria such as:
 Lp-norm
 APoZ (Average Percentage of Zeros)
 gradients
 random
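Two of these ranking criteria can be sketched in a few lines: APoZ scores a channel by the fraction of its (post-ReLU) activations that are zero, and the Lp-norm scores it by the size of its weights. Channels with a high APoZ or a small norm are treated as the least important. This is a plain-Python sketch with hypothetical names; gradient-based and random ranking are not shown.

```python
def apoz(channel_activations):
    """Average Percentage of Zeros: fraction of activation values that are exactly zero."""
    values = list(channel_activations)
    return sum(1 for a in values if a == 0) / len(values)

def lp_norm(weights, p=1):
    return sum(abs(w) ** p for w in weights) ** (1.0 / p)

def rank_by_apoz(activations_per_channel):
    """Channel indices from least to most important (highest APoZ first)."""
    scores = [apoz(a) for a in activations_per_channel]
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)

def rank_by_lp_norm(weights_per_channel, p=1):
    """Channel indices from least to most important (smallest norm first)."""
    scores = [lp_norm(w, p) for w in weights_per_channel]
    return sorted(range(len(scores)), key=lambda c: scores[c])

acts = [[0.0, 1.2, 0.0, 0.4], [0.0, 0.0, 0.0, 0.3], [0.5, 0.9, 1.1, 0.0]]
ws = [[0.3, -0.3], [0.05, 0.1], [-1.0, 0.2]]
print(rank_by_apoz(acts))       # [1, 0, 2]
print(rank_by_lp_norm(ws, p=2)) # [1, 0, 2]
```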

Block pruning
Control
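 Soft pruning (the mask is applied only in the forward pass) and hard pruning (pruned neurons are permanently disconnected)
 Dual weight copies (the loss is computed on the masked weights, but the unmasked weights are updated)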
 Model thinning to permanently remove pruned neurons and connections.
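The difference between soft pruning with dual weight copies and hard pruning can be sketched on a single linear unit trained with squared error. This is an illustrative assumption, not a specific library's mechanism: the forward pass uses the masked weights, while the gradient with respect to those masked weights is applied to the dense copy (a straight-through treatment of the mask), so pruned weights keep changing and may recover. The function names are hypothetical.

```python
def forward(dense_w, mask, x):
    """Soft pruning: the mask is applied only when computing the output."""
    return sum(w * m * xi for w, m, xi in zip(dense_w, mask, x))

def sgd_step_dual_copy(dense_w, mask, x, target, lr):
    """One SGD step on the loss 0.5 * (y - target)^2.

    The loss uses the masked weights; its gradient w.r.t. each masked weight,
    err * x_i, is applied to the dense (unmasked) copy.
    """
    err = forward(dense_w, mask, x) - target
    return [w - lr * err * xi for w, xi in zip(dense_w, x)]

def hard_prune(dense_w, mask):
    """Hard pruning: permanently zero the pruned weights."""
    return [w * m for w, m in zip(dense_w, mask)]

dense = [0.5, 0.02, -0.3]
mask = [1, 0, 1]
dense = sgd_step_dual_copy(dense, mask, x=[1.0, 2.0, 1.0], target=0.0, lr=0.1)
print(dense)                    # approximately [0.48, -0.02, -0.32]
print(hard_prune(dense, mask))  # the pruned weight becomes 0
```

Model thinning goes one step further than hard pruning: instead of keeping zeros, it physically shrinks the tensors so the removed neurons and connections no longer exist.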
Schedule
 Compression scheduling: Flexible scheduling of pruning, regularization, and learning rate decay
 One-shot and iterative pruning
 Automatic gradual schedule (AGP) for pruning individual connections and complete structures
 Element-wise and filter-wise pruning sensitivity analysis
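An automatic gradual schedule raises the target sparsity from an initial to a final value over a range of training steps. A widely used form is the cubic schedule of Zhu and Gupta (2017), sketched below with hypothetical parameter names: sparsity grows quickly at first, while many redundant weights remain, and slowly near the end.

```python
def agp_sparsity(step, initial_sparsity, final_sparsity, start_step, end_step):
    """Target sparsity at `step` under a cubic gradual pruning schedule."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

schedule = [round(agp_sparsity(t, 0.0, 0.8, 0, 10), 3) for t in range(11)]
print(schedule)  # starts at 0.0, reaches 0.7 at step 5 and 0.8 at step 10
```

At each pruning step, the returned value can be fed to a target-sparsity pruner to decide how many weights or structures to remove.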
Regularization
 L1-norm element-wise regularization
 Group Lasso or group variance regularization
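Both regularizers add a penalty term to the training loss. The L1 penalty pushes individual weights toward zero, while Group Lasso sums the L2 norm of each group (for example each filter), which pushes whole groups toward zero and so yields structured sparsity. A minimal sketch with hypothetical names and an assumed `strength` coefficient; group variance regularization is not shown.

```python
import math

def l1_penalty(weights, strength):
    """strength * sum of |w| over all weights."""
    return strength * sum(abs(w) for w in weights)

def group_lasso_penalty(groups, strength):
    """strength * sum over groups of the group's L2 norm."""
    return strength * sum(math.sqrt(sum(w * w for w in g)) for g in groups)

groups = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]  # e.g. three flattened filters
flat = [w for g in groups for w in g]
print(group_lasso_penalty(groups, 0.1))  # approximately 0.6  (0.1 * (5 + 0 + 1))
print(l1_penalty(flat, 0.1))             # approximately 0.8  (0.1 * 8)
# total_loss = data_loss + penalty
```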
Quantization
 Post-training quantization
 Quantization-aware training
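A simple form of post-training quantization is symmetric linear quantization: a scale is derived from the largest absolute value of an already-trained tensor, and values are rounded to signed integers. Quantization-aware training commonly applies "fake quantization" (quantize, then dequantize) in the forward pass so the network learns to tolerate the rounding error; the backward pass (typically a straight-through estimator) is not shown. The names below are hypothetical and 8 bits is an assumed default.

```python
def symmetric_scale(values, num_bits=8):
    """Scale that maps the largest |value| to the largest signed integer (127 for 8 bits)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0

def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    return [q * scale for q in q_values]

def fake_quantize(values, scale, num_bits=8):
    """Quantize then dequantize, as used in the forward pass of quantization-aware training."""
    return dequantize(quantize(values, scale, num_bits), scale)

weights = [0.5, -1.27, 0.0, 1.0]
scale = symmetric_scale(weights)  # about 0.01
print(quantize(weights, scale))   # [50, -127, 0, 100]
print(fake_quantize(weights, scale))
```

Each fake-quantized value differs from the original by at most half a quantization step (scale / 2).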
Knowledge distillation
 Training with knowledge distillation along with:
 pruning
 regularization
 quantization methods.
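In knowledge distillation, a small student network is trained to match the softened output distribution of a larger teacher, in addition to the true labels. The sketch below follows the commonly used formulation of Hinton et al. (2015): a temperature-softened KL divergence scaled by T squared, mixed with the usual cross-entropy. The `temperature` and `alpha` values and all function names are assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # T^2 keeps soft-target gradients on a scale comparable to the hard-label term
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_loss = -math.log(softmax(student_logits)[label])  # cross-entropy with the true label
    return alpha * soft_loss + (1 - alpha) * hard_loss

print(distillation_loss([1.0, 2.0, 0.5], [0.5, 3.0, 0.2], label=1))
```

When the student is also being pruned, regularized or quantized, this loss simply replaces the plain cross-entropy in the training loop.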
Conditional computation
 Early Exit
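Early exit attaches intermediate classifiers ("exits") to a network and stops inference at the first exit that is confident enough, skipping the remaining layers. This sketch assumes a BranchyNet-style rule (exit when the entropy of the softmax falls below a per-exit threshold); the exits are hypothetical callables evaluated lazily, so later, more expensive exits only run when needed.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit_predict(exit_fns, thresholds):
    """exit_fns: callables returning each exit's logits, earliest first.
    thresholds: entropy threshold for each early exit; the final exit always answers.
    Returns (exit_index, predicted_class).
    """
    for i, compute_logits in enumerate(exit_fns):
        probs = softmax(compute_logits())
        is_last = i == len(exit_fns) - 1
        if is_last or entropy(probs) < thresholds[i]:
            return i, max(range(len(probs)), key=lambda c: probs[c])

confident = [lambda: [8.0, 0.0, 0.0], lambda: [0.0, 3.0, 0.0]]
uncertain = [lambda: [0.1, 0.0, 0.05], lambda: [0.0, 3.0, 0.0]]
print(early_exit_predict(confident, [0.5]))  # (0, 0): stops at the first exit
print(early_exit_predict(uncertain, [0.5]))  # (1, 1): falls through to the final exit
```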