Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this article, we have listed and explained Different Code Optimizations in Compiler Design such as Peephole optimization, loop unrolling, Loop-invariant code motion and much more.
Table of Contents
- Introduction
- Peephole optimization
- Reduntant load/store elimination
- Constant folding
- Strength reduction
- Null sequences
- Combining operators
- Loop unrolling
- Lazy evaluation
- More optimizations
- Register allocation
- Loop-invariant code motion
- Bounds-checking elimination
- Tail call optimization
- Overview
Introduction
In this article we will cover the basis of optimization in compiler design. This is the concept of program transformation, to make it consume fewer resources such as CPU and memory. This will result in more efficient machine code and therefore a better performing program.
Generally, we can give our compiler optimization the following rules:
- The optimization must not in any way change the function or meaning of the program
- The optimization should increase the performance of the program
- The compilation time must remain reasonable
- The optimization must not delay compilation
Since our code optimization reduces code readability, it is often conducted at the end of development. Optimization can help:
- Reduce space consumed
- Increase complilation speed
- Promote re-usability
Code optimization can be generally defined into two types, Machine Dependant Optimization, which is optimization that is done once the target code has been generated/when the code is transformed for the target architecture. This means that our optimization will include absolute memory references rather than relative, and also CPU registers. The aim of this type of optimization is to take advantage of the internal memory hierarchy. The second type of optimization is Machine Independant Optimization. This is optimization which aims to improve the intermediate code in an attempt to generate a more efficient target code. Therefore there are no CPU registers or absolute memory references involved since we are transforming the intermediate code.
Peephole Optimization
One way we can begin to optimize our code is by using peephole technique. This is the idea of performing optimizations on small parts of the code, replacing sections with shorter and faster code whilst still keeping the inputs consistent. This is a machine dependant optimization.
There are 5 ways we can implement peephole optimization, they are as follows:
- Reduntant load/store elimination
- Constant folding
- Strength reduction
- Null sequences
- Combining operators
Reduntant load/store elimination
Eliminating redundancy in code
Input code:
x = y+5
i = x
k = i
j = k*3
Optimized code:
x = y+5
i = x
j = x*3
Constant folding
Simplification of constants
Input code:
x = 3*3
Optimized code:
x = 9
Strength reduction
Operators with high execution time replaced with ones that have low execution time
Input code:
i = j*2
Optimized code:
i = j+j
Null sequences
Unnecessary operators are removed
Input code:
i = 13+2
k = i+10
Optimized code:
k = 25
Combining operators
Multiple operations can be combined to a shorter equivilant expression
Input code:
i = j*(10+3)
Optimized code:
i = j*13
Peephole optimization is often carried out with pattern matching algorithms (such as ones found in run length encoding) in order to identify where parts in code have been repeated or are unnecessary.
Loop Unrolling
We can also begin optimizing through loop unrolling. This is the idea that we reduce the number of iterations within our program to optimize the execution time. The reason it increases the speed of a program is that it removes the need for loop control or loop test instructions to be generated.
#include<iostream>
using namespace std;
int main(void) {
for (int i=0; i<5; i++)
printf("OPENGENUS\n");
return 0;
}
Here we can see this program uses a for loop to print OPENGENUS 5 times. Using loop unrolling, we can optimize this by removing the loop.
#include<iostream>
using namespace std;
int main(void) {
printf("OPENGENUS\n");
printf("OPENGENUS\n");
printf("OPENGENUS\n");
printf("OPENGENUS\n");
printf("OPENGENUS\n");
return 0;
}
Our version without loops is more efficient since we do not have to 1, check the value of i or 2, increment i each iteration. This is effective for loops which are small in size or that have a fixed number of iterations.
Benefits to this optimization is that program efficiency increases, loop overhead is reduced and that if statements can be executed in parallel since they are not dependant on each other if they are not within a loop.
Disadvantages are that the code increases in size which causes readability issues, register use may increase since a single iteration may require temporary variables, the only effective use for this type of optimization is for small or simple pieces of code since branching unrolled loops are extremely inefficient.
Lazy Evaluation
Lazy evaluation is an evaluation method which means that an expression isn't evaluated until it is used first. Languages such as C++ use strict evaluation, meaning that an expression is evaluated as soon as it is declared, however, lazy evaluation is used widely in functional programming languages such as Haskell. It means that programs can be optimized since if for example we do not use a particular value that has been defined, it will not be calculated unless it is used, saving on CPU resources. This is especially important when working with big data since it is unpredictable and computation resources need to be optimized in order to process it effectively.
Strict evaluation:
list = [2+1, 3*2, 1/0, 5-4]
print length(list)
This would output an error under strict evaluation since we cannot divide by 0, and the list would be evaluated as soon as it's defined.
Lazy evaluation:
list = [2+1, 3*2, 1/0, 5-4]
print length(list)
The length function would run as intended during lazy evaluation since to return the length of the list, the actual values in the list are not required and therefore are not evaluated.
Advantages to this technique are that computations are optimized, circular dependancies can be resolved, allows use of infinite data structres (useful for big data), modularity of code.
Disadvantages however include debugging, since the developer has no control over the order in which the program is executed in, memory usage can be increased since all instructions need to be stored.
More optimizations
Compiler optimization can be carried out in many other ways:
- Register allocation
- Loop-invariant code motion
- Bounds-checking elimination
- Tail call optimization
Register allocation
We can optimize code by allocating frequently used variable in registers so that they can be accessed quickly. We can decide which variables to store in registers by using an interference graph. Each variable is a vertex, when two variables are used at the same time, they have an edge between them.
Loop-invariant code motion
We should aim to remove variables inside loops since if its value is the same each iteration, performance will decline as we are carrying out unnecessary calcultations.
Input pseudocode:
x = 100
WHILE x > 0
y = i + j
IF x MOD y == 0
PRINT x
ENDIF
ENDWHILE
Optimized pseudocode:
x = 100
y = i + j
WHILE x > 0
IF x MOD y == 0
PRINT x
ENDIF
ENDWHILE
Bounds-checking elimination
In languages such as Java, array accesses are bounds-checked. In code relating to data processing, this has a significant impact on performance. We can remove the bounds-checking (for example an index must be within certain bounds) on simple loops to improve efficiency.
Tail call optimization
This optimization method is particularly used commonly with functional languages, however it can also be applied to non-functional.
Since a call of a function uses space in the call stack and produces overhead with its parameters, recursive functions should be converted to iterative where they can be in order to reduce memory used and increase efficiency.
def factorial(n):
if n == 1:
return 1
else:
return n * factorial(n-1)
Above is a recursive function to calculate the factorial of a number, as we know recursion has to "unwind", meaning that for a large number n, stack overflow could occur since each call has to wait for the previous one to complete before it is removed from the stack, taking up huge amounts of space.
n = 5
factorial = 1
for i in range(1, n+1):
factorial *= i
print(factorial)
Here is the same factorial calculation however this time using iteration, it is more efficient than our recursive function since each iteration step is not stored once it has completed, using a for loop over while is also preferable since there is no risk of infinite calls (if a while loop's condition is never met).
Overview
In this article at OpenGenus, we have introduced the concept of optimization of code in compiler design, and numerous ways to do so. We have covered how various methods work, and where they are best suited. After reading, you should have a good overview of the topic and why it is important in compilation.